-
Performance triggered adaptive model reduction for soil moisture estimation in precision irrigation
Authors:
Sarupa Debnath,
Bernard T. Agyeman,
Soumya R. Sahoo,
Xunyuan Yin,
Jinfeng Liu
Abstract:
Accurate soil moisture information is crucial for developing precise irrigation control strategies to enhance water use efficiency. Soil moisture estimation based on limited soil moisture sensors is crucial for obtaining comprehensive soil moisture information when dealing with large-scale agricultural fields. The major challenge in soil moisture estimation lies in the high dimensionality of the spatially discretized agro-hydrological models. In this work, we propose a performance-triggered adaptive model reduction approach to address this challenge. The proposed approach employs a trajectory-based unsupervised machine learning technique, and a prediction performance-based triggering scheme is designed to govern model updates adaptively so that the prediction error between the reduced model and the original model over a prediction horizon is maintained below a predetermined threshold. An adaptive extended Kalman filter (EKF) is designed based on the reduced model for soil moisture estimation. The applicability and performance of the proposed approach are evaluated extensively through the application to a simulated large-scale agricultural field.
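The triggering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the error metric (largest absolute gap over the horizon), the function names `max_prediction_error` and `should_update`, and the toy trajectories are all assumptions made here for illustration.

```python
from typing import Sequence

Trajectory = Sequence[float]


def max_prediction_error(full: Trajectory, reduced: Trajectory) -> float:
    """Largest absolute gap between two predicted trajectories (assumed metric)."""
    if len(full) != len(reduced):
        raise ValueError("trajectories must cover the same prediction horizon")
    return max(abs(f - r) for f, r in zip(full, reduced))


def should_update(full: Trajectory, reduced: Trajectory, threshold: float) -> bool:
    """Request a reduced-model update only when the error exceeds the threshold."""
    return max_prediction_error(full, reduced) > threshold


# Hypothetical soil moisture predictions over a four-step horizon.
full_model = [0.30, 0.28, 0.27, 0.25]
reduced_ok = [0.30, 0.29, 0.27, 0.26]
reduced_drift = [0.30, 0.31, 0.33, 0.36]

print(should_update(full_model, reduced_ok, threshold=0.05))
print(should_update(full_model, reduced_drift, threshold=0.05))
```

In this toy setup the reduced model is left alone while it tracks the original model closely, and an update is requested only once the gap over the horizon grows past the threshold.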
Submitted 1 April, 2024;
originally announced April 2024.
-
Reduced-order Koopman modeling and predictive control of nonlinear processes
Authors:
Xuewen Zhang,
Minghao Han,
Xunyuan Yin
Abstract:
In this paper, we propose an efficient data-driven predictive control approach for general nonlinear processes based on a reduced-order Koopman operator. A Kalman-based sparse identification of nonlinear dynamics method is employed to select lifting functions for Koopman identification. The selected lifting functions are used to project the original nonlinear state-space into a higher-dimensional linear function space, in which Koopman-based linear models can be constructed for the underlying nonlinear process. To curb the significant increase in the dimensionality of the resulting full-order Koopman models caused by the use of lifting functions, we propose a reduced-order Koopman modeling approach based on proper orthogonal decomposition. A computationally efficient linear robust predictive control scheme is established based on the reduced-order Koopman model. A case study on a benchmark chemical process is conducted to illustrate the effectiveness of the proposed method. Comprehensive comparisons are conducted to demonstrate the advantage of the proposed method.
Submitted 30 March, 2024;
originally announced April 2024.
-
Optimal Control Synthesis of Markov Decision Processes for Efficiency with Surveillance Tasks
Authors:
Yu Chen,
Xuanyuan Yin,
Shaoyuan Li,
Xiang Yin
Abstract:
We investigate the problem of optimal control synthesis for Markov Decision Processes (MDPs), addressing both qualitative and quantitative objectives. Specifically, we require the system to fulfill a qualitative surveillance task in the sense that a specific region of interest can be visited infinitely often with probability one. Furthermore, to quantify the performance of the system, we consider the concept of efficiency, which is defined as the ratio between rewards and costs. This measure is more general than the standard long-run average reward metric as it aims to maximize the reward obtained per unit cost. Our objective is to synthesize a control policy that ensures the surveillance task while maximizing the efficiency. We provide an effective approach to synthesize a stationary control policy achieving $ε$-optimality by integrating state classifications of MDPs and perturbation analysis in a novel manner. Our results generalize existing works on efficiency-optimal control synthesis for MDPs by incorporating qualitative surveillance tasks. A robot motion planning case study is provided to illustrate the proposed algorithm.
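To make the efficiency measure concrete, the following sketch compares it with the average reward on a deterministic cycle of (reward, cost) steps. This is a simplified illustration, not the paper's formulation for general MDPs; the names `average_reward`, `efficiency` and the example cycles are assumptions introduced here.

```python
from typing import List, Tuple

Step = Tuple[float, float]  # (reward, cost) collected in one step


def average_reward(cycle: List[Step]) -> float:
    """Reward per step over one repetition of the cycle."""
    return sum(r for r, _ in cycle) / len(cycle)


def efficiency(cycle: List[Step]) -> float:
    """Reward per unit cost over one repetition of the cycle."""
    total_cost = sum(c for _, c in cycle)
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return sum(r for r, _ in cycle) / total_cost


cycle_a = [(4.0, 4.0), (4.0, 4.0)]  # high reward, high cost
cycle_b = [(3.0, 1.0), (3.0, 1.0)]  # lower reward, much lower cost
unit_cost = [(2.0, 1.0), (6.0, 1.0)]  # every step costs exactly 1

print(average_reward(cycle_a), efficiency(cycle_a))
print(average_reward(cycle_b), efficiency(cycle_b))
print(average_reward(unit_cost), efficiency(unit_cost))
```

The two measures can rank behaviours differently: `cycle_a` has the higher average reward while `cycle_b` has the higher efficiency. With unit costs the two coincide, which is one way to read the claim that efficiency generalizes the average reward metric.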
Submitted 27 March, 2024;
originally announced March 2024.
-
Prioritize Team Actions: Multi-Agent Temporal Logic Task Planning with Ordering Constraints
Authors:
Bowen Ye,
Jianing Zhao,
Shaoyuan Li,
Xiang Yin
Abstract:
In this paper, we investigate the problem of linear temporal logic (LTL) path planning for multi-agent systems, introducing the new concept of ordering constraints. Specifically, we consider a generic objective function that is defined for the path of each individual agent. The primary objective is to find a global plan for the team of agents, ensuring they collectively meet the specified LTL requirements. Simultaneously, we aim to maintain a pre-determined order in the values of the objective function for each agent, which we refer to as the ordering constraints. This new requirement stems from scenarios like security-aware planning, where relative orders outweigh absolute values in importance. We present an efficient algorithm to solve this problem, supported by proofs of correctness that demonstrate the optimality of our solution. Additionally, we provide a case study in security-aware path planning to illustrate the practicality and effectiveness of our proposed approach.
Submitted 8 April, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
Boosting Adversarial Training via Fisher-Rao Norm-based Regularization
Authors:
Xiangyu Yin,
Wenjie Ruan
Abstract:
Adversarial training is extensively utilized to improve the adversarial robustness of deep neural networks. Yet, mitigating the degradation of standard generalization performance in adversarial-trained models remains an open problem. This paper attempts to resolve this issue through the lens of model complexity. First, we leverage the Fisher-Rao norm, a geometrically invariant metric for model complexity, to establish the non-trivial bounds of the Cross-Entropy Loss-based Rademacher complexity for a ReLU-activated Multi-Layer Perceptron. Then we generalize a complexity-related variable, which is sensitive to the changes in model width and the trade-off factors in adversarial training. Moreover, intensive empirical evidence validates that this variable highly correlates with the generalization gap of Cross-Entropy loss between adversarial-trained and standard-trained models, especially during the initial and final phases of the training process. Building upon this observation, we propose a novel regularization framework, called Logit-Oriented Adversarial Training (LOAT), which can mitigate the trade-off between robustness and accuracy while imposing only a negligible increase in computational overhead. Our extensive experiments demonstrate that the proposed regularization strategy can boost the performance of the prevalent adversarial training algorithms, including PGD-AT, TRADES, TRADES (LSE), MART, and DM-AT, across various network architectures. Our code will be available at https://github.com/TrustAI/LOAT.
Submitted 26 March, 2024;
originally announced March 2024.
-
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
Authors:
Yuda Song,
Zehao Sun,
Xuanwu Yin
Abstract:
Recent advancements in diffusion models have positioned them at the forefront of image generation. Despite their superior performance, diffusion models are not without drawbacks; they are characterized by complex architectures and substantial computational demands, resulting in significant latency due to their iterative sampling process. To mitigate these limitations, we introduce a dual approach involving model miniaturization and a reduction in sampling steps, aimed at significantly decreasing model latency. Our methodology leverages knowledge distillation to streamline the U-Net and image decoder architectures, and introduces an innovative one-step DM training technique that utilizes feature matching and score distillation. We present two models, SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU, respectively. Moreover, our training approach offers promising applications in image-conditioned control, facilitating efficient image-to-image translation.
Submitted 16 April, 2024; v1 submitted 25 March, 2024;
originally announced March 2024.
-
Conformal Perturbation Theory and Tachyon-Dilaton Eschatology via String Fields
Authors:
Ben Mazel,
Joshua Sandor,
Charles Wang,
Xi Yin
Abstract:
We analyze deformations of two-dimensional conformal field theory (CFT) from the perspective of classical bosonic closed string field theory (SFT). The latter can be viewed as a version of Wilsonian renormalization group (RG) improved conformal perturbation theory, where the renormalization scheme is defined through the choice of string vertices in the construction of SFT. Furthermore, the CFT data at the RG fixed point can be recovered from the spectrum and amplitudes of string field fluctuations. As applications, we construct the Horowitz-Polchinski "string star" solution in SFT, and a solution of tachyon-dilaton condensation that deforms the noncompact boson to minimal models by creating a pair of "Runkel-Watts walls".
Submitted 17 May, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
E-DoH: Elegantly Detecting the Depths of Open DoH Service on the Internet
Authors:
Cong Dong,
Jiahai Yang,
Yun Li,
Yue Wu,
Yufan Chen,
Chenglong Li,
Haoran Jiao,
Xia Yin,
Yuling Liu
Abstract:
In recent years, DNS over Encrypted (DoE) methods have been regarded as a novel trend within the realm of the DNS ecosystem. In these DoE methods, DNS over HTTPS (DoH) provides encryption to protect data confidentiality while providing better obfuscation to avoid censorship by multiplexing port 443 with web services. This development introduced certain inconveniences in discovering publicly available DoH services. In this paper, we propose the E-DoH method for elegant and efficient DoH service detection. First, we optimized the probing mechanism to enable a single DoH connection to accomplish multiple tasks including service discovery, correctness validation and dependency construction. Second, we propose an efficient DoH detection tool. This tool can enhance probing efficiency while significantly reducing the required traffic volume. Third, based on the above optimization methods, we conducted an exploration of the IPv4 space and performed an in-depth analysis of DoH based on the collected information. Through experiments, our approach demonstrates a remarkable 80% improvement in time efficiency, and only requires 4%-20% traffic volume to complete the detection task. In wild detection, our approach discovered 46k DoH services, which nearly doubles the number discovered by the state-of-the-art. Based on the collected data, we present several intriguing conclusions about the current DoH service ecosystem.
Submitted 18 March, 2024;
originally announced March 2024.
-
New constraints on Triton's atmosphere from the 6 October 2022 stellar occultation
Authors:
Ye Yuan,
Chen Zhang,
Fan Li,
Jian Chen,
Yanning Fu,
Chunhai Bai,
Xing Gao,
Yong Wang,
Tuhong Zhong,
Yixing Gao,
Liang Wang,
Donghua Chen,
Yixing Zhang,
Yang Zhang,
Wenpeng Xie,
Shupi Zhang,
Ding Liu,
Jun Cao,
Xiangdong Yin,
Xiaojun Mo,
Jing Liu,
Xinru Han,
Tong Liu,
Yuqiang Chen,
Zhendong Gao, et al. (25 additional authors not shown)
Abstract:
The atmosphere of Triton was probed directly by observing a ground-based stellar occultation on 6 October 2022. This rare event yielded 23 positive light curves collected from 13 separate observation stations contributing to our campaign. The significance of this event lies in its potential to directly validate the modest pressure fluctuation on Triton, a phenomenon not definitively verified by previous observations, including only five stellar occultations, and the Voyager 2 radio occultation in 1989. Using an approach consistent with a comparable study, we precisely determined a surface pressure of $14.07_{-0.13}^{+0.21}~\mathrm{μbar}$ in 2022. This new pressure rules out any significant monotonic variation in pressure between 2017 and 2022 through direct observations, as it is in alignment with the 2017 value. Additionally, both the pressures in 2017 and 2022 align with the 1989 value. This provides further support for the conclusion drawn from the previous volatile transport model simulation, which is consistent with the observed alignment between the pressures in 1989 and 2017; that is to say, the pressure fluctuation is modest. Moreover, this conclusion suggests the existence of a northern polar cap extended down to at least $45^\circ$N$-60^\circ$N and the presence of nitrogen between $30^\circ$S and $0^\circ$.
Submitted 24 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
OccFiner: Offboard Occupancy Refinement with Hybrid Propagation
Authors:
Hao Shi,
Song Wang,
Jiaming Zhang,
Xiaoting Yin,
Zhongdao Wang,
Zhijian Zhao,
Guangming Wang,
Jianke Zhu,
Kailun Yang,
Kaiwei Wang
Abstract:
Vision-based occupancy prediction, also known as 3D Semantic Scene Completion (SSC), presents a significant challenge in computer vision. Previous methods, confined to onboard processing, struggle with simultaneous geometric and semantic estimation, continuity across varying viewpoints, and single-view occlusion. Our paper introduces OccFiner, a novel offboard framework designed to enhance the accuracy of vision-based occupancy predictions. OccFiner operates in two hybrid phases: 1) a multi-to-multi local propagation network that implicitly aligns and processes multiple local frames to correct onboard model errors and consistently enhance occupancy accuracy across all distances; and 2) a region-centric global propagation that refines labels using explicit multi-view geometry and integrates sensor bias, especially to increase the accuracy of distant occupied voxels. Extensive experiments demonstrate that OccFiner improves both geometric and semantic accuracy across various types of coarse occupancy, setting a new state-of-the-art performance on the SemanticKITTI dataset. Notably, OccFiner elevates vision-based SSC models to a level even surpassing that of LiDAR-based onboard SSC models.
Submitted 15 March, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
Fluent: Round-efficient Secure Aggregation for Private Federated Learning
Authors:
Xincheng Li,
Jianting Ning,
Geong Sen Poh,
Leo Yu Zhang,
Xinchun Yin,
Tianwei Zhang
Abstract:
Federated learning (FL) facilitates collaborative training of machine learning models among a large number of clients while safeguarding the privacy of their local datasets. However, FL remains susceptible to vulnerabilities such as privacy inference and inversion attacks. Single-server secure aggregation schemes were proposed to address these threats. Nonetheless, they encounter practical constraints due to their round and communication complexities. This work introduces Fluent, a round and communication-efficient secure aggregation scheme for private FL. Fluent has several improvements compared to state-of-the-art solutions like Bell et al. (CCS 2020) and Ma et al. (SP 2023): (1) it eliminates frequent handshakes and secret sharing operations by efficiently reusing the shares across multiple training iterations without leaking any private information; (2) it accomplishes both the consistency check and gradient unmasking in one logical step, thereby reducing another round of communication. With these innovations, Fluent achieves the fewest communication rounds (i.e., two in the collection phase) in the malicious server setting, in contrast to at least three rounds in existing schemes. This significantly minimizes the latency for geographically distributed clients; (3) Fluent also introduces Fluent-Dynamic with a participant selection algorithm and an alternative secret sharing scheme. This can facilitate dynamic client joining and enhance the system flexibility and scalability. We implemented Fluent and compared it with existing solutions. Experimental results show that Fluent improves the computational cost by at least 75% and communication overhead by at least 25% for normal clients. Fluent also reduces the communication overhead for the server at the expense of a marginal increase in computational cost.
Submitted 10 March, 2024;
originally announced March 2024.
-
Performance Evaluation of Semi-supervised Learning Frameworks for Multi-Class Weed Detection
Authors:
Jiajia Li,
Dong Chen,
Xunyuan Yin,
Zhaojian Li
Abstract:
Effective weed control plays a crucial role in optimizing crop yield and enhancing agricultural product quality. However, the reliance on herbicide application not only poses a critical threat to the environment but also promotes the emergence of resistant weeds. Fortunately, recent advances in precision weed management enabled by ML and DL provide a sustainable alternative. Despite great progress, existing algorithms are mainly developed based on supervised learning approaches, which typically demand large-scale datasets with manually labeled annotations that are time-consuming and labor-intensive to produce. As such, label-efficient learning methods, especially semi-supervised learning, have gained increased attention in the broader domain of computer vision and have demonstrated promising performance. These methods aim to utilize a small number of labeled data samples along with a great number of unlabeled samples to develop high-performing models comparable to the supervised learning counterpart trained on a large amount of labeled data samples. In this study, we assess the effectiveness of a semi-supervised learning framework for multi-class weed detection, employing two well-known object detection frameworks, namely FCOS and Faster-RCNN. Specifically, we evaluate a generalized student-teacher framework with an improved pseudo-label generation module to produce reliable pseudo-labels for the unlabeled data. To enhance generalization, an ensemble student network is employed to facilitate the training process. Experimental results show that the proposed approach is able to achieve approximately 76% and 96% of the detection accuracy of the supervised methods with only 10% of labeled data in CottonWeedDet3 and CottonWeedDet12, respectively. We offer access to the source code, contributing a valuable resource for ongoing semi-supervised learning research in weed detection and beyond.
Submitted 5 March, 2024;
originally announced March 2024.
-
Ultralight vector dark matter search using data from the KAGRA O3GK run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
H. Abe,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi, et al. (1778 additional authors not shown)
Abstract:
Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.
Submitted 5 March, 2024;
originally announced March 2024.
-
On the origin of topotactic reduction effect for superconductivity in infinite-layer nickelates
Authors:
Shengwei Zeng,
Chi Sin Tang,
Zhaoyang Luo,
Lin Er Chow,
Zhi Shiuh Lim,
Saurav Prakash,
** Yang,
Caozheng Diao,
Xiaojiang Yu,
Zhenxiang Xing,
Rong Ji,
Xinmao Yin,
Changjian Li,
X. Renshaw Wang,
Qian He,
Mark B. H. Breese,
A. Ariando,
Huajun Liu
Abstract:
Topotactic reduction utilizing metal hydrides as reagents emerges as an effective approach to achieve exceptionally low oxidation states of metal ions and unconventional coordination networks. This method opens avenues to the development of entirely new functional materials, with one notable example being the infinite-layer nickelate superconductors. However, the reduction effect on the atomic reconstruction and electronic structures -- crucial for superconductivity -- remains largely unresolved. We design two sets of control Nd$_{0.8}$Sr$_{0.2}$NiO$_2$ thin films and implement secondary ion mass spectroscopy to highlight the absence of reduction-induced hydrogen intercalation. X-ray absorption spectroscopy shows a significant linear dichroism with dominant Ni 3d$_{x^2-y^2}$ orbitals on superconducting samples, indicating a Ni single-band nature of infinite-layer nickelates. Consistent with the superconducting $T_c$, the Ni 3d orbital asymmetry manifests a dome-like reduction duration dependence. Our results unveil the critical role of reduction in modulating the Ni-3d orbital polarization and its impact on the superconducting properties.
Submitted 1 March, 2024;
originally announced March 2024.
-
Transformer-based Parameter Estimation in Statistics
Authors:
Xiaoxin Yin,
David S. Yin
Abstract:
Parameter estimation is one of the most important tasks in statistics, and is key to helping people understand the distribution behind a sample of observations. Traditionally, parameter estimation is done either by closed-form solutions (e.g., maximum likelihood estimation for the Gaussian distribution), or by iterative numerical methods such as the Newton-Raphson method when a closed-form solution does not exist (e.g., for the Beta distribution).
In this paper we propose a transformer-based approach to parameter estimation. Compared with existing solutions, our approach does not require a closed-form solution or any mathematical derivations. It does not even require knowing the probability density function, which is needed by numerical methods. After the transformer model is trained, only a single inference is needed to estimate the parameters of the underlying distribution based on a sample of observations. In the empirical study we compared our approach with maximum likelihood estimation on commonly used distributions such as normal distribution, exponential distribution and beta distribution. It is shown that our approach achieves similar or better accuracy as measured by mean-square-errors.
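For reference, the closed-form baseline mentioned in this abstract can be written in a few lines. The sketch below shows the standard maximum likelihood estimators for the normal and exponential distributions; it illustrates the traditional approach only, not the transformer-based method, and the function names and sample sizes are chosen here for illustration.

```python
import math
import random
from typing import List, Tuple


def normal_mle(sample: List[float]) -> Tuple[float, float]:
    """Closed-form MLE of (mean, standard deviation) for a normal sample."""
    n = len(sample)
    mu = sum(sample) / n
    var = sum((x - mu) ** 2 for x in sample) / n  # the MLE divides by n, not n - 1
    return mu, math.sqrt(var)


def exponential_mle(sample: List[float]) -> float:
    """Closed-form MLE of the rate of an exponential sample (1 / sample mean)."""
    return len(sample) / sum(sample)


rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(2.0, 0.5) for _ in range(5000)]
exp_sample = [rng.expovariate(3.0) for _ in range(5000)]

print(normal_mle(normal_sample))
print(exponential_mle(exp_sample))
```

With a few thousand observations both estimates should land close to the generating parameters (mean 2.0, standard deviation 0.5, rate 3.0). The Beta distribution has no such closed form, which is why the abstract cites it as a case that needs iterative numerical methods.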
Submitted 27 February, 2024;
originally announced March 2024.
-
EAMA: Entity-Aware Multimodal Alignment Based Approach for News Image Captioning
Authors:
Junzhe Zhang,
Huixuan Zhang,
Xunjian Yin,
Xiaojun Wan
Abstract:
News image captioning requires a model to generate an informative caption rich in entities, given the news image and the associated news article. Though Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in addressing various vision-language tasks, our research finds that current MLLMs still bear limitations in handling entity information on the news image captioning task. Besides, while MLLMs have the ability to process long inputs, generating high-quality news image captions still requires a trade-off between sufficiency and conciseness of textual input information. To explore the potential of MLLMs and address the problems we discovered, we propose EAMA: an Entity-Aware Multimodal Alignment based approach for news image captioning. Our approach first aligns the MLLM through a Balance Training Strategy with two extra alignment tasks: the Entity-Aware Sentence Selection task and the Entity Selection task, together with the News Image Captioning task, to enhance its capability in handling multimodal entity information. The aligned MLLM then utilizes the additional entity-related information it explicitly extracts to supplement its textual input while generating news image captions. Our approach achieves better results than all previous models in CIDEr score on the GoodNews dataset (72.33 -> 88.39) and the NYTimes800k dataset (70.83 -> 85.61).
Submitted 6 May, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Learning Commonality, Divergence and Variety for Unsupervised Visible-Infrared Person Re-identification
Authors:
Jiangming Shi,
Xiangbo Yin,
Yaoxing Wang,
Xiaofeng Liu,
Yuan Xie,
Yanyun Qu
Abstract:
Unsupervised visible-infrared person re-identification (USVI-ReID) aims to match specified people in infrared images to visible images without annotation, and vice versa. USVI-ReID is a challenging yet under-explored task. Most existing methods address the USVI-ReID problem using cluster-based contrastive learning, which simply employs the cluster center as a representation of a person. However, the cluster center primarily focuses on shared information, overlooking disparity. To address the problem, we propose a Progressive Contrastive Learning with Multi-Prototype (PCLMP) method for USVI-ReID. In brief, we first generate the hard prototype by selecting the sample with the maximum distance from the cluster center. This hard prototype is used in the contrastive loss to emphasize disparity. Additionally, instead of rigidly aligning query images to a specific prototype, we generate the dynamic prototype by randomly picking samples within a cluster. This dynamic prototype is used to retain the natural variety of features while reducing instability in the simultaneous learning of both common and disparate information. Finally, we introduce a progressive learning strategy to gradually shift the model's attention towards hard samples, avoiding cluster deterioration. Extensive experiments conducted on the publicly available SYSU-MM01 and RegDB datasets validate the effectiveness of the proposed method. PCLMP outperforms the existing state-of-the-art method with an average mAP improvement of 3.9%. The source codes will be released.
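The two prototype choices described in this abstract can be illustrated with a minimal sketch. It assumes plain Euclidean distance on toy 2-D feature vectors and a mean-based cluster center; the abstract does not specify the feature space, the distance, or the sampling scheme, and the function names below are hypothetical, so this is not the authors' implementation.

```python
import math
import random

def cluster_center(samples):
    """Mean of the feature vectors in one cluster (assumed definition of the center)."""
    dim = len(samples[0])
    return [sum(s[d] for s in samples) / len(samples) for d in range(dim)]

def hard_prototype(samples):
    """Sample with the maximum (assumed Euclidean) distance from the cluster center."""
    center = cluster_center(samples)
    return max(samples, key=lambda s: math.dist(s, center))

def dynamic_prototype(samples, rng):
    """Sample picked uniformly at random from within the cluster."""
    return rng.choice(samples)

# Toy cluster: three nearby points and one outlying point.
cluster = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
center = cluster_center(cluster)
hard = hard_prototype(cluster)
dynamic = dynamic_prototype(cluster, random.Random(0))
```

In this toy cluster the hard prototype is the outlying point, which is the sample carrying the most information that the center averages away.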
Submitted 26 May, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Authors:
Tony C. W. Mok,
Zi Li,
Yunhao Bai,
Jianpeng Zhang,
Wei Liu,
Yan-Jie Zhou,
Ke Yan,
Dakai Jin,
Yu Shi,
Xiaoli Yin,
Le Lu,
Ling Zhang
Abstract:
Establishing dense anatomical correspondence across distinct imaging modalities is a foundational yet challenging procedure for numerous medical image analysis studies and image-guided radiotherapy. Existing multi-modality image registration algorithms rely on statistical-based similarity measures or local structural image representations. However, the former is sensitive to locally varying noise, while the latter is not discriminative enough to cope with complex anatomical structures in multimodal scans, causing ambiguity in determining the anatomical correspondence across scans with different modalities. In this paper, we propose a modality-agnostic structural representation learning method, which leverages Deep Neighbourhood Self-similarity (DNS) and anatomy-aware contrastive learning to learn discriminative and contrast-invariance deep structural image representations (DSIR) without the need for anatomical delineations or pre-aligned training images. We evaluate our method on multiphase CT, abdomen MR-CT, and brain MR T1w-T2w registration. Comprehensive results demonstrate that our method is superior to the conventional local structural representation and statistical-based similarity measures in terms of discriminability and accuracy.
Submitted 31 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
EncodingNet: A Novel Encoding-based MAC Design for Efficient Neural Network Acceleration
Authors:
Bo Liu,
Grace Li Zhang,
Xunzhao Yin,
Ulf Schlichtmann,
Bing Li
Abstract:
Deep neural networks (DNNs) have achieved great breakthroughs in many fields such as image classification and natural language processing. However, the execution of DNNs needs to conduct massive numbers of multiply-accumulate (MAC) operations on hardware and thus incurs a large power consumption. To address this challenge, we propose a novel digital MAC design based on encoding. In this new design, the multipliers are replaced by simple logic gates to project the results onto a wide bit representation. These bits carry individual position weights, which can be trained for specific neural networks to enhance inference accuracy. The outputs of the new multipliers are added by bit-wise weighted accumulation and the accumulation results are compatible with existing computing platforms accelerating neural networks with either uniform or non-uniform quantization. Since the multiplication function is replaced by simple logic projection, the critical paths in the resulting circuits become much shorter. Correspondingly, pipelining stages in the MAC array can be reduced, leading to a significantly smaller area as well as a better power efficiency. The proposed design has been synthesized and verified by ResNet18-Cifar10, ResNet20-Cifar100 and ResNet50-ImageNet. The experimental results confirmed the reduction of circuit area by up to 79.63% and the reduction of power consumption of executing DNNs by up to 70.18%, while the accuracy of the neural networks can still be well maintained.
Submitted 25 February, 2024;
originally announced February 2024.
-
Automated Discovery of Integral with Deep Learning
Authors:
Xiaoxin Yin
Abstract:
Recent advancements in the realm of deep learning, particularly in the development of large language models (LLMs), have demonstrated AI's ability to tackle complex mathematical problems or solve programming challenges. However, the capability to solve well-defined problems based on extensive training data differs significantly from the nuanced process of making scientific discoveries. Trained on almost all human knowledge available, today's sophisticated LLMs basically learn to predict sequences of tokens. They generate mathematical derivations and write code in much the same way as they write an essay, and do not have the ability to pioneer scientific discoveries in the manner a human scientist would.
In this study we delve into the potential of using deep learning to rediscover a fundamental mathematical concept: integrals. By defining integrals as area under the curve, we illustrate how AI can deduce the integral of a given function, exemplified by inferring $\int_{0}^{x} t^2 dt = \frac{x^3}{3}$ and $\int_{0}^{x} ae^{bt} dt = \frac{a}{b} e^{bx} - \frac{a}{b}$. Our experiments show that deep learning models can approach the task of inferring integrals either through a sequence-to-sequence model, akin to language translation, or by uncovering the rudimentary principles of integration, such as $\int_{0}^{x} t^n dt = \frac{x^{n+1}}{n+1}$.
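The two closed forms quoted in this abstract can be checked against the "area under the curve" definition with a short numerical sketch. The trapezoidal rule, the function name `area_under_curve`, and the values of `x`, `a`, and `b` are illustrative assumptions, not part of the paper's method.

```python
import math

def area_under_curve(f, x, n=10_000):
    """Approximate the integral of f over [0, x] with the composite trapezoidal rule."""
    h = x / n
    total = 0.5 * (f(0.0) + f(x))
    for i in range(1, n):
        total += f(i * h)
    return total * h

x = 1.5
a, b = 2.0, 0.7  # arbitrary illustrative constants

# Integral of t^2 from 0 to x, compared with x^3 / 3.
numeric_power = area_under_curve(lambda t: t ** 2, x)
closed_power = x ** 3 / 3

# Integral of a * e^(b t) from 0 to x, compared with (a/b) e^(b x) - a/b.
numeric_exp = area_under_curve(lambda t: a * math.exp(b * t), x)
closed_exp = a / b * math.exp(b * x) - a / b
```

With these settings the numerical areas agree with the closed forms to well within 1e-6.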
Submitted 27 February, 2024;
originally announced February 2024.
-
The influence of the vorticity-scalar correlation on mixing
Authors:
Xi-Yuan Yin,
Wesley Agoua,
Tong Wu,
Wouter J. T. Bos
Abstract:
We investigate the role of the correlation between a scalar quantity and the vorticity in two-dimensional mixing at infinite Péclet number. We assess, using a diffusivity independent mixing-norm, the dynamics of both Galerkin-truncated ensembles and freely evolving two-dimensional scalar mixing. Both statistical mechanics and numerical experiments show how the mixing-rate is attenuated when vorticity and scalar are initially correlated. Since the vorticity is shown to be a poorly mixing scalar, the results suggest that, in general, mixing can be enhanced by minimizing the correlation between vorticity and passive scalar.
Submitted 26 February, 2024;
originally announced February 2024.
-
Formal Synthesis of Controllers for Safety-Critical Autonomous Systems: Developments and Challenges
Authors:
Xiang Yin,
Bingzhao Gao,
Xiao Yu
Abstract:
In recent years, formal methods have been extensively used in the design of autonomous systems. By employing mathematically rigorous techniques, formal methods can provide fully automated reasoning processes with provable safety guarantees for complex dynamic systems with intricate interactions between continuous dynamics and discrete logics. This paper provides a comprehensive review of formal controller synthesis techniques for safety-critical autonomous systems. Specifically, we categorize the formal control synthesis problem based on diverse system models, encompassing deterministic, non-deterministic, and stochastic, and various formal safety-critical specifications involving logic, real-time, and real-valued domains. The review covers fundamental formal control synthesis techniques, including abstraction-based approaches and abstraction-free methods. We explore the integration of data-driven synthesis approaches in formal control synthesis. Furthermore, we review formal techniques tailored for multi-agent systems (MAS), with a specific focus on various approaches to address the scalability challenges in large-scale systems. Finally, we discuss some recent trends and highlight research challenges in this area.
Submitted 20 February, 2024;
originally announced February 2024.
-
Observation of Berry curvature in non-Hermitian system from far-field radiation
Authors:
Xuefan Yin,
Ye Chen,
Xiaoyu Zhang,
Zixuan Zhang,
Susumu Noda,
Chao Peng
Abstract:
Berry curvature, which describes local geometrical properties of energy bands, can elucidate many fascinating phenomena in solid-state, photonic, and phononic systems, given its connection to global topological invariants such as the Chern number. Despite its significance, the observation of Berry curvature poses a substantial challenge since wavefunctions are deeply embedded within the system. Here, we theoretically propose a correspondence between the geometry of far-field radiation and the underlying band topology of non-Hermitian systems, thus providing a general method to fully capture the Berry curvature without strongly disturbing the eigenstates. We further experimentally observe the Berry curvature in a honeycomb photonic crystal slab from polarimetry measurements and quantitatively obtain the non-trivial valley Chern number. Our work reveals the feasibility of retrieving the bulk band topology from escaping photons and paves the way to exploring intriguing topological landscapes in non-Hermitian systems.
Submitted 20 February, 2024;
originally announced February 2024.
-
SoftQE: Learned Representations of Queries Expanded by LLMs
Authors:
Varad Pimpalkhute,
John Heyer,
Xusen Yin,
Sameer Gupta
Abstract:
We investigate the integration of Large Language Models (LLMs) into query encoders to improve dense retrieval without increasing latency and cost, by circumventing the dependency on LLMs at inference time. SoftQE incorporates knowledge from LLMs by mapping embeddings of input queries to those of the LLM-expanded queries. While improvements over various strong baselines on in-domain MS-MARCO metrics are marginal, SoftQE improves performance by 2.83 absolute percentage points on average on five out-of-domain BEIR tasks.
Submitted 19 February, 2024;
originally announced February 2024.
-
Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation
Authors:
Xunjian Yin,
Xu Zhang,
Jie Ruan,
Xiaojun Wan
Abstract:
In recent years, substantial advancements have been made in the development of large language models, achieving remarkable performance across diverse tasks. To evaluate the knowledge ability of language models, previous studies have proposed many benchmarks based on question-answering pairs. We argue that it is not reliable and comprehensive to evaluate language models with a fixed question or limited paraphrases as the query, since language models are sensitive to the prompt. Therefore, we introduce a novel concept named knowledge boundary to encompass both prompt-agnostic and prompt-sensitive knowledge within language models. Knowledge boundary avoids prompt sensitivity in language model evaluations, rendering them more dependable and robust. To explore the knowledge boundary for a given model, we propose a projected gradient descent method with semantic constraints, a new algorithm designed to identify the optimal prompt for each piece of knowledge. Experiments demonstrate the superior performance of our algorithm in computing the knowledge boundary compared to existing methods. Furthermore, we evaluate the ability of multiple language models in several domains using the knowledge boundary.
Submitted 29 May, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
A Game-Theoretical Approach for Optimal Supervisory Control of Discrete Event Systems under Energy Constraints
Authors:
Peng Lv,
Shaoyuan Li,
Xiang Yin
Abstract:
In this paper, we investigate the problem of optimal supervisory control for discrete event systems under energy constraints. We consider that the execution of events consumes energy and that the energy can be replenished at specific reload states. When the energy level drops below zero, the system crashes. To capture the above scenario, we introduce a new model, called consumption discrete event system (cDES). Our objective is to find the minimal initial energy value and synthesize an optimal supervisor ensuring that the energy will never be exhausted. To solve this problem, we propose a game-theoretical approach by converting the cDES into a consumption two-player graph game (cTPG) and reformulating the optimal supervisory control problem in game theory. In particular, we demonstrate that the converted game can be decomposed into independent reachability games related to reload vertices, which can be solved by a fixed point iterative algorithm proposed in this paper. Through iteratively removing unsafe reload vertices and solving reachability games for the remaining reload vertices, a solution can be found.
Submitted 10 February, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Authors:
Mantas Mazeika,
Long Phan,
Xuwang Yin,
Andy Zou,
Zifan Wang,
Norman Mu,
Elham Sakhaee,
Nathaniel Li,
Steven Basart,
Bo Li,
David Forsyth,
Dan Hendrycks
Abstract:
Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. We identify several desirable properties previously unaccounted for in red teaming evaluations and systematically design HarmBench to meet these criteria. Using HarmBench, we conduct a large-scale comparison of 18 red teaming methods and 33 target LLMs and defenses, yielding novel insights. We also introduce a highly efficient adversarial training method that greatly enhances LLM robustness across a wide range of attacks, demonstrating how HarmBench enables codevelopment of attacks and defenses. We open source HarmBench at https://github.com/centerforaisafety/HarmBench.
Submitted 26 February, 2024; v1 submitted 6 February, 2024;
originally announced February 2024.
-
Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers
Authors:
Jianbin Jiao,
Xina Cheng,
Weijie Chen,
Xiaoting Yin,
Hao Shi,
Kailun Yang
Abstract:
3D human pose estimation captures the human joint points in three-dimensional space while keeping the depth information and physical structure. That is essential for applications that require precise pose information, such as human-computer interaction, scene understanding, and rehabilitation training. Due to the challenges in data collection, mainstream datasets of 3D human pose estimation are primarily composed of multi-view video data collected in laboratory environments, which contains rich spatial-temporal correlation information besides the image frame content. Given the remarkable self-attention mechanism of transformers, capable of capturing the spatial-temporal correlation from multi-view video datasets, we propose a multi-stage framework for 3D sequence-to-sequence (seq2seq) human pose detection. Firstly, the spatial module represents the human pose feature by intra-image content, while the frame-image relation module extracts temporal relationships and 3D spatial positional relationship features between the multi-perspective images. Secondly, the self-attention mechanism is adopted to eliminate the interference from non-human body parts and reduce computing resources. Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset. Experimental results demonstrate that our approach achieves state-of-the-art performance on this dataset. The source code will be available at https://github.com/WUJINHUAN/3D-human-pose.
Submitted 25 March, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Learning Local Control Barrier Functions for Safety Control of Hybrid Systems
Authors:
Shuo Yang,
Yu Chen,
Xiang Yin,
Rahul Mangharam
Abstract:
Hybrid dynamical systems are ubiquitous as practical robotic applications often involve both continuous states and discrete switchings. Safety is a primary concern for hybrid robotic systems. Existing safety-critical control approaches for hybrid systems are either computationally inefficient, detrimental to system performance, or limited to small-scale systems. To amend these drawbacks, in this paper, we propose a learning-enabled approach to construct local Control Barrier Functions (CBFs) to guarantee the safety of a wide class of nonlinear hybrid dynamical systems. The end result is a safe neural CBF-based switching controller. Our approach is computationally efficient, minimally invasive to any reference controller, and applicable to large-scale systems. We empirically evaluate our framework and demonstrate its efficacy and flexibility through two robotic examples including a high-dimensional autonomous racing case, against other CBF-based approaches and model predictive control.
Submitted 26 January, 2024;
originally announced January 2024.
-
Unveiling a Novel Metal-to-Metal Transition in LuH2: Critically Challenging Superconductivity Claims in Lutetium Hydrides
Authors:
Dong Wang,
Ningning Wang,
Caoshun Zhang,
Chunsheng Xia,
Weicheng Guo,
Xia Yin,
Kejun Bu,
Takeshi Nakagawa,
Jianbo Zhang,
Federico Gorelli,
Philip Dalladay-Simpson,
Thomas Meier,
Xujie Lü,
Liling Sun,
Jinguang Cheng,
Qiaoshi Zeng,
Yang Ding,
Ho-kwang Mao
Abstract:
Following the recent report by Dasenbrock-Gammon et al. (2023) of near-ambient superconductivity in nitrogen-doped lutetium trihydride (LuH3-δNε), significant debate has emerged surrounding the composition and interpretation of the observed sharp resistance drop. Here, we meticulously revisit these claims through comprehensive characterization and investigations. We definitively identify the reported material as lutetium dihydride (LuH2), resolving the ambiguity surrounding its composition. Under similar conditions (270-295 K and 1-2 GPa), we replicate the reported sharp decrease in electrical resistance with a 30% success rate, aligning with Dasenbrock-Gammon et al.'s observations. However, our extensive investigations reveal this phenomenon to be a novel, pressure-induced metal-to-metal transition intrinsic to LuH2, distinct from superconductivity. Intriguingly, nitrogen doping exerts minimal impact on this transition. Our work not only elucidates the fundamental properties of LuH2 and LuH3 but also critically challenges the notion of superconductivity in these lutetium hydride systems. These findings pave the way for future research on lutetium hydride systems while emphasizing the crucial importance of rigorous verification in claims of ambient temperature superconductivity.
Submitted 28 January, 2024; v1 submitted 25 January, 2024;
originally announced January 2024.
-
Contribution Functions for Quantitative Bipolar Argumentation Graphs: A Principle-based Analysis
Authors:
Timotheus Kampik,
Nico Potyka,
Xiang Yin,
Kristijonas Čyras,
Francesca Toni
Abstract:
We present a principle-based analysis of contribution functions for quantitative bipolar argumentation graphs that quantify the contribution of one argument to another. The introduced principles formalise the intuitions underlying different contribution functions as well as expectations one would have regarding the behaviour of contribution functions in general. As none of the covered contribution functions satisfies all principles, our analysis can serve as a tool that enables the selection of the most suitable function based on the requirements of a given use case.
Submitted 13 June, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Authors:
Zhenhui Ye,
Tianyun Zhong,
Yi Ren,
Jiaqi Yang,
Weichuang Li,
Jiawei Huang,
Ziyue Jiang,
Jinzheng He,
Rongjie Huang,
Jinglin Liu,
Chen Zhang,
Xiang Yin,
Zejun Ma,
Zhou Zhao
Abstract:
One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen image, and then animate it with a reference video or audio to generate a talking portrait video. The existing methods fail to simultaneously achieve the goals of accurate 3D avatar reconstruction and stable talking face animation. Besides, while the existing works mainly focus on synthesizing the head part, it is also vital to generate natural torso and background segments to obtain a realistic talking portrait video. To address these limitations, we present Real3D-Portrait, a framework that (1) improves the one-shot 3D reconstruction power with a large image-to-plane model that distills 3D prior knowledge from a 3D face generative model; (2) facilitates accurate motion-conditioned animation with an efficient motion adapter; (3) synthesizes realistic video with natural torso movement and switchable background using a head-torso-background super-resolution model; and (4) supports one-shot audio-driven talking face generation with a generalizable audio-to-motion model. Extensive experiments show that Real3D-Portrait generalizes well to unseen identities and generates more realistic talking portrait videos compared to previous methods. Video samples and source code are available at https://real3dportrait.github.io.
Submitted 23 March, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
Authors:
Jiangming Shi,
Xiangbo Yin,
Yeyun Chen,
Yachao Zhang,
Zhizhong Zhang,
Yuan Xie,
Yanyun Qu
Abstract:
Unsupervised visible-infrared person re-identification (USL-VI-ReID) is a promising yet challenging retrieval task. The key challenges in USL-VI-ReID are to effectively generate pseudo-labels and establish pseudo-label correspondences across modalities without relying on any prior annotations. Recently, clustered pseudo-label methods have gained more attention in USL-VI-ReID. However, previous methods fell short of fully exploiting the individual nuances, as they simply utilized a single memory that represented an identity to establish cross-modality correspondences, resulting in ambiguous cross-modality correspondences. To address the problem, we propose a Multi-Memory Matching (MMM) framework for USL-VI-ReID. We first design a Cross-Modality Clustering (CMC) module to generate the pseudo-labels by clustering samples from both modalities together. To associate cross-modality clustered pseudo-labels, we design a Multi-Memory Learning and Matching (MMLM) module, ensuring that optimization explicitly focuses on the nuances of individual perspectives and establishes reliable cross-modality correspondences. Finally, we design a Soft Cluster-level Alignment (SCA) module to narrow the modality gap while mitigating the effect of noisy pseudo-labels through a soft many-to-many alignment strategy. Extensive experiments on the public SYSU-MM01 and RegDB datasets demonstrate the reliability of the established cross-modality correspondences and the effectiveness of our MMM. The source codes will be released.
Submitted 11 January, 2024;
originally announced January 2024.
-
FeReX: A Reconfigurable Design of Multi-bit Ferroelectric Compute-in-Memory for Nearest Neighbor Search
Authors:
Zhicheng Xu,
Che-Kai Liu,
Chao Li,
Ruibin Mao,
Jianyi Yang,
Thomas Kämpfe,
Mohsen Imani,
Can Li,
Cheng Zhuo,
Xunzhao Yin
Abstract:
Rapid advancements in artificial intelligence have given rise to transformative models, profoundly impacting our lives. These models demand massive volumes of data to operate effectively, exacerbating the data-transfer bottleneck inherent in the conventional von-Neumann architecture. Compute-in-memory (CIM), a novel computing paradigm, tackles these issues by seamlessly embedding in-memory search functions, thereby obviating the need for data transfers. However, existing non-volatile memory (NVM)-based accelerators are application specific. During the similarity-based associative search operation, they only support a single, specific distance metric, such as Hamming, Manhattan, or Euclidean distance, in measuring the query against the stored data, calling for reconfigurable in-memory solutions adaptable to various applications. To overcome such a limitation, in this paper, we present FeReX, a reconfigurable associative memory (AM) that accommodates various distance metrics including Hamming, Manhattan, and Euclidean distances. Leveraging multi-bit ferroelectric field-effect transistors (FeFETs) as the proxy and a hardware-software co-design approach, we introduce a constraint satisfaction problem (CSP)-based method to automate AM search input voltage and stored voltage configurations for different distance-based search functions. Device-circuit co-simulations first validate the effectiveness of the proposed FeReX methodology for reconfigurable search distance functions. Then, we benchmark FeReX in the context of k-nearest neighbor (KNN) and hyperdimensional computing (HDC), which highlights the robustness of FeReX and demonstrates up to 250x speedup and 10^4 energy savings compared with GPU.
Submitted 11 January, 2024;
originally announced January 2024.
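The abstract above benchmarks a search that can switch between Hamming, Manhattan, and Euclidean distances in a k-nearest-neighbor setting. As a software-only illustration of that idea (not the FeReX hardware or its CSP-based voltage configuration), here is a minimal sketch; the names `METRICS` and `knn_predict` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def hamming(a, b):
    # Number of positions where the two vectors differ.
    return sum(x != y for x, y in zip(a, b))


def manhattan(a, b):
    # Sum of absolute coordinate differences (L1 distance).
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    # Square root of the summed squared differences (L2 distance).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical registry: the "reconfiguration" here is just a dictionary lookup.
METRICS = {"hamming": hamming, "manhattan": manhattan, "euclidean": euclidean}


def knn_predict(query, stored, labels, k=3, metric="euclidean"):
    """Return the majority label among the k stored vectors closest to query."""
    dist = METRICS[metric]
    ranked = sorted(range(len(stored)), key=lambda i: dist(query, stored[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]
```

The same stored data can be searched under a different metric by changing only the `metric` argument, which is the software analogue of the reconfigurability the abstract describes.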
-
Inverse-like Antagonistic Scene Text Spotting via Reading-Order Estimation and Dynamic Sampling
Authors:
Shi-Xue Zhang,
Chun Yang,
Xiaobin Zhu,
Hongyang Zhou,
Hongfa Wang,
Xu-Cheng Yin
Abstract:
Scene text spotting is a challenging task, especially for inverse-like scene text, which has complex layouts, e.g., mirrored, symmetrical, or retro-flexed. In this paper, we propose a unified end-to-end trainable inverse-like antagonistic text spotting framework dubbed IATS, which can effectively spot inverse-like scene texts without sacrificing general ones. Specifically, we propose an innovative reading-order estimation module (REM) that extracts reading-order information from the initial text boundary generated by an initial boundary module (IBM). To optimize and train REM, we propose a joint reading-order estimation loss consisting of a classification loss, an orthogonality loss, and a distribution loss. With the help of IBM, we can divide the initial text boundary into two symmetric control points and iteratively refine the new text boundary using a lightweight boundary refinement module (BRM) for adapting to various shapes and scales. To alleviate the incompatibility between text detection and recognition, we propose a dynamic sampling module (DSM) with a thin-plate spline that can dynamically sample appropriate features for recognition in the detected text region. Without extra supervision, the DSM can proactively learn to sample appropriate features for text recognition through the gradient returned by the recognition module. Extensive experiments on both challenging scene text and inverse-like scene text datasets demonstrate that our method achieves superior performance both on irregular and inverse-like text spotting.
Submitted 7 January, 2024;
originally announced January 2024.
-
On Approximate Opacity of Stochastic Control Systems
Authors:
Siyuan Liu,
Xiang Yin,
Dimos V. Dimarogonas,
Majid Zamani
Abstract:
This paper investigates an important class of information-flow security properties called opacity for stochastic control systems. Opacity captures whether a system's secret behavior (a subset of the system's behavior that is considered to be critical) can be kept from outside observers. Existing works on opacity for control systems only provide a binary characterization of the system's security level by determining whether the system is opaque or not. In this work, we introduce a quantifiable measure of opacity that considers the likelihood of satisfying opacity for stochastic control systems modeled as general Markov decision processes (gMDPs). We also propose verification methods tailored to the new notions of opacity for finite gMDPs by using value iteration techniques. Then, a new notion called approximate opacity-preserving stochastic simulation relation is proposed, which captures the distance between two systems' behaviors in terms of preserving opacity. Based on this new system relation, we show that one can verify opacity for stochastic control systems using their abstractions (modeled as finite gMDPs). We also discuss how to construct such abstractions for a class of gMDPs under certain stability conditions.
Submitted 3 January, 2024;
originally announced January 2024.
-
Reconfigurable Frequency Multipliers Based on Complementary Ferroelectric Transistors
Authors:
Haotian Xu,
Jianyi Yang,
Cheng Zhuo,
Thomas Kämpfe,
Kai Ni,
Xunzhao Yin
Abstract:
Frequency multipliers, a class of essential electronic components, play a pivotal role in contemporary signal processing and communication systems. They serve as crucial building blocks for generating high-frequency signals by multiplying the frequency of an input signal. However, traditional frequency multipliers that rely on nonlinear devices often require energy- and area-consuming filtering and amplification circuits, and emerging designs based on an ambipolar ferroelectric transistor require costly non-trivial characteristic tuning or a complex technology process. In this paper, we show that a pair of standard ferroelectric field effect transistors (FeFETs) can be used to build compact frequency multipliers without the aforementioned technology issues. By leveraging the tunable parabolic shape of the 2FeFET structures' transfer characteristics, we propose four reconfigurable frequency multipliers, which can switch between signal transmission and frequency doubling. Furthermore, based on the 2FeFET structures, we propose four frequency multipliers that realize triple and quadruple frequency modes, elucidating a scalable methodology to generate more multiplication harmonics of the input frequency. Performance metrics such as maximum operating frequency, power, etc., are evaluated and compared with existing works. We also implement a practical frequency modulation scheme based on the proposed reconfigurable multipliers without additional devices. Our work provides a novel path of scalable and reconfigurable frequency multiplier designs based on devices that have characteristics similar to FeFETs, and shows that FeFETs are a promising candidate for signal processing and communication systems in terms of maximum operating frequency and power.
Submitted 28 December, 2023;
originally announced December 2023.
-
Low Power and Temperature-Resilient Compute-In-Memory Based on Subthreshold-FeFET
Authors:
Yifei Zhou,
Xuchu Huang,
Jianyi Yang,
Kai Ni,
Hussam Amrouch,
Cheng Zhuo,
Xunzhao Yin
Abstract:
Compute-in-memory (CiM) is a promising solution for addressing the challenges of artificial intelligence (AI) and the Internet of Things (IoT) hardware such as the 'memory wall' issue. Specifically, CiM employing nonvolatile memory (NVM) devices in a crossbar structure can efficiently accelerate multiply-accumulation (MAC) computation, a crucial operator in neural networks among various AI models. Low power CiM designs are thus highly desired for further energy efficiency optimization on AI models. Ferroelectric FET (FeFET), an emerging device, is attractive for building ultra-low power CiM arrays due to CMOS compatibility, high ION/IOFF ratio, etc. Recent studies have explored FeFET based CiM designs that achieve low power consumption. Nevertheless, subthreshold-operated FeFETs, where the operating voltages are scaled down to the subthreshold region to reduce array power consumption, are particularly vulnerable to temperature drift, leading to accuracy degradation. To address this challenge, we propose a temperature-resilient 2T-1FeFET CiM design that performs MAC operations reliably in the subthreshold region from 0 to 85 degrees Celsius, while consuming ultra-low power. Benchmarked against the VGG neural network architecture running the CIFAR-10 dataset, the proposed 2T-1FeFET CiM design achieves 89.45% CIFAR-10 test accuracy. Compared to previous FeFET based CiM designs, it exhibits immunity to temperature drift at an 8-bit wordlength scale, and achieves better energy efficiency with 2866 TOPS/W.
Submitted 10 January, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Optical detection of small polarons in vanadium dioxide and their critical role in mediating metal-insulator transition
Authors:
Xiongfang Liu,
Tong Yang,
Jing Wu,
Mengxia Sun,
Mingyao Chen,
Chi Sin Tang,
Kun Han,
Difan Zhou,
Shengwei Zeng,
Shuo Sun,
Sensen Li,
Ming Yang,
Mark B. H. Breese,
Chuanbing Cai,
Thirumalai Venkatesan,
Andrew T. S. Wee,
Xinmao Yin
Abstract:
In the pursuit of advanced photoelectric devices, researchers have uncovered near room-temperature metal-insulator transitions (MIT) in non-volatile VO2. Although theoretical investigations propose that polaron dynamics mediate the MIT, direct experimental evidence remains scarce. In this study, we present direct evidence of the polaron state in insulating VO2 through high-resolution spectroscopic ellipsometry measurements and first-principles calculations. We demonstrate that polaron dynamics play a complementary role in facilitating Peierls and Mott transitions to contribute to the MIT processes. Moreover, our observations and characterizations of conventional metallic and correlated plasmons in the respective phases of the VO2 film provide valuable insights into their electron structures. This study provides an understanding of the MIT mechanism in correlated systems and highlights how polarons, lattice distortions and electron correlations facilitate the phase transition processes in strongly-correlated systems, while further inspiring the development of new device functionalities.
Submitted 28 December, 2023;
originally announced December 2023.
-
EnchantDance: Unveiling the Potential of Music-Driven Dance Movement
Authors:
Bo Han,
Yi Ren,
Hao Peng,
Teng Zhang,
Zeyu Ling,
Xiang Yin,
Feilin Han
Abstract:
The task of music-driven dance generation involves creating coherent dance movements that correspond to the given music. While existing methods can produce physically plausible dances, they often struggle to generalize to out-of-set data. The challenge arises from three aspects: 1) the high diversity of dance movements and significant differences in the distribution of music modalities, which make it difficult to generate music-aligned dance movements; 2) the lack of a large-scale music-dance dataset, which hinders the generation of generalized dance movements from music; and 3) the protracted nature of dance movements, which makes it difficult to maintain a consistent dance style. In this work, we introduce the EnchantDance framework, a state-of-the-art method for dance generation. Due to the redundancy of the original dance sequence along the time axis, EnchantDance first constructs a strong dance latent space and then trains a dance diffusion model on the dance latent space. To address the data gap, we construct a large-scale music-dance dataset, ChoreoSpectrum3D Dataset, which includes four dance genres and has a total duration of 70.32 hours, making it the largest reported music-dance dataset to date. To enhance consistency between music genre and dance style, we pre-train a music genre prediction network using transfer learning and incorporate music genre as extra conditional information in the training of the dance diffusion model. Extensive experiments demonstrate that our proposed framework achieves state-of-the-art performance on dance quality, diversity, and consistency.
Submitted 26 December, 2023;
originally announced December 2023.
-
Bootstrapping M-theory Orbifolds
Authors:
Shai M. Chester,
Silviu S. Pufu,
Yifan Wang,
Xi Yin
Abstract:
We analyze correlation functions of $SU(k) \times SU(2)_F$ flavor currents in a family of three-dimensional ${\cal N}=4$ superconformal field theories, combining analytic bootstrap methods with input from supersymmetric localization. Via holographic duality, we extract gluon and graviton scattering amplitudes of M-theory on ${\rm AdS}_4\times S^7/\mathbb{Z}_k$ which contains a $\mathbb{C}^2/\mathbb{Z}_{k}$ orbifold singularity. From these results, we derive aspects of the effective description of M-theory on the orbifold singularity beyond its leading low energy limit. We also determine a threshold correction to the holographic correlator from the combined contribution of two-loop gluon and tree-level bulk graviton exchange.
Submitted 5 May, 2024; v1 submitted 20 December, 2023;
originally announced December 2023.
-
Point Deformable Network with Enhanced Normal Embedding for Point Cloud Analysis
Authors:
Xingyilang Yin,
Xi Yang,
Liangchen Liu,
Nannan Wang,
Xinbo Gao
Abstract:
Recently, MLP-based methods have shown strong performance in point cloud analysis. Simple MLP architectures are able to learn geometric features in local point groups yet fail to model long-range dependencies directly. In this paper, we propose Point Deformable Network (PDNet), a concise MLP-based network that can capture long-range relations with strong representation ability. Specifically, we put forward the Point Deformable Aggregation Module (PDAM) to improve representation capability in both long-range dependency and adaptive aggregation among points. For each query point, PDAM aggregates information from deformable reference points rather than points in limited local areas. The deformable reference points are generated in a data-dependent manner, and we initialize them according to the input point positions. Additional offsets and modulation scalars are learned on the whole point features, which shift the deformable reference points to the regions of interest. We also suggest estimating the normal vector for point clouds and applying Enhanced Normal Embedding (ENE) to the geometric extractors to improve the representation ability of single points. Extensive experiments and ablation studies on various benchmarks demonstrate the effectiveness and superiority of our PDNet.
Submitted 20 December, 2023;
originally announced December 2023.
-
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Authors:
Rui Liu,
Yifan Hu,
Yi Ren,
Xiang Yin,
Haizhou Li
Abstract:
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting. While recognising the significance of the CSS task, prior studies have not thoroughly investigated the emotional expressiveness problems due to the scarcity of emotional conversational datasets and the difficulty of stateful emotion modeling. In this paper, we propose a novel emotional CSS model, termed ECSS, that includes two main components: 1) to enhance emotion understanding, we introduce a heterogeneous graph-based emotional context modeling mechanism, which takes the multi-source dialogue history as input to model the dialogue context and learn the emotion cues from the context; 2) to achieve emotion rendering, we employ a contrastive learning-based emotion renderer module to infer the accurate emotion style for the target utterance. To address the issue of data scarcity, we meticulously create emotional labels in terms of category and intensity, and annotate additional emotional information on the existing conversational dataset (DailyTalk). Both objective and subjective evaluations suggest that our model outperforms the baseline models in understanding and rendering emotions. These evaluations also underscore the importance of comprehensive emotional annotations. Code and audio samples can be found at: https://github.com/walker-hyf/ECSS.
Submitted 19 December, 2023;
originally announced December 2023.
-
CLIP-guided Federated Learning on Heterogeneous and Long-Tailed Data
Authors:
Jiangming Shi,
Shanshan Zheng,
Xiangbo Yin,
Yang Lu,
Yuan Xie,
Yanyun Qu
Abstract:
Federated learning (FL) provides a decentralized machine learning paradigm where a server collaborates with a group of clients to learn a global model without accessing the clients' data. User heterogeneity is a significant challenge for FL, which together with the class-distribution imbalance further increases the difficulty of FL. Great progress has been made in large vision-language models, such as Contrastive Language-Image Pre-training (CLIP), which paves a new way for image classification and object recognition. Inspired by the success of CLIP on few-shot and zero-shot learning, we use CLIP to optimize the federated learning between server and client models under its vision-language supervision. It is promising for mitigating the user heterogeneity and class-distribution imbalance due to the powerful cross-modality representation and rich open-vocabulary prior knowledge. In this paper, we propose the CLIP-guided FL (CLIP2FL) method on heterogeneous and long-tailed data. In CLIP2FL, the knowledge of the off-the-shelf CLIP model is transferred to the client-server models, and a bridge is built between the client and server. Specifically, for client-side learning, knowledge distillation is conducted between client models and CLIP to improve the ability of client-side feature representation. For server-side learning, in order to mitigate the heterogeneity and class-distribution imbalance, we generate federated features to retrain the server model. A prototype contrastive learning with the supervision of the text encoder of CLIP is introduced to generate federated features depending on the client-side gradients, and they are used to retrain a balanced server classifier.
Submitted 13 December, 2023;
originally announced December 2023.
-
ReRoGCRL: Representation-based Robustness in Goal-Conditioned Reinforcement Learning
Authors:
Xiangyu Yin,
Sihao Wu,
Jiaxu Liu,
Meng Fang,
Xingyu Zhao,
Xiaowei Huang,
Wenjie Ruan
Abstract:
While Goal-Conditioned Reinforcement Learning (GCRL) has gained attention, its algorithmic robustness against adversarial perturbations remains unexplored. The attacks and robust representation training methods that are designed for traditional RL become less effective when applied to GCRL. To address this challenge, we first propose the Semi-Contrastive Representation attack, a novel approach inspired by the adversarial contrastive attack. Unlike existing attacks in RL, it only necessitates information from the policy function and can be seamlessly implemented during deployment. Then, to mitigate the vulnerability of existing GCRL algorithms, we introduce Adversarial Representation Tactics, which combines Semi-Contrastive Adversarial Augmentation with Sensitivity-Aware Regularizer to improve the adversarial robustness of the underlying RL agent against various types of perturbations. Extensive experiments validate the superior performance of our attack and defence methods across multiple state-of-the-art GCRL algorithms. Our tool ReRoGCRL is available at https://github.com/TrustAI/ReRoGCRL.
Submitted 19 December, 2023; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Class-Aware Pruning for Efficient Neural Networks
Authors:
Mengnan Jiang,
Jingcun Wang,
Amro Eldebiky,
Xunzhao Yin,
Cheng Zhuo,
Ing-Chao Lin,
Grace Li Zhang
Abstract:
Deep neural networks (DNNs) have demonstrated remarkable success in various fields. However, the large number of floating-point operations (FLOPs) in DNNs poses challenges for their deployment in resource-constrained applications, e.g., edge devices. To address the problem, pruning has been introduced to reduce the computational cost in executing DNNs. Previous pruning strategies are based on weight values, gradient values and activation outputs. Different from previous pruning solutions, in this paper, we propose a class-aware pruning technique to compress DNNs, which provides a novel perspective to reduce the computational cost of DNNs. In each iteration, the neural network training is modified to facilitate the class-aware pruning. Afterwards, the importance of filters with respect to the number of classes is evaluated. The filters that are important for only a few classes are removed. The neural network is then retrained to compensate for the incurred accuracy loss. The pruning iterations continue until no filter can be removed anymore, indicating that the remaining filters are very important for many classes. This pruning technique outperforms previous pruning solutions in terms of accuracy, pruning ratio and the reduction of FLOPs. Experimental results confirm that this class-aware pruning technique can significantly reduce the number of weights and FLOPs, while maintaining a high inference accuracy.
Submitted 18 February, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
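The selection rule described in the abstract above (drop filters that matter for only a few classes) can be sketched roughly as follows. The abstract does not say how per-class importance is computed or which thresholds are used, so the score matrix and the parameters `score_threshold` and `min_classes` are assumptions made purely for illustration, as is the function name `select_filters`.

```python
def select_filters(importance, score_threshold, min_classes):
    """Return indices of filters to keep.

    importance[f][c] is an assumed importance score of filter f for class c.
    A filter counts as "important for a class" when its score exceeds
    score_threshold, and it is kept only if that holds for at least
    min_classes classes. Both thresholds are hypothetical knobs.
    """
    keep = []
    for idx, per_class in enumerate(importance):
        n_important = sum(1 for s in per_class if s > score_threshold)
        if n_important >= min_classes:
            keep.append(idx)
    return keep
```

In an iterative scheme like the one the abstract outlines, a step of this kind would be followed by retraining and repeated until no further filter falls below the class-count threshold.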
-
Synthesis of Temporally-Robust Policies for Signal Temporal Logic Tasks using Reinforcement Learning
Authors:
Siqi Wang,
Shaoyuan Li,
Li Yin,
Xiang Yin
Abstract:
This paper investigates the problem of designing control policies that satisfy high-level specifications described by signal temporal logic (STL) in unknown, stochastic environments. While many existing works concentrate on optimizing the spatial robustness of a system, our work takes a step further by also considering temporal robustness as a critical metric to quantify the tolerance of time uncertainty in STL. To this end, we formulate two relevant control objectives to enhance the temporal robustness of the synthesized policies. The first objective is to maximize the probability of being temporally robust for a given threshold. The second objective is to maximize the worst-case spatial robustness value within a bounded time shift. We use reinforcement learning to solve both control synthesis problems for unknown systems. Specifically, we approximate both control objectives in a way that enables us to apply the standard Q-learning algorithm. Theoretical bounds in terms of the approximations are also derived. We present case studies to demonstrate the feasibility of our approach.
Submitted 23 March, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
History Matters: Temporal Knowledge Editing in Large Language Model
Authors:
Xunjian Yin,
Jin Jiang,
Liming Yang,
Xiaojun Wan
Abstract:
The imperative task of revising or updating the knowledge stored within large language models arises from two distinct sources: intrinsic errors inherent in the model which should be corrected and outdated knowledge due to external shifts in the real world which should be updated. Prevailing efforts in model editing conflate these two distinct categories of edits arising from distinct reasons and directly modify the original knowledge in models into new knowledge. However, we argue that preserving the model's original knowledge remains pertinent. Specifically, if a model's knowledge becomes outdated due to evolving worldly dynamics, it should retain recollection of the historical knowledge while integrating the newfound knowledge. In this work, we introduce the task of Temporal Knowledge Editing (TKE) and establish a benchmark AToKe (Assessment of TempOral Knowledge Editing) to evaluate current model editing methods. We find that while existing model editing methods are effective at making models remember new knowledge, the edited model catastrophically forgets historical knowledge. To address this gap, we propose a simple and general framework termed Multi-Editing with Time Objective (METO) for enhancing existing editing models, which edits both historical and new knowledge concurrently and optimizes the model's prediction for the time of each fact. Our assessments demonstrate that while AToKe is still difficult, METO maintains the effectiveness of learning new knowledge and meanwhile substantially improves the performance of edited models on utilizing historical knowledge.
Submitted 14 December, 2023; v1 submitted 9 December, 2023;
originally announced December 2023.
-
Signal Temporal Logic Control Synthesis among Uncontrollable Dynamic Agents with Conformal Prediction
Authors:
Xinyi Yu,
Yiqi Zhao,
Xiang Yin,
Lars Lindemann
Abstract:
The control of dynamical systems under temporal logic specifications among uncontrollable dynamic agents is challenging due to the agents' a-priori unknown behavior. Existing works have considered the problem where either all agents are controllable, the agent models are deterministic and known, or no safety guarantees are provided. We propose a predictive control synthesis framework that guarantees, with high probability, the satisfaction of signal temporal logic (STL) tasks that are defined over the system and uncontrollable stochastic agents. We use trajectory predictors and conformal prediction to construct probabilistic prediction regions for each uncontrollable agent that are valid over multiple future time steps. Specifically, we reduce conservatism and increase data efficiency compared to existing works by constructing a normalized prediction region over all agents and time steps. We then formulate a worst-case mixed integer program (MIP) that accounts for all agent realizations within the prediction region to obtain control inputs that provably guarantee task satisfaction with high probability. To efficiently solve this MIP, we propose an equivalent MIP based on the KKT conditions of the original one. We illustrate our control synthesis framework on two case studies.
Submitted 7 December, 2023;
originally announced December 2023.
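To make the conformal prediction step in the abstract above more concrete, the sketch below shows the standard split conformal quantile applied to trajectory prediction errors. It is a simplified, unnormalized version: the paper's normalization over agents and time steps is not specified in the abstract and is not reproduced here. The function names `trajectory_score` and `conformal_radius` are hypothetical.

```python
import math


def trajectory_score(predicted, actual):
    """Nonconformity score: worst-case Euclidean error over the horizon.

    predicted and actual are equal-length sequences of positions (tuples).
    Taking the maximum over time steps is one simple way to obtain a
    single region that is valid for all steps at once (an assumption here).
    """
    return max(math.dist(p, a) for p, a in zip(predicted, actual))


def conformal_radius(scores, delta):
    """Split conformal quantile of calibration scores.

    With n exchangeable calibration scores, a fresh score lies at or below
    the ceil((n + 1) * (1 - delta))-th smallest calibration score with
    probability at least 1 - delta. If that rank exceeds n, there is too
    little data and the only valid radius is infinite.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - delta))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]
```

A downstream planner could then treat every agent position within `conformal_radius(...)` of the predicted trajectory as a possible realization, which is the role the prediction region plays in the worst-case program described in the abstract.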
-
OplixNet: Towards Area-Efficient Optical Split-Complex Networks with Real-to-Complex Data Assignment and Knowledge Distillation
Authors:
Ruidi Qiu,
Amro Eldebiky,
Grace Li Zhang,
Xunzhao Yin,
Cheng Zhuo,
Ulf Schlichtmann,
Bing Li
Abstract:
Having the potential for high speed, high throughput, and low energy cost, optical neural networks (ONNs) have emerged as a promising candidate for accelerating deep learning tasks. In conventional ONNs, light amplitudes are modulated at the input and detected at the output. However, the light phases are still ignored in conventional structures, although they can also carry information for computing. To address this issue, in this paper, we propose a framework called OplixNet to compress the areas of ONNs by modulating input image data into the amplitudes and phase parts of light signals. The input and output parts of the ONNs are redesigned to make full use of both amplitude and phase information. Moreover, mutual learning across different ONN structures is introduced to maintain the accuracy. Experimental results demonstrate that the proposed framework significantly reduces the areas of ONNs with the accuracy within an acceptable range. For instance, 75.03% area is reduced with a 0.33% accuracy decrease on fully connected neural network (FCNN) and 74.88% area is reduced with a 2.38% accuracy decrease on ResNet-32.
Submitted 15 December, 2023; v1 submitted 3 December, 2023;
originally announced December 2023.