BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models

Authors: Yi Zeng, Weiyu Sun, Tran Ngoc Huynh, Dawn Song, Bo Li, Ruoxi Jia

Abstract: Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions. The high dimensionality of potential triggers in the token space and the diverse range of malicious behaviors make this a critical challenge. We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relatively uniform drifts in the model's embedding space. Our bi-level optimization method identifies universal embedding perturbations that elicit unwanted behaviors and adjusts the model parameters to reinforce safe behaviors against these perturbations. Experiments show BEEAR reduces the success rate of RLHF time backdoor attacks from >95% to <1% and from 47% to 0% for instruction-tuning time backdoors targeting malicious code generation, without compromising model utility. Requiring only defender-defined safe and unwanted behaviors, BEEAR represents a step towards practical defenses against safety backdoors in LLMs, providing a foundation for further advancements in AI safety and security.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:1406.4200 [pdf, other]

Lifted Tree-Reweighted Variational Inference

Authors: Hung Hai Bui, Tuyen N. Huynh, David Sontag

Abstract: We analyze variational inference for highly symmetric graphical models such as those arising from first-order probabilistic models. We first show that for these graphical models, the tree-reweighted variational objective lends itself to a compact lifted formulation which can be solved much more efficiently than the standard TRW formulation for the ground graphical model. Compared to earlier work on lifted belief propagation, our formulation leads to a convex optimization problem for lifted marginal inference and provides an upper bound on the partition function. We provide two approaches for improving the lifted TRW upper bound. The first is a method for efficiently computing maximum spanning trees in highly symmetric graphs, which can be used to optimize the TRW edge appearance probabilities. The second is a method for tightening the relaxation of the marginal polytope using lifted cycle inequalities and novel exchangeable cluster consistency constraints.

Submitted 19 June, 2014; v1 submitted 16 June, 2014; originally announced June 2014.

Comments: In: UAI (Uncertainty in Artificial Intelligence) 2014

arXiv:1308.0914 [pdf, other]

A general phase noise relationship for four-wave mixing

Authors: Aravind P. Anthur, Regan T. Watts, Tam N. Huynh, Deepa Venkitesh, Liam P. Barry

Abstract: We propose and verify the use of the power spectral density of the FM noise spectrum to study the phase noise relationship between the four-wave mixing components.

Submitted 5 August, 2013; originally announced August 2013.

Comments: 5 pages, 3 figures

arXiv:1207.4814 [pdf, other]

Automorphism Groups of Graphical Models and Lifted Variational Inference

Authors: Hung Hai Bui, Tuyen N. Huynh, Sebastian Riedel

Abstract: Using the theory of group action, we first introduce the concept of the automorphism group of an exponential family or a graphical model, thus formalizing the general notion of symmetry of a probabilistic model. This automorphism group provides a precise mathematical framework for lifted inference in the general exponential family. Its group action partitions the set of random variables and feature functions into equivalent classes (called orbits) having identical marginals and expectations. Then the inference problem is effectively reduced to that of computing marginals or expectations for each class, thus avoiding the need to deal with each individual variable or feature. We demonstrate the usefulness of this general framework in lifting two classes of variational approximation for MAP inference: local LP relaxation and local LP relaxation with cycle constraints; the latter yields the first lifted inference that operate on a bound tighter than local constraints. Initial experimental results demonstrate that lifted MAP inference with cycle constraints achieved the state of the art performance, obtaining much better objective function values than local approximation while remaining relatively efficient.

Submitted 19 July, 2012; originally announced July 2012.

Comments: Extended version of the paper to appear in Statistical Relational AI (StaRAI-12) workshop at UAI '12

Showing 1–4 of 4 results for author: Huynh, T N