-
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models
Authors:
Yi Zeng,
Weiyu Sun,
Tran Ngoc Huynh,
Dawn Song,
Bo Li,
Ruoxi Jia
Abstract:
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions. The high dimensionality of potential triggers in the token space and the diverse range of malicious behaviors make this a critical challenge. We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relati…
▽ More
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions. The high dimensionality of potential triggers in the token space and the diverse range of malicious behaviors make this a critical challenge. We present BEEAR, a mitigation approach leveraging the insight that backdoor triggers induce relatively uniform drifts in the model's embedding space. Our bi-level optimization method identifies universal embedding perturbations that elicit unwanted behaviors and adjusts the model parameters to reinforce safe behaviors against these perturbations. Experiments show BEEAR reduces the success rate of RLHF time backdoor attacks from >95% to <1% and from 47% to 0% for instruction-tuning time backdoors targeting malicious code generation, without compromising model utility. Requiring only defender-defined safe and unwanted behaviors, BEEAR represents a step towards practical defenses against safety backdoors in LLMs, providing a foundation for further advancements in AI safety and security.
△ Less
Submitted 24 June, 2024;
originally announced June 2024.
-
Lifted Tree-Reweighted Variational Inference
Authors:
Hung Hai Bui,
Tuyen N. Huynh,
David Sontag
Abstract:
We analyze variational inference for highly symmetric graphical models such as those arising from first-order probabilistic models. We first show that for these graphical models, the tree-reweighted variational objective lends itself to a compact lifted formulation which can be solved much more efficiently than the standard TRW formulation for the ground graphical model. Compared to earlier work o…
▽ More
We analyze variational inference for highly symmetric graphical models such as those arising from first-order probabilistic models. We first show that for these graphical models, the tree-reweighted variational objective lends itself to a compact lifted formulation which can be solved much more efficiently than the standard TRW formulation for the ground graphical model. Compared to earlier work on lifted belief propagation, our formulation leads to a convex optimization problem for lifted marginal inference and provides an upper bound on the partition function. We provide two approaches for improving the lifted TRW upper bound. The first is a method for efficiently computing maximum spanning trees in highly symmetric graphs, which can be used to optimize the TRW edge appearance probabilities. The second is a method for tightening the relaxation of the marginal polytope using lifted cycle inequalities and novel exchangeable cluster consistency constraints.
△ Less
Submitted 19 June, 2014; v1 submitted 16 June, 2014;
originally announced June 2014.
-
A general phase noise relationship for four-wave mixing
Authors:
Aravind P. Anthur,
Regan T. Watts,
Tam N. Huynh,
Deepa Venkitesh,
Liam P. Barry
Abstract:
We propose and verify the use of the power spectral density of the FM noise spectrum to study the phase noise relationship between the four-wave mixing components.
We propose and verify the use of the power spectral density of the FM noise spectrum to study the phase noise relationship between the four-wave mixing components.
△ Less
Submitted 5 August, 2013;
originally announced August 2013.
-
Automorphism Groups of Graphical Models and Lifted Variational Inference
Authors:
Hung Hai Bui,
Tuyen N. Huynh,
Sebastian Riedel
Abstract:
Using the theory of group action, we first introduce the concept of the automorphism group of an exponential family or a graphical model, thus formalizing the general notion of symmetry of a probabilistic model. This automorphism group provides a precise mathematical framework for lifted inference in the general exponential family. Its group action partitions the set of random variables and featur…
▽ More
Using the theory of group action, we first introduce the concept of the automorphism group of an exponential family or a graphical model, thus formalizing the general notion of symmetry of a probabilistic model. This automorphism group provides a precise mathematical framework for lifted inference in the general exponential family. Its group action partitions the set of random variables and feature functions into equivalent classes (called orbits) having identical marginals and expectations. Then the inference problem is effectively reduced to that of computing marginals or expectations for each class, thus avoiding the need to deal with each individual variable or feature. We demonstrate the usefulness of this general framework in lifting two classes of variational approximation for MAP inference: local LP relaxation and local LP relaxation with cycle constraints; the latter yields the first lifted inference that operate on a bound tighter than local constraints. Initial experimental results demonstrate that lifted MAP inference with cycle constraints achieved the state of the art performance, obtaining much better objective function values than local approximation while remaining relatively efficient.
△ Less
Submitted 19 July, 2012;
originally announced July 2012.