-
Online Bayesian Meta-Learning for Cognitive Tracking Radar
Authors:
Charles E. Thornton,
R. Michael Buehrer,
Anthony F. Martone
Abstract:
A key component of cognitive radar is the ability to generalize, or achieve consistent performance across a range of sensing environments, since aspects of the physical scene may vary over time. This presents a challenge for learning-based waveform selection approaches, since transmission policies which are effective in one scene may be highly suboptimal in another. We address this problem by stra…
▽ More
A key component of cognitive radar is the ability to generalize, or achieve consistent performance across a range of sensing environments, since aspects of the physical scene may vary over time. This presents a challenge for learning-based waveform selection approaches, since transmission policies which are effective in one scene may be highly suboptimal in another. We address this problem by strategically biasing a learning algorithm by exploiting high-level structure across tracking instances, referred to as meta-learning. In this work, we develop an online meta-learning approach for waveform-agile tracking. This approach uses information gained from previous target tracks to speed up and enhance learning in new tracking instances. This results in sample-efficient learning across a class of finite state target channels by exploiting inherent similarity across tracking scenes, attributed to common physical elements such as target type or clutter statistics. We formulate the online waveform selection problem within the framework of Bayesian learning, and provide prior-dependent performance bounds for the meta-learning problem using Probability Approximately Correct (PAC)-Bayes theory. We present a computationally feasible meta-posterior sampling algorithm and study the performance in a simulation study consisting of diverse scenes. Finally, we examine the potential performance benefits and practical challenges associated with online meta-learning for waveform-agile tracking.
△ Less
Submitted 7 February, 2023; v1 submitted 7 July, 2022;
originally announced July 2022.
-
Universal Learning Waveform Selection Strategies for Adaptive Target Tracking
Authors:
Charles E. Thornton,
R. Michael Buehrer,
Harpreet S. Dhillon,
Anthony F. Martone
Abstract:
Online selection of optimal waveforms for target tracking with active sensors has long been a problem of interest. Many conventional solutions utilize an estimation-theoretic interpretation, in which a waveform-specific Cramér-Rao lower bound on measurement error is used to select the optimal waveform for each tracking step. However, this approach is only valid in the high SNR regime, and requires…
▽ More
Online selection of optimal waveforms for target tracking with active sensors has long been a problem of interest. Many conventional solutions utilize an estimation-theoretic interpretation, in which a waveform-specific Cramér-Rao lower bound on measurement error is used to select the optimal waveform for each tracking step. However, this approach is only valid in the high SNR regime, and requires a rather restrictive set of assumptions regarding the target motion and measurement models. Further, due to computational concerns, many traditional approaches are limited to near-term, or myopic, optimization, even though radar scenes exhibit strong temporal correlation. More recently, reinforcement learning has been proposed for waveform selection, in which the problem is framed as a Markov decision process (MDP), allowing for long-term planning. However, a major limitation of reinforcement learning is that the memory length of the underlying Markov process is often unknown for realistic target and channel dynamics, and a more general framework is desirable. This work develops a universal sequential waveform selection scheme which asymptotically achieves Bellman optimality in any radar scene which can be modeled as a $U^{\text{th}}$ order Markov process for a finite, but unknown, integer $U$. Our approach is based on well-established tools from the field of universal source coding, where a stationary source is parsed into variable length phrases in order to build a context-tree, which is used as a probabalistic model for the scene's behavior. We show that an algorithm based on a multi-alphabet version of the Context-Tree Weighting (CTW) method can be used to optimally solve a broad class of waveform-agile tracking problems while making minimal assumptions about the environment's behavior.
△ Less
Submitted 10 February, 2022;
originally announced February 2022.
-
Online Meta-Learning for Scene-Diverse Waveform-Agile Radar Target Tracking
Authors:
Charles E. Thornton,
R. Michael Buehrer,
Anthony F. Martone
Abstract:
A fundamental problem for waveform-agile radar systems is that the true environment is unknown, and transmission policies which perform well for a particular tracking instance may be sub-optimal for another. Additionally, there is a limited time window for each target track, and the radar must learn an effective strategy from a sequence of measurements in a timely manner. This paper studies a Baye…
▽ More
A fundamental problem for waveform-agile radar systems is that the true environment is unknown, and transmission policies which perform well for a particular tracking instance may be sub-optimal for another. Additionally, there is a limited time window for each target track, and the radar must learn an effective strategy from a sequence of measurements in a timely manner. This paper studies a Bayesian meta-learning model for radar waveform selection which seeks to learn an inductive bias to quickly optimize tracking performance across a class of radar scenes. We cast the waveform selection problem in the framework of sequential Bayesian inference, and introduce a contextual bandit variant of the recently proposed meta-Thompson Sampling algorithm, which learns an inductive bias in the form of a prior distribution. Each track is treated as an instance of a contextual bandit learning problem, coming from a task distribution. We show that the meta-learning process results in an appreciably faster learning, resulting in significantly fewer lost tracks than a conventional learning approach equipped with an uninformative prior.
△ Less
Submitted 21 October, 2021;
originally announced October 2021.
-
Waveform Selection for Radar Tracking in Target Channels With Memory via Universal Learning
Authors:
Charles E. Thornton,
R. Michael Buehrer,
Anthony F. Martone
Abstract:
In tracking radar, the sensing environment often varies significantly over a track duration due to the target's trajectory and dynamic interference. Adapting the radar's waveform using partial information about the state of the scene has been shown to provide performance benefits in many practical scenarios. Moreover, radar measurements generally exhibit strong temporal correlation, allowing memor…
▽ More
In tracking radar, the sensing environment often varies significantly over a track duration due to the target's trajectory and dynamic interference. Adapting the radar's waveform using partial information about the state of the scene has been shown to provide performance benefits in many practical scenarios. Moreover, radar measurements generally exhibit strong temporal correlation, allowing memory-based learning algorithms to effectively learn waveform selection strategies. This work examines a radar system which builds a compressed model of the radar-environment interface in the form of a context-tree. The radar uses this context tree-based model to select waveforms in a signal-dependent target channel, which may respond adversarially to the radar's strategy. This approach is guaranteed to asymptotically converge to the average-cost optimal policy for any stationary target channel that can be represented as a Markov process of order U < $\infty$, where the constant U is unknown to the radar. The proposed approach is tested in a simulation study, and is shown to provide tracking performance improvements over two state-of-the-art waveform selection schemes.
△ Less
Submitted 2 August, 2021;
originally announced August 2021.
-
Constrained Contextual Bandit Learning for Adaptive Radar Waveform Selection
Authors:
Charles E. Thornton,
R. Michael Buehrer,
Anthony F. Martone
Abstract:
A sequential decision process in which an adaptive radar system repeatedly interacts with a finite-state target channel is studied. The radar is capable of passively sensing the spectrum at regular intervals, which provides side information for the waveform selection process. The radar transmitter uses the sequence of spectrum observations as well as feedback from a collocated receiver to select w…
▽ More
A sequential decision process in which an adaptive radar system repeatedly interacts with a finite-state target channel is studied. The radar is capable of passively sensing the spectrum at regular intervals, which provides side information for the waveform selection process. The radar transmitter uses the sequence of spectrum observations as well as feedback from a collocated receiver to select waveforms which accurately estimate target parameters. It is shown that the waveform selection problem can be effectively addressed using a linear contextual bandit formulation in a manner that is both computationally feasible and sample efficient. Stochastic and adversarial linear contextual bandit models are introduced, allowing the radar to achieve effective performance in broad classes of physical environments. Simulations in a radar-communication coexistence scenario, as well as in an adversarial radar-jammer scenario, demonstrate that the proposed formulation provides a substantial improvement in target detection performance when Thompson Sampling and EXP3 algorithms are used to drive the waveform selection process. Further, it is shown that the harmful impacts of pulse-agile behavior on coherently processed radar data can be mitigated by adopting a time-varying constraint on the radar's waveform catalog.
△ Less
Submitted 14 June, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Multi-player Bandits for Distributed Cognitive Radar
Authors:
William W. Howard,
Charles E. Thornton,
Anthony F. Martone,
R. Michael Buehrer
Abstract:
With new applications for radar networks such as automotive control or indoor localization, the need for spectrum sharing and general interoperability is expected to rise. This paper describes the application of multi-player bandit algorithms for waveform selection to a distributed cognitive radar network that must coexist with a communications system. Specifically, we make the assumption that rad…
▽ More
With new applications for radar networks such as automotive control or indoor localization, the need for spectrum sharing and general interoperability is expected to rise. This paper describes the application of multi-player bandit algorithms for waveform selection to a distributed cognitive radar network that must coexist with a communications system. Specifically, we make the assumption that radar nodes in the network have no dedicated communication channel. As we will discuss later, nodes can communicate indirectly by taking actions which intentionally interfere with other nodes and observing the resulting collisions. The radar nodes attempt to optimize their own spectrum utilization while avoiding collisions, not only with each other, but with the communications system. The communications system is assumed to statically occupy some subset of the bands available to the radar network. First, we examine models that assume each node experiences equivalent channel conditions, and later examine a model that relaxes this assumption.
△ Less
Submitted 30 January, 2021;
originally announced February 2021.
-
Constrained Online Learning to Mitigate Distortion Effects in Pulse-Agile Cognitive Radar
Authors:
Charles E. Thornton,
R. Michael Buehrer,
Anthony F. Martone
Abstract:
Pulse-agile radar systems have demonstrated favorable performance in dynamic electromagnetic scenarios. However, the use of non-identical waveforms within a radar's coherent processing interval may lead to harmful distortion effects when pulse-Doppler processing is used. This paper presents an online learning framework to optimize detection performance while mitigating harmful sidelobe levels. The…
▽ More
Pulse-agile radar systems have demonstrated favorable performance in dynamic electromagnetic scenarios. However, the use of non-identical waveforms within a radar's coherent processing interval may lead to harmful distortion effects when pulse-Doppler processing is used. This paper presents an online learning framework to optimize detection performance while mitigating harmful sidelobe levels. The radar waveform selection process is formulated as a linear contextual bandit problem, within which waveform adaptations which exceed a tolerable level of expected distortion are eliminated. The constrained online learning approach is effective and computationally feasible, evidenced by simulations in a radar-communication coexistence scenario and in the presence of intentional adaptive jamming. This approach is applied to both stochastic and adversarial contextual bandit learning models and the detection performance in dynamic scenarios is evaluated.
△ Less
Submitted 26 February, 2021; v1 submitted 29 October, 2020;
originally announced October 2020.
-
Efficient Online Learning for Cognitive Radar-Cellular Coexistence via Contextual Thompson Sampling
Authors:
Charles E. Thornton,
R. Michael Buehrer,
Anthony F. Martone
Abstract:
This paper describes a sequential, or online, learning scheme for adaptive radar transmissions that facilitate spectrum sharing with a non-cooperative cellular network. First, the interference channel between the radar and a spatially distant cellular network is modeled. Then, a linear Contextual Bandit (CB) learning framework is applied to drive the radar's behavior. The fundamental trade-off bet…
▽ More
This paper describes a sequential, or online, learning scheme for adaptive radar transmissions that facilitate spectrum sharing with a non-cooperative cellular network. First, the interference channel between the radar and a spatially distant cellular network is modeled. Then, a linear Contextual Bandit (CB) learning framework is applied to drive the radar's behavior. The fundamental trade-off between exploration and exploitation is balanced by a proposed Thompson Sampling (TS) algorithm, a pseudo-Bayesian approach which selects waveform parameters based on the posterior probability that a specific waveform is optimal, given discounted channel information as context. It is shown that the contextual TS approach converges more rapidly to behavior that minimizes mutual interference and maximizes spectrum utilization than comparable contextual bandit algorithms. Additionally, we show that the TS learning scheme results in a favorable SINR distribution compared to other online learning algorithms. Finally, the proposed TS algorithm is compared to a deep reinforcement learning model. We show that the TS algorithm maintains competitive performance with a more complex Deep Q-Network (DQN).
△ Less
Submitted 23 August, 2020;
originally announced August 2020.
-
Deep Reinforcement Learning Control for Radar Detection and Tracking in Congested Spectral Environments
Authors:
Charles E. Thornton,
Mark A. Kozy,
R. Michael Buehrer,
Anthony F. Martone,
Kelly D. Sherbondy
Abstract:
In this paper, dynamic non-cooperative coexistence between a cognitive pulsed radar and a nearby communications system is addressed by applying nonlinear value function approximation via deep reinforcement learning (Deep RL) to develop a policy for optimal radar performance. The radar learns to vary the bandwidth and center frequency of its linear frequency modulated (LFM) waveforms to mitigate mu…
▽ More
In this paper, dynamic non-cooperative coexistence between a cognitive pulsed radar and a nearby communications system is addressed by applying nonlinear value function approximation via deep reinforcement learning (Deep RL) to develop a policy for optimal radar performance. The radar learns to vary the bandwidth and center frequency of its linear frequency modulated (LFM) waveforms to mitigate mutual interference with other systems and improve target detection performance while also maintaining sufficient utilization of the available frequency bands required for a fine range resolution. We demonstrate that our approach, based on the Deep Q-Learning (DQL) algorithm, enhances important radar metrics, including SINR and bandwidth utilization, more effectively than policy iteration or sense-and-avoid (SAA) approaches in a variety of realistic coexistence environments. We also extend the DQL-based approach to incorporate Double Q-learning and a recurrent neural network to form a Double Deep Recurrent Q-Network (DDRQN). We demonstrate the DDRQN results in favorable performance and stability compared to DQL and policy iteration. Finally, we demonstrate the practicality of our proposed approach through a discussion of experiments performed on a software defined radar (SDRadar) prototype system. Our experimental results indicate that the proposed Deep RL approach significantly improves radar detection performance in congested spectral environments when compared to policy iteration and SAA.
△ Less
Submitted 27 August, 2020; v1 submitted 23 June, 2020;
originally announced June 2020.
-
Experimental Analysis of Reinforcement Learning Techniques for Spectrum Sharing Radar
Authors:
Charles E. Thornton,
R. Michael Buehrer,
Anthony F. Martone,
Kelly D. Sherbondy
Abstract:
In this work, we first describe a framework for the application of Reinforcement Learning (RL) control to a radar system that operates in a congested spectral setting. We then compare the utility of several RL algorithms through a discussion of experiments performed on Commercial off-the-shelf (COTS) hardware. Each RL technique is evaluated in terms of convergence, radar detection performance achi…
▽ More
In this work, we first describe a framework for the application of Reinforcement Learning (RL) control to a radar system that operates in a congested spectral setting. We then compare the utility of several RL algorithms through a discussion of experiments performed on Commercial off-the-shelf (COTS) hardware. Each RL technique is evaluated in terms of convergence, radar detection performance achieved in a congested spectral environment, and the ability to share 100MHz spectrum with an uncooperative communications system. We examine policy iteration, which solves an environment posed as a Markov Decision Process (MDP) by directly solving for a stochastic map** between environmental states and radar waveforms, as well as Deep RL techniques, which utilize a form of Q-Learning to approximate a parameterized function that is used by the radar to select optimal actions. We show that RL techniques are beneficial over a Sense-and-Avoid (SAA) scheme and discuss the conditions under which each approach is most effective.
△ Less
Submitted 13 March, 2020; v1 submitted 6 January, 2020;
originally announced January 2020.