Search | arXiv e-print repository

Resilience of the Electric Grid through Trustable IoT-Coordinated Assets

Authors: Vineet J. Nair, Venkatesh Venkataramanan, Priyank Srivastava, Partha S. Sarker, Anurag Srivastava, Laurentiu D. Marinovici, Jun Zha, Christopher Irwin, Prateek Mittal, John Williams, H. Vincent Poor, Anuradha M. Annaswamy

Abstract: The electricity grid has evolved from a physical system to a cyber-physical system with digital devices that perform measurement, control, communication, computation, and actuation. The increased penetration of distributed energy resources (DERs) that include renewable generation, flexible loads, and storage provides extraordinary opportunities for improvements in efficiency and sustainability. However, they can introduce new vulnerabilities in the form of cyberattacks, which can cause significant challenges in ensuring grid resilience, i.e., the ability to rapidly restore grid services in the face of severe disruptions. We propose a framework in this paper for achieving grid resilience through suitably coordinated assets including a network of Internet of Things (IoT) devices. A local electricity market is proposed to identify trustable assets and carry out this coordination. Situational Awareness (SA) of locally available DERs with the ability to inject power or reduce consumption is enabled by the market, together with a monitoring procedure for their trustability and commitment. With this SA, we show that a variety of cyberattacks can be mitigated using local trustable resources without stressing the bulk grid. The demonstrations are carried out using a variety of platforms, including a high-fidelity co-simulation platform, real-time hardware-in-the-loop validation, and a utility-friendly simulator.

Submitted 21 June, 2024; originally announced June 2024.

Comments: Submitted to the Proceedings of the National Academy of Sciences (PNAS), under review

arXiv:2406.07797 [pdf, other]

Real-time Deformation Correction in Additively Printed Flexible Antenna Arrays

Authors: Sreeni Poolakkal, Abdullah Islam, Shrestha Bansal, Arpit Rao, Ted Dabrowski, Kalsi Kwan, Amit Mishra, Quiyan Xu, Erfan Ghaderi, Pradeep Lall, Sudip Shekhar, Julio Navarro, Shenqiang Ren, John Williams, Subhanshu Gupta

Abstract: Conformal phased arrays provide multiple degrees of freedom to the scan angle, which is typically limited by antenna aperture in rigid arrays. Silicon-based RF signal processing offers reliable, reconfigurable, multi-functional, and compact control for conformal phased arrays that can be used for on-the-move communication. While the light weight, compactness, and shape-changing properties of the conformal phased arrays are attractive, these features result in dynamic deformation of the array during motion, leading to significant dynamic beam pointing errors. We propose a silicon-based, compact, reconfigurable solution to self-correct these dynamic deformation-induced beam pointing errors. Furthermore, additive printing is leveraged to enhance the flexibility of the conformal phased arrays, as the printed conductive ink is more flexible than bulk copper and can be easily deposited on flexible sheets using different printing tools, providing an environmentally-friendly solution for large-scale production. Conventional silver inks are expensive, and copper-based printable inks suffer from spontaneous metal oxidation that alters trace impedance and degrades beamforming performance. This work uses a low-cost molecular copper decomposition ink with reliable RF properties at different temperatures and strains to print the proposed intelligent conformal phased array operating at 2.1 GHz. A proof-of-concept prototype $2\times2$ array self-corrects the deformation-induced beam pointing error with an error $<1.25^\circ$. The silicon-based array processing part occupies only 2.58 mm$^2$ of area and consumes 83 mW of power per tile.

Submitted 21 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2403.06899 [pdf, other]

Multiobject Tracking for Thresholded Cell Measurements

Authors: Thomas Kropfreiter, Jason L. Williams, Florian Meyer

Abstract: In many multiobject tracking applications, including radar and sonar tracking, after prefiltering the received signal, measurement data is typically structured in cells. The cells, e.g., represent different range and bearing values. However, conventional multiobject tracking methods use so-called point measurements. Point measurements are provided by a preprocessing stage that applies a threshold or detector and breaks up the cell's structure by converting cell indexes into, e.g., range and bearing measurements. We here propose a Bayesian multiobject tracking method that processes measurements that have been thresholded but are still cell-structured. We first derive a likelihood function that systematically incorporates an adjustable detection threshold which makes it possible to control the number of cell measurements. We then propose a Poisson Multi-Bernoulli (PMB) filter based on the likelihood function for cell measurements. Furthermore, we establish a link to the conventional point measurement model by deriving the likelihood function for point measurements with amplitude information (AM) and discuss the PMB filter that uses point measurements with AM. Our numerical results demonstrate the advantages of the proposed method that relies on thresholded cell measurements compared to the conventional multiobject tracking based on point measurements with and without AM.

Submitted 1 June, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: FUSION-24 conference

arXiv:2402.06304 [pdf, ps, other]

A New Approach to Voice Authenticity

Authors: Nicolas M. Müller, Piotr Kawa, Shen Hu, Matthias Neu, Jennifer Williams, Philip Sperl, Konstantin Böttinger

Abstract: Voice faking, driven primarily by recent advances in text-to-speech (TTS) synthesis technology, poses significant societal challenges. Currently, the prevailing assumption is that unaltered human speech can be considered genuine, while fake speech comes from TTS synthesis. We argue that this binary distinction is oversimplified. For instance, altered playback speeds can be used for malicious purposes, as in the 'Drunken Nancy Pelosi' incident. Similarly, editing of audio clips can be done ethically, e.g., for brevity or summarization in news reporting or podcasts, but editing can also create misleading narratives. In this paper, we propose a conceptual shift away from the binary paradigm of audio being either 'fake' or 'real'. Instead, our focus is on pinpointing 'voice edits', which encompass traditional modifications like filters and cuts, as well as TTS synthesis and voice conversion (VC) systems. We delineate 6 categories and curate a new challenge dataset rooted in the M-AILABS corpus, for which we present baseline detection systems. And most importantly, we argue that merely categorizing audio as fake or real is a dangerous over-simplification that will fail to move the field of speech technology forward.

Submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.04753 [pdf, other]

Cortical Surface Diffusion Generative Models

Authors: Zhenshan Xie, Simon Dahan, Logan Z. J. Williams, M. Jorge Cardoso, Emma C. Robinson

Abstract: Cortical surface analysis has gained increased prominence, given its potential implications for neurological and developmental disorders. Traditional vision diffusion models, while effective in generating natural images, present limitations in capturing intricate development patterns in neuroimaging due to limited datasets. This is particularly true for generating cortical surfaces where individual variability in cortical morphology is high, leading to an urgent need for better methods to model brain development and diverse variability inherent across different individuals. In this work, we propose a novel diffusion model for the generation of cortical surface metrics, using modified surface vision transformers as the principal architecture. We validate our method on the developing Human Connectome Project (dHCP); the results suggest our model demonstrates superior performance in capturing the intricate details of evolving cortical surfaces. Furthermore, our model can generate high-quality realistic samples of cortical surfaces conditioned on postmenstrual age (PMA) at scan.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 4 pages

arXiv:2401.03936 [pdf, other]

Exploratory Evaluation of Speech Content Masking

Authors: Jennifer Williams, Karla Pizzi, Paul-Gauthier Noe, Sneha Das

Abstract: Most recent speech privacy efforts have focused on anonymizing acoustic speaker attributes, but there has not been as much research into protecting information from speech content. We introduce a toy problem that explores an emerging type of privacy called "content masking", which conceals selected words and phrases in speech. In our efforts to define this problem space, we evaluate an introductory baseline masking technique based on modifying sequences of discrete phone representations (phone codes) produced from a pre-trained vector-quantized variational autoencoder (VQ-VAE) and re-synthesized using WaveRNN. We investigate three different masking locations and three types of masking strategies: noise substitution, word deletion, and phone sequence reversal. Our work attempts to characterize how masking affects two downstream tasks: automatic speech recognition (ASR) and automatic speaker verification (ASV). We observe how the different mask types and locations impact these downstream tasks and discuss how these issues may influence privacy goals.

Submitted 8 January, 2024; originally announced January 2024.

Comments: Accepted to ITG Speech Conference 2023

arXiv:2308.05474 [pdf, other]

Spatio-Temporal Encoding of Brain Dynamics with Surface Masked Autoencoders

Authors: Simon Dahan, Logan Z. J. Williams, Yourong Guo, Daniel Rueckert, Emma C. Robinson

Abstract: The development of robust and generalisable models for encoding the spatio-temporal dynamics of human brain activity is crucial for advancing neuroscientific discoveries. However, significant individual variation in the organisation of the human cerebral cortex makes it difficult to identify population-level trends in these signals. Recently, Surface Vision Transformers (SiTs) have emerged as a promising approach for modelling cortical signals, yet they face some limitations in low-data scenarios due to the lack of inductive biases in their architecture. To address these challenges, this paper proposes the surface Masked AutoEncoder (sMAE) and video surface Masked AutoEncoder (vsMAE) for multivariate and spatio-temporal pre-training of cortical signals over regular icosahedral grids. These models are trained to reconstruct cortical feature maps from masked versions of the input by learning strong latent representations of cortical structure and function. Such representations translate into better modelling of individual phenotypes and enhanced performance in downstream tasks. The proposed approach was evaluated on cortical phenotype regression using data from the young adult Human Connectome Project (HCP) and developing HCP (dHCP). Results show that (v)sMAE pre-trained models improve phenotyping prediction performance on multiple tasks by $\ge 26\%$, and offer faster convergence relative to models trained from scratch. Finally, we show that pre-training vision transformers on large datasets, such as the UK Biobank (UKB), supports transfer learning to low-data regimes. Our code and pre-trained models are publicly available at https://github.com/metrics-lab/surface-masked-autoencoders .

Submitted 11 June, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: Accepted for publication at MIDL 2024; 20 figures; 7 figures

arXiv:2307.05426 [pdf, ps, other]

Using BOLD-fMRI to Compute the Respiration Volume per Time (RTV) and Respiration Variation (RV) with Convolutional Neural Networks (CNN) in the Human Connectome Development Cohort

Authors: Abdoljalil Addeh, Fernando Vega, Rebecca J Williams, Ali Golestani, G. Bruce Pike, M. Ethan MacDonald

Abstract: In many fMRI studies, respiratory signals are unavailable or do not have acceptable quality. Consequently, the direct removal of low-frequency respiratory variations from BOLD signals is not possible. This study proposes a one-dimensional CNN model for reconstruction of two respiratory measures, RV and RVT. Results show that a CNN can capture informative features from resting BOLD signals and reconstruct realistic RV and RVT timeseries. It is expected that application of the proposed method will lower the cost of fMRI studies, reduce complexity, and decrease the burden on participants as they will not be required to wear a respiratory bellows.

Submitted 3 July, 2023; originally announced July 2023.

Comments: 6 pages, 5 figures

arXiv:2306.01375 [pdf, other]

Robust and Generalisable Segmentation of Subtle Epilepsy-causing Lesions: a Graph Convolutional Approach

Authors: Hannah Spitzer, Mathilde Ripart, Abdulah Fawaz, Logan Z. J. Williams, MELD project, Emma Robinson, Juan Eugenio Iglesias, Sophie Adler, Konrad Wagstyl

Abstract: Focal cortical dysplasia (FCD) is a leading cause of drug-resistant focal epilepsy, which can be cured by surgery. These lesions are extremely subtle and often missed even by expert neuroradiologists. "Ground truth" manual lesion masks are therefore expensive, limited and have large inter-rater variability. Existing FCD detection methods are limited by high numbers of false positive predictions, primarily due to vertex- or patch-based approaches that lack whole-brain context. Here, we propose to approach the problem as semantic segmentation using graph convolutional networks (GCN), which allows our model to learn spatial relationships between brain regions. To address the specific challenges of FCD identification, our proposed model includes an auxiliary loss to predict distance from the lesion to reduce false positives and a weak supervision classification loss to facilitate learning from uncertain lesion masks. On a multi-centre dataset of 1015 participants with surface-based features and manual lesion masks from structural MRI data, the proposed GCN achieved an AUC of 0.74, a significant improvement against a previously used vertex-wise multi-layer perceptron (MLP) classifier (AUC 0.64). With sensitivity thresholded at 67%, the GCN had a specificity of 71% in comparison to 49% when using the MLP. This improvement in specificity is vital for clinical integration of lesion-detection tools into the radiological workflow, through increasing clinical confidence in the use of AI radiological adjuncts and reducing the number of areas requiring expert review.

Submitted 5 June, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: accepted at MICCAI 2023

arXiv:2303.11909 [pdf, other]

The Multiscale Surface Vision Transformer

Authors: Simon Dahan, Logan Z. J. Williams, Daniel Rueckert, Emma C. Robinson

Abstract: Surface meshes are a favoured domain for representing structural and functional information on the human cortex, but their complex topology and geometry pose significant challenges for deep learning analysis. While Transformers have excelled as domain-agnostic architectures for sequence-to-sequence learning, the quadratic cost of the self-attention operation remains an obstacle for many dense prediction tasks. Inspired by some of the latest advances in hierarchical modelling with vision transformers, we introduce the Multiscale Surface Vision Transformer (MS-SiT) as a backbone architecture for surface deep learning. The self-attention mechanism is applied within local-mesh-windows to allow for high-resolution sampling of the underlying data, while a shifted-window strategy improves the sharing of information between windows. Neighbouring patches are successively merged, allowing the MS-SiT to learn hierarchical representations suitable for any prediction task. Results demonstrate that the MS-SiT outperforms existing surface deep learning methods for neonatal phenotyping prediction tasks using the Developing Human Connectome Project (dHCP) dataset. Furthermore, building the MS-SiT backbone into a U-shaped architecture for surface segmentation demonstrates competitive results on cortical parcellation using the UK Biobank (UKB) and manually-annotated MindBoggle datasets. Code and trained models are publicly available at https://github.com/metrics-lab/surface-vision-transformers.

Submitted 11 June, 2024; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: Accepted for publication at MIDL 2024, 17 pages, 6 figures

arXiv:2301.08925 [pdf, other]

doi 10.21437/SPSC.2022-1

New Challenges for Content Privacy in Speech and Audio

Authors: Jennifer Williams, Karla Pizzi, Shuvayanti Das, Paul-Gauthier Noe

Abstract: Privacy in speech and audio has many facets. A particularly under-developed area of privacy in this domain involves consideration for information related to content and context. Speech content can include words and their meaning or even stylistic markers, pathological speech, intonation patterns, or emotion. More generally, audio captured in-the-wild may contain background speech or reveal contextual information such as markers of location, room characteristics, paralinguistic sounds, or other audible events. Audio recording devices and speech technologies are becoming increasingly commonplace in everyday life. At the same time, commercialised speech and audio technologies do not provide consumers with a range of privacy choices. Even where privacy is regulated or protected by law, technical solutions to privacy assurance and enforcement fall short. This position paper introduces three important and timely research challenges for content privacy in speech and audio. We highlight current gaps and opportunities, and identify focus areas that could have significant implications for developing ethical and safer speech technologies.

Submitted 21 January, 2023; originally announced January 2023.

Comments: Accepted for publication in ISCA SPSC Symposium 2022

arXiv:2301.04416 [pdf, other]

pyssam -- a Python library for statistical modelling of biomedical shape and appearance

Authors: Josh Williams, Ali Ozel, Uwe Wolfram

Abstract: pyssam is a Python library for creating statistical shape and appearance models (SSAMs) for biological (and other) shapes such as bones, lungs or other organs. A point cloud best describing the anatomical 'landmarks' of the organ is required from each sample in a small population as an input. Additional information such as landmark gray-value can be included to incorporate joint correlations of shape and 'appearance' into the model. Our library performs alignment and scaling of the input data and creates a SSAM based on covariance across the population. The output SSAM can be used to parameterise and quantify shape change across a population. pyssam is a small and low dependency codebase with examples included as Jupyter notebooks for several common SSAM computations. The given examples can easily be extended to alternative datasets, and also alternative tasks such as medical image segmentation by incorporating a SSAM as a constraint for segmented organs.

Submitted 11 January, 2023; originally announced January 2023.

Comments: 5 pages, 3 figures, Journal of Open Source Software submission

arXiv:2212.11594 [pdf, other]

Electromagnetic Based Communication Model for Dynamic Metasurface Antennas

Authors: Robin Jess Williams, Pablo Ramirez-Espinosa, Jide Yuan, Elisabeth De Carvalho

Abstract: Dynamic metasurface antennas (DMAs) arise as a promising technology in the field of massive multiple-input multiple-output (mMIMO) systems, offering the possibility of integrating a large number of antennas in a limited -- and potentially large -- aperture while keeping the required number of radio-frequency (RF) chains under control. Although envisioned as practical realizations of mMIMO systems, DMAs represent a new paradigm in the design of signal processing techniques (such as beamforming) due to the constraints inherent to their physical implementation, for which no complete models are available yet. In this work, we propose a complete and electromagnetic-compliant narrowband communication model for a generic DMA based system. Specifically, the model accounts for: i) the wave propagation and reflections throughout the waveguides that feed the antenna elements, ii) the mutual coupling both through the air and the waveguides, and iii) the insertion losses. Also, we integrate the electromagnetic model in the conventional digital communication model, providing a complete and useful framework to design and characterize the performance of these systems. Finally, the accuracy of the model is verified through full-wave simulations.

Submitted 22 December, 2022; originally announced December 2022.

arXiv:2211.15510 [pdf, other]

Localized Shortcut Removal

Authors: Nicolas M. Müller, Jochen Jacobs, Jennifer Williams, Konstantin Böttinger

Abstract: Machine learning is a data-driven field, and the quality of the underlying datasets plays a crucial role in learning success. However, high performance on held-out test data does not necessarily indicate that a model generalizes or learns anything meaningful. This is often due to the existence of machine learning shortcuts - features in the data that are predictive but unrelated to the problem at hand. To address this issue for datasets where the shortcuts are smaller and more localized than true features, we propose a novel approach to detect and remove them. We use an adversarially trained lens to detect and eliminate highly predictive but semantically unconnected clues in images. In our experiments on both synthetic and real-world data, we show that our proposed approach reliably identifies and neutralizes such shortcuts without causing degradation of model performance on clean data. We believe that our approach can lead to more meaningful and generalizable machine learning models, especially in scenarios where the quality of the underlying datasets is crucial.

Submitted 23 May, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: Accepted at XAI4CV @ CVPR2023

arXiv:2207.10164 [pdf, other]

Trajectory PMB Filters for Extended Object Tracking Using Belief Propagation

Authors: Yuxuan Xia, Ángel F. García-Fernández, Florian Meyer, Jason L. Williams, Karl Granström, Lennart Svensson

Abstract: In this paper, we propose a Poisson multi-Bernoulli (PMB) filter for extended object tracking (EOT), which directly estimates the set of object trajectories, using belief propagation (BP). The proposed filter propagates a PMB density on the posterior of sets of trajectories through the filtering recursions over time, where the PMB mixture (PMBM) posterior after the update step is approximated as a PMB. The efficient PMB approximation relies on several important theoretical contributions. First, we present a PMBM conjugate prior on the posterior of sets of trajectories for a generalized measurement model, in which each object generates an independent set of measurements. The PMBM density is a conjugate prior in the sense that both the prediction and the update steps preserve the PMBM form of the density. Second, we present a factor graph representation of the joint posterior of the PMBM set of trajectories and association variables for the Poisson spatial measurement model. Importantly, leveraging the PMBM conjugacy and the factor graph formulation enables an elegant treatment of undetected objects via a Poisson point process and efficient inference on sets of trajectories using BP, where the approximate marginal densities in the PMB approximation can be obtained without enumeration of different data association hypotheses. To achieve this, we present a particle-based implementation of the proposed filter, where smoothed trajectory estimates, if desired, can be obtained via single-object particle smoothing methods, and its performance for EOT with ellipsoidal shapes is evaluated in a simulation study.

Submitted 19 September, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: Accepted for publication in IEEE Transactions on Aerospace and Electronic Systems. MATLAB implementation available at https://github.com/yuhsuansia/Trajectory-PMB-EOT-BP

arXiv:2206.13245 [pdf, other]

Performance Evaluation of Dynamic Metasurface Antennas: Impact of Insertion Losses and Coupling

Authors: Pablo Ramírez-Espinosa, Robin Jess Williams, Jide Yuan, Elisabeth de Carvalho

Abstract: This paper evaluates the performance of multi-user massive multiple-input multiple-output (MIMO) systems in which the base station is equipped with a dynamic metasurface antenna (DMA). Due to the physical implementation of DMAs, conventional models widely-used in MIMO are no longer valid, and electromagnetic phenomena such as mutual coupling, insertion losses and reflections inside the waveguides need to be considered. Hence, starting from a recently proposed electromagnetic model for DMAs, we formulate a zero-forcing optimization problem, yielding an unconstrained objective function with known gradient. The performance is compared with that of full-digital and hybrid massive MIMO, focusing on the impact of insertion losses and mutual coupling.

Submitted 27 June, 2022; originally announced June 2022.

arXiv:2206.08112 [pdf, other]

doi 10.1109/TSP.2022.3184794

Multiple Object Trajectory Estimation Using Backward Simulation

Authors: Yuxuan Xia, Lennart Svensson, Ángel F. García-Fernández, Jason L. Williams, Daniel Svensson, Karl Granström

Abstract: This paper presents a general solution for computing the multi-object posterior for sets of trajectories from a sequence of multi-object (unlabelled) filtering densities and a multi-object dynamic model. Importantly, the proposed solution opens an avenue of trajectory estimation possibilities for multi-object filters that do not explicitly estimate trajectories. In this paper, we first derive a general multi-trajectory backward smoothing equation based on random finite sets of trajectories. Then we show how to sample sets of trajectories using backward simulation for Poisson multi-Bernoulli filtering densities, and develop a tractable implementation based on ranked assignment. The performance of the resulting multi-trajectory particle smoothers is evaluated in a simulation study, and the results demonstrate that they have superior performance in comparison to several state-of-the-art multi-object filters and smoothers.

Submitted 16 June, 2022; originally announced June 2022.

Comments: Accepted for publication in IEEE Transactions on Signal Processing

arXiv:2204.03408 [pdf, other]

Surface Vision Transformers: Flexible Attention-Based Modelling of Biomedical Surfaces

Authors: Simon Dahan, Hao Xu, Logan Z. J. Williams, Abdulah Fawaz, Chunhui Yang, Timothy S. Coalson, Michelle C. Williams, David E. Newby, A. David Edwards, Matthew F. Glasser, Alistair A. Young, Daniel Rueckert, Emma C. Robinson

Abstract: Recent state-of-the-art performances of Vision Transformers (ViT) in computer vision tasks demonstrate that a general-purpose architecture, which implements long-range self-attention, could replace the local feature learning operations of convolutional neural networks. In this paper, we extend ViTs to surfaces by reformulating the task of surface learning as a sequence-to-sequence learning problem, by proposing patching mechanisms for general surface meshes. Sequences of patches are then processed by a transformer encoder and used for classification or regression. We validate our method on a range of different biomedical surface domains and tasks: brain age prediction in the developing Human Connectome Project (dHCP), fluid intelligence prediction in the Human Connectome Project (HCP), and coronary artery calcium score classification using surfaces from the Scottish Computed Tomography of the Heart (SCOT-HEART) dataset, and investigate the impact of pretraining and data augmentation on model performance. Results suggest that Surface Vision Transformers (SiT) demonstrate consistent improvement over geometric deep learning methods for brain age and fluid intelligence prediction and achieve comparable performance on calcium score classification to standard metrics used in clinical practice. Furthermore, analysis of transformer attention maps offers clear and individualised predictions of the features driving each task. Code is available on Github: https://github.com/metrics-lab/surface-vision-transformers

Submitted 7 April, 2022; originally announced April 2022.

Comments: 10 pages, 3 figures, Submitted to IEEE Transactions on Medical Imaging

arXiv:2203.16414 [pdf, other]

Surface Vision Transformers: Attention-Based Modelling applied to Cortical Analysis

Authors: Simon Dahan, Abdulah Fawaz, Logan Z. J. Williams, Chunhui Yang, Timothy S. Coalson, Matthew F. Glasser, A. David Edwards, Daniel Rueckert, Emma C. Robinson

Abstract: The extension of convolutional neural networks (CNNs) to non-Euclidean geometries has led to multiple frameworks for studying manifolds. Many of those methods have shown design limitations resulting in poor modelling of long-range associations, as the generalisation of convolutions to irregular surfaces is non-trivial. Motivated by the success of attention-modelling in computer vision, we translate convolution-free vision transformer approaches to surface data, to introduce a domain-agnostic architecture to study any surface data projected onto a spherical manifold. Here, surface patching is achieved by representing spherical data as a sequence of triangular patches, extracted from a subdivided icosphere. A transformer model encodes the sequence of patches via successive multi-head self-attention layers while preserving the sequence resolution. We validate the performance of the proposed Surface Vision Transformer (SiT) on the task of phenotype regression from cortical surface metrics derived from the Developing Human Connectome Project (dHCP). Experiments show that the SiT generally outperforms surface CNNs, while performing comparably on registered and unregistered data. Analysis of transformer attention maps offers strong potential to characterise subtle cognitive developmental patterns.

Submitted 30 March, 2022; originally announced March 2022.

Comments: 22 pages, 6 figures, Accepted to MIDL 2022, OpenReview link https://openreview.net/forum?id=mpp843Bsf-

Journal ref: Proceedings of Machine Learning Research. 172 (2022) 282-303

arXiv:2203.14640 [pdf, other]

Analysis of Voice Conversion and Code-Switching Synthesis Using VQ-VAE

Authors: Shuvayanti Das, Jennifer Williams, Catherine Lai

Abstract: This paper presents an analysis of speech synthesis quality achieved by simultaneously performing voice conversion and language code-switching using multilingual VQ-VAE speech synthesis in German, French, English and Italian. In this paper, we utilize VQ code indices representing phone information from VQ-VAE to perform code-switching and a VQ speaker code to perform voice conversion in a single system with a neural vocoder. Our analysis examines several aspects of code-switching including the number of language switches and the number of words involved in each switch. We found that speech synthesis quality degrades after increasing the number of language switches within an utterance and decreasing the number of words. We also found some evidence of accent transfer when performing voice conversion across languages as observed when a speaker's original language differs from the language of a synthetic target utterance. We present results from our listening tests and discuss the inherent difficulties of assessing accent transfer in speech synthesis. Our work highlights some of the limitations and strengths of using a semi-supervised end-to-end system like VQ-VAE for handling multilingual synthesis. Our work provides insight into why multilingual speech synthesis is challenging and we suggest some directions for expanding work in this area.

Submitted 28 March, 2022; originally announced March 2022.

Comments: Submitted to Interspeech 2022

arXiv:2112.06048 [pdf, other]

Behavior measures are predicted by how information is encoded in an individual's brain

Authors: Jennifer Williams, Leila Wehbe

Abstract: Similar to how differences in the proficiency of the cardiovascular and musculoskeletal system predict an individual's athletic ability, differences in how the same brain region encodes information across individuals may explain their behavior. However, when studying how the brain encodes information, researchers choose different neuroimaging tasks (e.g., language or motor tasks), which can rely on processing different types of information and can modulate different brain regions. We hypothesize that individual differences in how information is encoded in the brain are task-specific and predict different behavior measures. We propose a framework using encoding-models to identify individual differences in brain encoding and test if these differences can predict behavior. We evaluate our framework using task functional magnetic resonance imaging data. Our results indicate that individual differences revealed by encoding-models are a powerful tool for predicting behavior, and that researchers should optimize their choice of task and encoding-model for their behavior of interest.

Submitted 11 December, 2021; originally announced December 2021.

arXiv:2110.06760 [pdf, other]

Revisiting Speech Content Privacy

Authors: Jennifer Williams, Junichi Yamagishi, Paul-Gauthier Noe, Cassia Valentini Botinhao, Jean-Francois Bonastre

Abstract: In this paper, we discuss an important aspect of speech privacy: protecting spoken content. New capabilities from the field of machine learning provide a unique and timely opportunity to revisit speech content protection. There are many different applications of content privacy, even though this area has been under-explored in speech technology research. This paper presents several scenarios that indicate a need for speech content privacy even as the specific techniques to achieve content privacy may necessarily vary. Our discussion includes several different types of content privacy including recoverable and non-recoverable content. Finally, we introduce evaluation strategies as well as describe some of the difficulties that may be encountered.

Submitted 13 October, 2021; originally announced October 2021.

Comments: Accepted to ISCA Security and Privacy in Speech Communication (1st SPSC Symposium)

arXiv:2109.03115 [pdf, other]

Improving Phenotype Prediction using Long-Range Spatio-Temporal Dynamics of Functional Connectivity

Authors: Simon Dahan, Logan Z. J. Williams, Daniel Rueckert, Emma C. Robinson

Abstract: The study of functional brain connectivity (FC) is important for understanding the underlying mechanisms of many psychiatric disorders. Many recent analyses adopt graph convolutional networks, to study non-linear interactions between functionally-correlated states. However, although patterns of brain activation are known to be hierarchically organised in both space and time, many methods have failed to extract powerful spatio-temporal features. To overcome those challenges, and improve understanding of long-range functional dynamics, we translate an approach, from the domain of skeleton-based action recognition, designed to model interactions across space and time. We evaluate this approach using the Human Connectome Project (HCP) dataset on sex classification and fluid intelligence prediction. To account for subject topographic variability of functional organisation, we modelled functional connectomes using multi-resolution dual-regressed (subject-specific) ICA nodes. Results show a prediction accuracy of 94.4% for sex classification (an increase of 6.2% compared to other methods), and an improvement of correlation with fluid intelligence of 0.325 vs 0.144, relative to a baseline model that encodes space and time separately. Results suggest that explicit encoding of spatio-temporal dynamics of brain functional activity may improve the precision with which behavioural and cognitive phenotypes may be predicted in the future.

Submitted 7 September, 2021; originally announced September 2021.

Comments: MLCN 2021

arXiv:2109.01490 [pdf, other]

A Scalable Track-Before-Detect Method With Poisson/Multi-Bernoulli Model

Authors: Thomas Kropfreiter, Jason L. Williams, Florian Meyer

Abstract: We propose a scalable track-before-detect (TBD) tracking method based on a Poisson/multi-Bernoulli model. To limit computational complexity, we approximate the exact multi-Bernoulli mixture posterior probability density function (pdf) by a multi-Bernoulli pdf. Data association based on the sum-product algorithm and recycling of Bernoulli components enable the detection and tracking of low-observable objects with limited computational resources. Our simulation results demonstrate a significantly improved tracking performance compared to a state-of-the-art TBD method.

Submitted 3 September, 2021; originally announced September 2021.

Comments: published at FUSION conference 2021

arXiv:2107.09667 [pdf, other]

doi 10.1145/3552466.3556531

Human Perception of Audio Deepfakes

Authors: Nicolas M. Müller, Karla Pizzi, Jennifer Williams

Abstract: The recent emergence of deepfakes has brought manipulated and generated content to the forefront of machine learning research. Automatic detection of deepfakes has seen many new machine learning techniques; however, human detection capabilities are far less explored. In this paper, we present results from comparing the abilities of humans and machines for detecting audio deepfakes used to imitate someone's voice. For this, we use a web-based application framework formulated as a game. Participants were asked to distinguish between real and fake audio samples. In our experiment, 472 unique users competed against a state-of-the-art AI deepfake detection algorithm for a total of 14912 rounds of the game. We find that humans and deepfake detection algorithms share similar strengths and weaknesses, both struggling to detect certain types of attacks. This is in contrast to the superhuman performance of AI in many application areas such as object detection or face recognition. Concerning human success factors, we find that IT professionals have no advantage over non-professionals but native speakers have an advantage over non-native speakers. Additionally, we find that older participants tend to be more susceptible than younger ones. These insights may be helpful when designing future cybersecurity training for humans as well as developing better detection algorithms.

Submitted 6 October, 2022; v1 submitted 20 July, 2021; originally announced July 2021.

Comments: Published at DDAM, the First International Workshop on Deepfake Detection for Audio Multimedia, at ACM Multimedia 2022

arXiv:2106.12914 [pdf, other]

Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn?

Authors: Nicolas M. Müller, Franziska Dieckmann, Pavel Czempin, Roman Canals, Konstantin Böttinger, Jennifer Williams

Abstract: We present our analysis of a significant data artifact in the official 2019/2021 ASVspoof Challenge Dataset. We identify an uneven distribution of silence duration in the training and test splits, which tends to correlate with the target prediction label. Bonafide instances tend to have significantly longer leading and trailing silences than spoofed instances. In this paper, we explore this phenomenon and its impact in depth. We compare several types of models trained on a) only the duration of the leading silence and b) only on the duration of leading and trailing silence. Results show that models trained on only the duration of the leading silence perform particularly well, and achieve up to 85% accuracy and an equal error rate (EER) of 15.1%. At the same time, we observe that trimming silence during pre-processing and then training established antispoofing models using signal-based features leads to comparatively worse performance. In that case, EER increases from 3.6% (with silence) to 15.5% (trimmed silence). Our findings suggest that previous work may, in part, have inadvertently learned the spoof/bonafide distinction by relying on the duration of silence as it appears in the official challenge dataset. We discuss the potential consequences that this has for interpreting system scores in the challenge and discuss how the ASV community may further consider this issue.

Submitted 28 September, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

Journal ref: ASVspoof 2021 Workshop

arXiv:2105.01573 [pdf, other]

Exploring Disentanglement with Multilingual and Monolingual VQ-VAE

Authors: Jennifer Williams, Jason Fong, Erica Cooper, Junichi Yamagishi

Abstract: This work examines the content and usefulness of disentangled phone and speaker representations from two separately trained VQ-VAE systems: one trained on multilingual data and another trained on monolingual data. We explore the multi- and monolingual models using four small proof-of-concept tasks: copy-synthesis, voice transformation, linguistic code-switching, and content-based privacy masking. From these tasks, we reflect on how disentangled phone and speaker representations can be used to manipulate speech in a meaningful way. Our experiments demonstrate that the VQ representations are suitable for these tasks, including creating new voices by mixing speaker representations together. We also present our novel technique to conceal the content of targeted words within an utterance by manipulating phone VQ codes, while retaining speaker identity and intelligibility of surrounding words. Finally, we discuss recommendations for further increasing the viability of disentangled representations.

Submitted 28 June, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

Comments: Accepted to Speech Synthesis Workshop 2021 (SSW11)

arXiv:2103.11279 [pdf, other]

doi 10.1109/TSP.2021.3121631

Scalable Detection and Tracking of Geometric Extended Objects

Authors: Florian Meyer, Jason L. Williams

Abstract: Multiobject tracking provides situational awareness that enables new applications for modern convenience, public safety, and homeland security. This paper presents a factor graph formulation and a particle-based sum-product algorithm (SPA) for scalable detection and tracking of extended objects. The proposed method dynamically introduces states of newly detected objects, efficiently performs probabilistic multiple-measurement to object association, and jointly infers the geometric shapes of objects. Scalable extended object tracking (EOT) is enabled by modeling association uncertainty by measurement-oriented association variables and newly detected objects by a Poisson birth process. Contrary to conventional EOT methods, a fully particle-based approach makes it possible to describe different geometric object shapes. The proposed method can reliably detect, localize, and track a large number of closely-spaced extended objects without gating and clustering of measurements. We demonstrate significant performance advantages of our approach compared to the recently introduced Poisson multi-Bernoulli mixture filter. In particular, we consider simulated scenarios with up to twenty closely-spaced objects and a real autonomous driving application where measurements are captured by a lidar sensor.

Submitted 9 December, 2021; v1 submitted 20 March, 2021; originally announced March 2021.

Comments: 29 pages, 8 figures, 2 tables

arXiv:2103.02561 [pdf, other]

ICAM-reg: Interpretable Classification and Regression with Feature Attribution for Mapping Neurological Phenotypes in Individual Scans

Authors: Cher Bass, Mariana da Silva, Carole Sudre, Logan Z. J. Williams, Petru-Daniel Tudosiu, Fidel Alfaro-Almagro, Sean P. Fitzgibbon, Matthew F. Glasser, Stephen M. Smith, Emma C. Robinson

Abstract: An important goal of medical imaging is to be able to precisely detect patterns of disease specific to individual scans; however, this is challenged in brain imaging by the degree of heterogeneity of shape and appearance. Traditional methods, based on image registration to a global template, historically fail to detect variable features of disease, as they utilise population-based analyses, suited primarily to studying group-average effects. In this paper we therefore take advantage of recent developments in generative deep learning to develop a method for simultaneous classification, or regression, and feature attribution (FA). Specifically, we explore the use of a VAE-GAN translation network called ICAM, to explicitly disentangle class relevant features from background confounds for improved interpretability and regression of neurological phenotypes. We validate our method on the tasks of Mini-Mental State Examination (MMSE) cognitive test score prediction for the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort, as well as brain age prediction, for both neurodevelopment and neurodegeneration, using the developing Human Connectome Project (dHCP) and UK Biobank datasets. We show that the generated FA maps can be used to explain outlier predictions and demonstrate that the inclusion of a regression module improves the disentanglement of the latent space. Our code is freely available on Github https://github.com/CherBass/ICAM.

Submitted 3 March, 2021; originally announced March 2021.

arXiv:2011.00922 [pdf, other]

Multiuser MIMO with Large Intelligent Surfaces: Communication Model and Transmit Design

Authors: Robin Jess Williams, Pablo Ramírez-Espinosa, Elisabeth de Carvalho, Thomas L. Marzetta

Abstract: This paper proposes a communication model for multiuser multiple-input multiple-output (MIMO) systems based on large intelligent surfaces (LIS), where the LIS is modeled as a collection of tightly packed antenna elements. The LIS system is first represented in a circuital way, obtaining expressions for the radiated and received powers, as well as for the coupling between the distinct elements. Then, this circuital model is used to characterize the channel in a line-of-sight propagation scenario, rendering the basis for the analysis and design of MIMO systems. Due to the particular properties of LIS, the model accounts for superdirectivity and mutual coupling effects along with near field propagation, necessary in those situations where the array dimension becomes very large. Finally, with the proposed model, the matched filter transmitter and the weighted minimum mean square error precoding are derived under both realistic constraints: limited radiated power and maximum ohmic losses.

Submitted 2 November, 2020; originally announced November 2020.

Comments: 6 pages, 3 figures; This paper is submitted to IEEE International Conference on Communications (ICC) 2021

arXiv:2010.10727 [pdf, other]

Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm

Authors: Jennifer Williams, Yi Zhao, Erica Cooper, Junichi Yamagishi

Abstract: We present a new approach to disentangle speaker voice and phone content by introducing new components to the VQ-VAE architecture for speech synthesis. The original VQ-VAE does not generalize well to unseen speakers or content. To alleviate this problem, we have incorporated a speaker encoder and speaker VQ codebook that learns global speaker characteristics entirely separate from the existing sub-phone codebooks. We also compare two training methods: self-supervised with global conditions and semi-supervised with speaker labels. Adding a speaker VQ component improves objective measures of speech synthesis quality (estimated MOS, speaker similarity, ASR-based intelligibility) and provides learned representations that are meaningful. Our speaker VQ codebook indices can be used in a simple speaker diarization task and perform slightly better than an x-vector baseline. Additionally, phones can be recognized from sub-phone VQ codebook indices in our semi-supervised VQ-VAE better than self-supervised with global conditions.

Submitted 10 February, 2021; v1 submitted 20 October, 2020; originally announced October 2020.

Comments: Accepted to ICASSP 2021

arXiv:2008.02051 [pdf, ps, other]

Backward Simulation for Sets of Trajectories

Authors: Yuxuan Xia, Lennart Svensson, Ángel F. García-Fernández, Karl Granström, Jason L. Williams

Abstract: This paper presents a solution for recovering full trajectory information, via the calculation of the posterior of the set of trajectories, from a sequence of multitarget (unlabelled) filtering densities and the multitarget dynamic model. Importantly, the proposed solution opens an avenue of trajectory estimation possibilities for multitarget filters that do not explicitly estimate trajectories. In this paper, we first derive a general multitrajectory forward-backward smoothing equation based on sets of trajectories and the random finite set framework. Then we show how to sample sets of trajectories using backward simulation when the multitarget filtering densities are multi-Bernoulli processes. The proposed approach is demonstrated in a simulation study.

Submitted 22 February, 2021; v1 submitted 5 August, 2020; originally announced August 2020.

Comments: Published in 23rd International Conference on Information Fusion. This arXiv version contains more detailed derivations

arXiv:2007.15652 [pdf, other]

doi 10.1002/rob.22006

Canopy Density Estimation in Perennial Horticulture Crops Using 3D Spinning Lidar SLAM

Authors: Thomas Lowe, Peyman Moghadam, Everard Edwards, Jason Williams

Abstract: We propose a novel, canopy density estimation solution using a 3D ray cloud representation for perennial horticultural crops at the field scale. To attain high spatial and temporal fidelity in field conditions, we propose the application of continuous-time 3D SLAM (Simultaneous Localisation and Mapping) to a spinning lidar payload (AgScan3D) mounted on a moving farm vehicle. The AgScan3D data is processed through a Continuous-Time SLAM algorithm into a globally registered 3D ray cloud. The global ray cloud is a canonical data format (a digital twin) from which we can compare vineyard snapshots over multiple times within a season and across seasons. Then, the vineyard rows are automatically extracted from the ray cloud and a novel density calculation is performed to estimate the maximum likelihood canopy densities of the vineyard. This combination of digital twinning, together with the accurate extraction of canopy structure information, allows entire vineyards to be analysed and compared, across the growing season and from year to year. The proposed method is evaluated both in simulation and field experiments. Field experiments were performed at four sites, which varied in vineyard structure and vine management, over two growing seasons and 64 data collection campaigns, resulting in a total traversal of 160 kilometres, 42.4 scanned hectares of vines with a combined total of approximately 93,000 scanned vines.
Our experiments show canopy density repeatability of 3.8% (Relative RMSE) per vineyard panel, for acquisition speeds of 5-6 km/h, and under half the standard deviation in estimated densities when compared to an industry standard gap-fraction based solution. The code and field datasets are available at https://github.com/csiro-robotics/agscan3d.

Submitted 14 December, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

Comments: Accepted to Journal of Field Robotics. More information at https://github.com/csiro-robotics/agscan3d

arXiv:2006.06563 [pdf, other]

A Primer on Large Intelligent Surface (LIS) for Wireless Sensing in an Industrial Setting

Authors: Cristian J. Vaca-Rubio, Pablo Ramirez-Espinosa, Robin Jess Williams, Kimmo Kansanen, Zheng-Hua Tan, Elisabeth de Carvalho, Petar Popovski

Abstract: One of the beyond-5G developments that is often highlighted is the integration of wireless communication and radio sensing. This paper addresses the potential of communication-sensing integration of Large Intelligent Surfaces (LIS) in an exemplary Industry 4.0 scenario. Besides the potential for high throughput and efficient multiplexing of wireless links, an LIS can offer a high-resolution rendering of the propagation environment. This is because, in an indoor setting, it can be placed in proximity to the sensed phenomena, while the high resolution is offered by densely spaced tiny antennas deployed over a large area. By treating an LIS as a radio image of the environment, we develop sensing techniques that leverage the usage of computer vision combined with machine learning. We test these methods for a scenario where we need to detect whether an industrial robot deviates from a predefined route. The results show that the LIS-based sensing offers high precision and has a high application potential in indoor industrial environments.

Submitted 16 November, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

arXiv:2005.07884 [pdf, other]

Improved Prosody from Learned F0 Codebook Representations for VQ-VAE Speech Waveform Reconstruction

Authors: Yi Zhao, Haoyu Li, Cheng-I Lai, Jennifer Williams, Erica Cooper, Junichi Yamagishi

Abstract: Vector Quantized Variational AutoEncoders (VQ-VAE) are a powerful representation learning framework that can discover discrete groups of features from a speech signal without supervision. Until now, the VQ-VAE architecture has modeled individual types of speech features, such as only phones or only F0. This paper introduces an important extension to VQ-VAE for learning F0-related suprasegmental information simultaneously along with traditional phone features. The proposed framework uses two encoders such that the F0 trajectory and speech waveform are both input to the system, therefore two separate codebooks are learned. We used a WaveRNN vocoder as the decoder component of VQ-VAE. Our speaker-independent VQ-VAE was trained with raw speech waveforms from multi-speaker Japanese speech databases. Experimental results show that the proposed extension reduces F0 distortion of reconstructed speech for all unseen test speakers, and results in significantly higher preference scores from a listening test. We additionally conducted experiments using single-speaker Mandarin speech to demonstrate advantages of our architecture in another language which relies heavily on F0.

Submitted 16 May, 2020; originally announced May 2020.

Comments: Submitted to Interspeech 2020

arXiv:2002.12696 [pdf, other]

Spatiotemporal Constraints for Sets of Trajectories with Applications to PMBM Densities

Authors: Karl Granström, Lennart Svensson, Yuxuan Xia, Angel F. Garcia-Fernandez, Jason Williams

Abstract: In this paper we introduce spatiotemporal constraints for trajectories, i.e., restrictions that the trajectory must be in some part of the state space (spatial constraint) at some point in time (temporal constraint). Spatiotemporal constraints on trajectories can be used to answer a range of important questions, including, e.g., "where did the person that was in area A at time t, go afterwards?". We discuss how multiple constraints can be combined into sets of constraints, and we then apply sets of constraints to set of trajectories densities, specifically Poisson Multi-Bernoulli Mixture (PMBM) densities. For Poisson target birth, the exact posterior density is PMBM for both point targets and extended targets. In the paper we show that if the unconstrained set of trajectories density is PMBM, then the constrained density is also PMBM. Examples of constrained trajectory densities motivate and illustrate the key results.

Submitted 28 February, 2020; originally announced February 2020.

arXiv:2002.12645 [pdf, other]

Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis

Authors: Jennifer Williams, Joanna Rownicka, Pilar Oplustil, Simon King

Abstract: We aim to characterize how different speakers contribute to the perceived output quality of multi-speaker Text-to-Speech (TTS) synthesis. We automatically rate the quality of TTS using a neural network (NN) trained on human mean opinion score (MOS) ratings. First, we train and evaluate our NN model on 13 different TTS and voice conversion (VC) systems from the ASVSpoof 2019 Logical Access (LA) Dataset. Since it is not known how best to represent speech for this task, we compare 8 different representations alongside MOSNet frame-based features. Our representations include image-based spectrogram features and x-vector embeddings that explicitly model different types of noise such as T60 reverberation time. Our NN predicts MOS with a high correlation to human judgments. We report prediction correlation and error. A key finding is the quality achieved for certain speakers seems consistent, regardless of the TTS or VC system. It is widely accepted that some speakers give higher quality than others for building a TTS system: our method provides an automatic way to identify such speakers. Finally, to see if our quality prediction models generalize, we predict quality scores for synthetic speech using a separate multi-speaker TTS system that was trained on LibriTTS data, and conduct our own MOS listening test to compare human ratings with our NN predictions.

Submitted 27 April, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

Comments: accepted at Speaker Odyssey 2020

arXiv:2001.10822 [pdf, other]

Lattice-based Improvements for Voice Triggering Using Graph Neural Networks

Authors: Pranay Dighe, Saurabh Adya, Nuoyu Li, Srikanth Vishnubhotla, Devang Naik, Adithya Sagar, Ying Ma, Stephen Pulman, Jason Williams

Abstract: Voice-triggered smart assistants often rely on detection of a trigger-phrase before they start listening for the user request. Mitigation of false triggers is an important aspect of building a privacy-centric non-intrusive smart assistant. In this paper, we address the task of false trigger mitigation (FTM) using a novel approach based on analyzing automatic speech recognition (ASR) lattices using graph neural networks (GNN). The proposed approach uses the fact that decoding lattice of a falsely triggered audio exhibits uncertainties in terms of many alternative paths and unexpected words on the lattice arcs as compared to the lattice of a correctly triggered audio. A pure trigger-phrase detector model doesn't fully utilize the intent of the user speech whereas by using the complete decoding lattice of user audio, we can effectively mitigate speech not intended for the smart assistant. We deploy two variants of GNNs in this paper based on 1) graph convolution layers and 2) self-attention mechanism respectively. Our experiments demonstrate that GNNs are highly accurate in FTM task by mitigating ~87% of false triggers at 99% true positive rate (TPR). Furthermore, the proposed models are fast to train and efficient in parameter requirements.

Submitted 24 January, 2020; originally announced January 2020.

arXiv:1912.08718 [pdf, other]

Poisson Multi-Bernoulli Mixtures for Sets of Trajectories

Authors: Karl Granström, Lennart Svensson, Yuxuan Xia, Jason Williams, Ángel F. García-Fernández

Abstract: For the standard point target model with Poisson birth process, the Poisson Multi-Bernoulli Mixture (PMBM) is a conjugate multi-target density. The PMBM filter for sets of targets has been shown to have state-of-the-art performance and a structure similar to the Multiple Hypothesis Tracker (MHT). In this paper we consider a recently developed formulation of multiple target tracking as a random finite set (RFS) of trajectories, and present three important and interesting results. First, we show that, for the standard point target model, the PMBM density is conjugate also for sets of trajectories. Second, based on this we develop PMBM trackers (trajectory RFS filters) that efficiently estimate the set of trajectories. Third, we establish that for the standard point target model the multi-trajectory density is PMBM for trajectories in any time window, given measurements in any (possibly non-overlapping) time window. In addition, the PMBM trackers are evaluated in a simulation study, and shown to yield state-of-the-art performance.

Submitted 17 December, 2019; originally announced December 2019.

Comments: arXiv admin note: text overlap with arXiv:1812.05131

arXiv:1912.06644 [pdf, other]

A Communication Model for Large Intelligent Surfaces

Authors: Robin Jess Williams, Elisabeth De Carvalho, Thomas L. Marzetta

Abstract: The purpose of this paper is to introduce a communication model for Large Intelligent Surfaces (LIS). A LIS is modelled as a collection of tiny closely spaced antenna elements. Due to the proximity of the elements, mutual coupling arises. An optimal transmitter design depends on the mutual coupling matrix. For single user communication, the optimal transmitter uses the inverse of the mutual coupling matrix in a filter matched to the channel vector. We give the expression of the mutual coupling for two types of planar arrays. The conditioning number of the mutual coupling matrix is unbounded as the antenna element density increases, so only the dominant values can be inverted within reasonable computation. The directivity is partial but still significant compared to the conventional gain. When the spacing between elements becomes small (smaller than half a wavelength), the directivity surpasses the conventional directivity equal to the number of antennas, as well as the gain obtained when modelling the surface as continuous. The gain is theoretically unbounded as the element density increases for a constant aperture.

Submitted 4 May, 2020; v1 submitted 13 December, 2019; originally announced December 2019.

Comments: 6 pages, 7 figures; typos corrected

arXiv:1912.01748 [pdf, ps, other]

Multi-Scan Implementation of the Trajectory Poisson Multi-Bernoulli Mixture Filter

Authors: Yuxuan Xia, Karl Granström, Lennart Svensson, Ángel F. García-Fernández, Jason L. Williams

Abstract: The Poisson multi-Bernoulli mixture (PMBM) and the multi-Bernoulli mixture (MBM) are two multi-target distributions for which closed-form filtering recursions exist. The PMBM has a Poisson birth process, whereas the MBM has a multi-Bernoulli birth process. This paper considers a recently developed formulation of the multi-target tracking problem using a random finite set of trajectories, through which the track continuity is explicitly established. A multi-scan trajectory PMBM filter and a multi-scan trajectory MBM filter, with the ability to correct past data association decisions to improve current decisions, are presented. In addition, a multi-scan trajectory $\text{MBM}_{01}$ filter, in which the existence probabilities of all Bernoulli components are either 0 or 1, is presented. This paper proposes an efficient implementation that performs track-oriented $N$-scan pruning to limit computational complexity, and uses dual decomposition to solve the involved multi-frame assignment problem. The performance of the presented multi-target trackers, applied with an efficient fixed-lag smoothing method, is evaluated in a simulation study.

Submitted 27 February, 2020; v1 submitted 3 December, 2019; originally announced December 2019.

Comments: Published in Journal of Advances in Information Fusion, Special issue on Multiple Hypothesis Tracking, Volume 14, Number 2, Pages 213-235, December 2019. MATLAB code is available at https://github.com/yuhsuansia/Multi-scan-trajectory-PMBM-filter

Journal ref: Journal of Advances in Information Fusion Volume 14 Number 2 December 2019

arXiv:1911.09025 [pdf, ps, other]

Extended target Poisson multi-Bernoulli mixture trackers based on sets of trajectories

Authors: Yuxuan Xia, Karl Granström, Lennart Svensson, Ángel F. García-Fernández, Jason L. Williams

Abstract: The Poisson multi-Bernoulli mixture (PMBM) is a multi-target distribution for which the prediction and update are closed. By applying the random finite set (RFS) framework to multi-target tracking with sets of trajectories as the variable of interest, the PMBM trackers can efficiently estimate the set of target trajectories. This paper derives two trajectory RFS filters for extended target tracking, called extended target PMBM trackers. Compared to the extended target PMBM filter based on sets of targets, explicit track continuity between time steps is provided in the extended target PMBM trackers.

Submitted 19 November, 2019; originally announced November 2019.

Comments: MATLAB code is available at https://github.com/yuhsuansia/Extended-Target-PMBM-Tracker. arXiv admin note: text overlap with arXiv:1812.05131

Journal ref: Proceedings of the 22nd International Conference on Information Fusion, 2019

arXiv:1908.08819 [pdf, other]

Gaussian implementation of the multi-Bernoulli mixture filter

Authors: Ángel F. García-Fernández, Yuxuan Xia, Karl Granström, Lennart Svensson, Jason L. Williams

Abstract: This paper presents the Gaussian implementation of the multi-Bernoulli mixture (MBM) filter. The MBM filter provides the filtering (multi-target) density for the standard dynamic and radar measurement models when the birth model is multi-Bernoulli or multi-Bernoulli mixture. Under linear/Gaussian models, the single target densities of the MBM mixture admit Gaussian closed-form expressions. Murty's algorithm is used to select the global hypotheses with highest weights. The MBM filter is compared with other algorithms in the literature via numerical simulations.

Submitted 23 August, 2019; originally announced August 2019.

Comments: Matlab code of the MBM and PMBM filters is provided in https://github.com/Agarciafernandez/MTT . Additional information on MTT including PMBM and MBM filters can be found in the online course https://www.youtube.com/channel/UCa2-fpj6AV8T6JK1uTRuFpw

Journal ref: Proceedings of the 22nd International Conference on Information Fusion, 2019

arXiv:1812.05131 [pdf, other]

Poisson multi-Bernoulli mixture trackers: continuity through random finite sets of trajectories

Authors: Karl Granström, Lennart Svensson, Yuxuan Xia, Jason Williams, Angel F Garcia-Fernandez

Abstract: The Poisson multi-Bernoulli mixture (PMBM) is an unlabelled multi-target distribution for which the prediction and update are closed. It has a Poisson birth process, and new Bernoulli components are generated on each new measurement as a part of the Bayesian measurement update. The PMBM filter is similar to the multiple hypothesis tracker (MHT), but seemingly does not provide explicit continuity between time steps. This paper considers a recently developed formulation of the multi-target tracking problem as a random finite set (RFS) of trajectories, and derives two trajectory RFS filters, called PMBM trackers. The PMBM trackers efficiently estimate the set of trajectories, and share hypothesis structure with the PMBM filter. By showing that the prediction and update in the PMBM filter can be viewed as an efficient method for calculating the time marginals of the RFS of trajectories, continuity in the same sense as MHT is established for the PMBM filter.

Submitted 12 December, 2018; originally announced December 2018.

arXiv:1801.01353 [pdf, other]

Poisson Multi-Bernoulli Approximations for Multiple Extended Object Filtering

Authors: Yuxuan Xia, Karl Granström, Lennart Svensson, Maryam Fatemi, Ángel F. García-Fernández, Jason L. Williams

Abstract: The Poisson multi-Bernoulli mixture (PMBM) is a multi-object conjugate prior for the closed-form Bayes random finite sets filter. The extended object PMBM filter provides a closed-form solution for multiple extended object filtering with standard models. This paper considers computationally lighter alternatives to the extended object PMBM filter by propagating a Poisson multi-Bernoulli (PMB) density through the filtering recursion. A new local hypothesis representation is presented where each measurement creates a new Bernoulli component. This facilitates the development of methods for efficiently approximating the PMBM posterior density after the update step as a PMB. Based on the new hypothesis representation, two approximation methods are presented: one is based on the track-oriented multi-Bernoulli (MB) approximation, and the other is based on the variational MB approximation via Kullback-Leibler divergence minimisation. The performance of the proposed PMB filters with gamma Gaussian inverse-Wishart implementations is evaluated in a simulation study.

Submitted 13 August, 2021; v1 submitted 4 January, 2018; originally announced January 2018.

Comments: Accepted for publication in IEEE T-AES

arXiv:1607.07942 [pdf, other]

Multiple scan data association by convex variational inference

Authors: Jason L. Williams, Roslyn A. Lau

Abstract: Data association, the reasoning over correspondence between targets and measurements, is a problem of fundamental importance in target tracking. Recently, belief propagation (BP) has emerged as a promising method for estimating the marginal probabilities of measurement to target association, providing fast, accurate estimates. The excellent performance of BP in the particular formulation used may be attributed to the convexity of the underlying free energy which it implicitly optimises. This paper studies multiple scan data association problems, i.e., problems that reason over correspondence between targets and several sets of measurements, which may correspond to different sensors or different time steps. We find that the multiple scan extension of the single scan BP formulation is non-convex and demonstrate the undesirable behaviour that can result. A convex free energy is constructed using the recently proposed fractional free energy (FFE). A convergent, BP-like algorithm is provided for the single scan FFE, and employed in optimising the multiple scan free energy using primal-dual coordinate ascent. Finally, based on a variational interpretation of joint probabilistic data association (JPDA), we develop a sequential variant of the algorithm that is similar to JPDA, but retains consistency constraints from prior scans. The performance of the proposed methods is demonstrated on a bearings only target localisation problem.

Submitted 23 January, 2018; v1 submitted 26 July, 2016; originally announced July 2016.

arXiv:1403.5022 [pdf, other]

doi 10.1109/TSP.2014.2370946

An efficient, variational approximation of the best fitting multi-Bernoulli filter

Authors: Jason L. Williams

Abstract: The joint probabilistic data association (JPDA) filter is a popular tracking methodology for problems involving well-spaced targets, but it is rarely applied in problems with closely-spaced targets due to its complexity in these cases, and due to the well-known phenomenon of coalescence. This paper addresses these difficulties using random finite sets (RFSs) and variational inference, deriving a highly tractable, approximate method for obtaining the multi-Bernoulli distribution that minimises the set Kullback-Leibler (KL) divergence from the true posterior, working within the RFS framework to incorporate uncertainty in target existence. The derivation is interpreted as an application of expectation-maximisation (EM), where the missing data is the correspondence of Bernoulli components (i.e., tracks) under each data association hypothesis. The missing data is shown to play an identical role to the selection of an ordered distribution in the same ordered family in the set JPDA algorithm. Subsequently, a special case of the proposed method is utilised to provide an efficient approximation of the minimum mean optimal sub-pattern assignment estimator. The performance of the proposed methods is demonstrated in challenging scenarios in which up to twenty targets come into close proximity.

Submitted 18 November, 2014; v1 submitted 19 March, 2014; originally announced March 2014.

Comments: Accepted, IEEE Transactions on Signal Processing, http://dx.doi.org/10.1109/TSP.2014.2370946

Journal ref: IEEE Transactions on Signal Processing, vol 63, no 1, pp 258-273, January 2015

arXiv:1203.2995 [pdf, other]

doi 10.1109/TAES.2015.130550

Marginal multi-Bernoulli filters: RFS derivation of MHT, JIPDA and association-based MeMBer

Authors: Jason L. Williams

Abstract: Recent developments in random finite sets (RFSs) have yielded a variety of tracking methods that avoid data association. This paper derives a form of the full Bayes RFS filter and observes that data association is implicitly present, in a data structure similar to MHT. Subsequently, algorithms are obtained by approximating the distribution of associations. Two algorithms result: one nearly identical to JIPDA, and another related to the MeMBer filter. Both improve performance in challenging environments.

Submitted 24 August, 2016; v1 submitted 13 March, 2012; originally announced March 2012.

Comments: Journal version at http://ieeexplore.ieee.org/document/7272821. Matlab code of simple implementation included with ancillary files

Journal ref: IEEE Transactions on Aerospace and Electronic Systems, vol 51, no 3, pp 1664-1687, July 2015

arXiv:1203.2992 [pdf, other]

Hybrid Poisson and multi-Bernoulli filters

Authors: Jason L. Williams

Abstract: The probability hypothesis density (PHD) and multi-target multi-Bernoulli (MeMBer) filters are two leading algorithms that have emerged from random finite sets (RFS). In this paper we study a method which combines these two approaches. Our work is motivated by a sister paper, which proves that the full Bayes RFS filter naturally incorporates a Poisson component representing targets that have never been detected, and a linear combination of multi-Bernoulli components representing targets under track. Here we demonstrate the benefit (in speed of track initiation) that maintenance of a Poisson component of undetected targets provides. Subsequently, we propose a method of recycling, which projects Bernoulli components with a low probability of existence onto the Poisson component (as opposed to deleting them). We show that this allows us to achieve similar tracking performance using a fraction of the number of Bernoulli components (i.e., tracks).

Submitted 13 March, 2012; originally announced March 2012.

Comments: Submitted to 15th International Conference on Information Fusion (2012)

arXiv:1105.3298 [pdf, other]

Graphical model approximations of random finite set filters

Authors: Jason L. Williams

Abstract: Random finite sets (RFSs) have been a fruitful area of research in recent years, yielding new approximate filters such as the probability hypothesis density (PHD), cardinalised PHD (CPHD), and multiple target multi-Bernoulli (MeMBer). These new methods have largely been based on approximations that side-step the need for measurement-to-track association. Comparably, RFS methods that incorporate data association, such as Morelande and Challa's (M-C) method, have received little attention. This paper provides an RFS algorithm that incorporates data association similarly to the M-C method, but retains computational tractability via a recently developed approximation of marginal association weights. We describe an efficient method for resolving the track coalescence phenomenon which is problematic for joint probabilistic data association (JPDA) and related methods (including M-C). The method utilises a network flow optimisation, and thus is tractable for large numbers of targets. Finally, our derivation also shows that it is natural for the multi-target density to incorporate both a Poisson point process (PPP) component (representing targets that have never been detected) and a multi-Bernoulli component (representing targets under track). We describe a method of recycling, in which tracks with a low probability of existence are transferred from the multi-Bernoulli component to the PPP component, effectively yielding a hybrid of M-C and PHD.

Submitted 30 August, 2011; v1 submitted 17 May, 2011; originally announced May 2011.

Comments: Extended version; first version submitted to Fusion 2011

Showing 1–50 of 50 results for author: Williams, J