Showing 1–16 of 16 results for author: Jackson, J

Searching in archive eess.
  1. arXiv:2406.00495  [pdf, other]

    eess.AS cs.CV cs.SD

    Audio-Visual Talker Localization in Video for Spatial Sound Reproduction

    Authors: Davide Berghi, Philip J. B. Jackson

    Abstract: Object-based audio production requires the positional metadata to be defined for each point-source object, including the key elements in the foreground of the sound scene. In many media production use cases, both cameras and microphones are employed to make recordings, and the human voice is often a key element. In this research, we detect and locate the active speaker in the video, facilitating t…

    Submitted 1 June, 2024; originally announced June 2024.

  2. arXiv:2312.14021  [pdf, other]

    eess.AS cs.LG cs.SD eess.IV eess.SP

    Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization

    Authors: Davide Berghi, Philip J. B. Jackson

    Abstract: Conventional audio-visual approaches for active speaker detection (ASD) typically rely on visually pre-extracted face tracks and the corresponding single-channel audio to find the speaker in a video. Therefore, they tend to fail every time the face of the speaker is not visible. We demonstrate that a simple audio convolutional recurrent neural network (CRNN) trained with spatial input features ext…

    Submitted 21 December, 2023; originally announced December 2023.

  3. arXiv:2312.09034  [pdf, other]

    eess.AS cs.SD eess.IV

    Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection

    Authors: Davide Berghi, Peipei Wu, Jinzheng Zhao, Wenwu Wang, Philip J. B. Jackson

    Abstract: Sound event localization and detection (SELD) combines two subtasks: sound event detection (SED) and direction of arrival (DOA) estimation. SELD is usually tackled as an audio-only problem, but visual information has been recently included. Few audio-visual (AV)-SELD works have been published and most employ vision via face/object bounding boxes, or human pose keypoints. In contrast, we explore th…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: ICASSP 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  4. arXiv:2310.14778  [pdf, other]

    cs.MM cs.SD eess.AS

    Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

    Authors: Jinzheng Zhao, Yong Xu, Xinyuan Qian, Davide Berghi, Peipei Wu, Meng Cui, Jianyuan Sun, Philip J. B. Jackson, Wenwu Wang

    Abstract: Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic values and wide application. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visual information, the Bayesian-based filter can solve the problem of data association, audio-visual fusion and track management. In this paper, we condu…

    Submitted 17 December, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  5. arXiv:2307.14739  [pdf, other]

    eess.AS eess.SP

    Audio Inputs for Active Speaker Detection and Localization via Microphone Array

    Authors: Davide Berghi, Philip J. B. Jackson

    Abstract: This study considers the problem of detecting and locating an active talker's horizontal position from multichannel audio captured by a microphone array. We refer to this as active speaker detection and localization (ASDL). Our goal was to investigate the performance of spatial acoustic features extracted from the multichannel audio as the input of a convolutional recurrent neural network (CRNN),…

    Submitted 27 July, 2023; originally announced July 2023.

  6. arXiv:2212.01892  [pdf, other]

    eess.AS cs.MM cs.SD

    Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research

    Authors: Davide Berghi, Marco Volino, Philip J. B. Jackson

    Abstract: 3D audio-visual production aims to deliver immersive and interactive experiences to the consumer. Yet, faithfully reproducing real-world 3D scenes remains a challenging task. This is partly due to the lack of available datasets enabling audio-visual research in this direction. In most of the existing multi-view datasets, the accompanying audio is neglected. Similarly, datasets for spatial audio re…

    Submitted 4 December, 2022; originally announced December 2022.

  7. arXiv:2203.03291  [pdf, other]

    eess.AS cs.SD eess.IV

    Visually Supervised Speaker Detection and Localization via Microphone Array

    Authors: Davide Berghi, Adrian Hilton, Philip J. B. Jackson

    Abstract: Active speaker detection (ASD) is a multi-modal task that aims to identify who, if anyone, is speaking from a set of candidates. Current audio-visual approaches for ASD typically rely on visually pre-extracted face tracks (sequences of consecutive face crops) and the respective monaural audio. However, their recall rate is often low as only the visible faces are included in the set of candidates.…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: Erratum: Due to a bug in the evaluation script, the correct average distance (aD) metric is here reported in yellow. The analysis remains unchanged from the original paper as the trend between the old and new measures is perfectly monotonic. The bug was caused by an incorrect normalization factor

    Journal ref: IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), 2021

  8. arXiv:2201.00655  [pdf, other]

    eess.SY cs.AI

    Formal Verification of Unknown Dynamical Systems via Gaussian Process Regression

    Authors: John Jackson, Luca Laurenti, Eric Frew, Morteza Lahijanian

    Abstract: Leveraging autonomous systems in safety-critical scenarios requires verifying their behaviors in the presence of uncertainties and black-box components that influence the system dynamics. In this article, we develop a framework for verifying partially-observable, discrete-time dynamical systems with unmodelled dynamics against temporal logic specifications from a given input-output dataset. The ve…

    Submitted 31 December, 2021; originally announced January 2022.

  9. arXiv:2112.08644  [pdf]

    eess.IV cs.CV

    A comparative study of paired versus unpaired deep learning methods for physically enhancing digital rock image resolution

    Authors: Yufu Niu, Samuel J. Jackson, Naif Alqahtani, Peyman Mostaghimi, Ryan T. Armstrong

    Abstract: X-ray micro-computed tomography (micro-CT) has been widely leveraged to characterise pore-scale geometry in subsurface porous rock. Recent developments in super resolution (SR) methods using deep learning allow the digital enhancement of low resolution (LR) images over large spatial scales, creating SR images comparable to the high resolution (HR) ground truth. This circumvents traditional resolut…

    Submitted 16 December, 2021; originally announced December 2021.

    Comments: 26 pages, 11 figures, 4 tables

  10. arXiv:2110.05525  [pdf, other]

    eess.SY

    Synergistic Offline-Online Control Synthesis via Local Gaussian Process Regression

    Authors: John Jackson, Luca Laurenti, Eric Frew, Morteza Lahijanian

    Abstract: Autonomous systems often have complex and possibly unknown dynamics due to, e.g., black-box components. This leads to unpredictable behaviors and makes control design with performance guarantees a major challenge. This paper presents a data-driven control synthesis framework for such systems subject to linear temporal logic on finite traces (LTLf) specifications. The framework combines a baseline…

    Submitted 8 March, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Updated Prop 1 from published version -- To appear in the 60th IEEE Conf on Decision and Control

  11. arXiv:2105.00641  [pdf, other]

    cs.MM cs.SD eess.AS

    Naturalistic audio-visual volumetric sequences dataset of sounding actions for six degree-of-freedom interaction

    Authors: Hanne Stenzel, Davide Berghi, Marco Volino, Philip J. B. Jackson

    Abstract: As audio-visual systems increasingly bring immersive and interactive capabilities into our work and leisure activities, so the need for naturalistic test material grows. New volumetric datasets have captured high-quality 3D video, but accompanying audio is often neglected, making it hard to test an integrated bimodal experience. Designed to cover diverse sound types and features, the presented vol…

    Submitted 3 May, 2021; originally announced May 2021.

    Comments: for dataset visit cvssp.org/data/navvs; accepted as poster in IEEE VR 2021

  12. Strategy Synthesis for Partially-known Switched Stochastic Systems

    Authors: John Jackson, Luca Laurenti, Eric Frew, Morteza Lahijanian

    Abstract: We present a data-driven framework for strategy synthesis for partially-known switched stochastic systems. The properties of the system are specified using linear temporal logic (LTL) over finite traces (LTLf), which is as expressive as LTL and enables interpretations over finite behaviors. The framework first learns the unknown dynamics via Gaussian process regression. Then, it builds a formal ab…

    Submitted 8 March, 2022; v1 submitted 5 April, 2021; originally announced April 2021.

    Comments: Updated Thm 1 and Fig 2 from published version -- 11 pages, to appear in the 2021 Proceedings of the ACM Int. Conf. on Hybrid Systems: Computation and Control (HSCC 2021)

  13. arXiv:2004.01821  [pdf, other]

    eess.SY

    Safety Verification of Unknown Dynamical Systems via Gaussian Process Regression

    Authors: John Jackson, Luca Laurenti, Eric Frew, Morteza Lahijanian

    Abstract: The deployment of autonomous systems that operate in unstructured environments necessitates algorithms to verify their safety. This can be challenging due to, e.g., black-box components in the control software, or undermodelled dynamics that prevent model-based verification. We present a novel verification framework for an unknown dynamical system from a given set of noisy observations of the dyna…

    Submitted 15 June, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

    Comments: 8 pages, LaTeX; typos corrected, references updated, figures updated

  14. arXiv:2003.06656  [pdf, other]

    eess.AS cs.SD eess.IV

    Audio-Visual Spatial Aligment Requirements of Central and Peripheral Object Events

    Authors: Davide Berghi, Hanne Stenzel, Marco Volino, Adrian Hilton, Philip J. B. Jackson

    Abstract: Immersive audio-visual perception relies on the spatial integration of both auditory and visual information which are heterogeneous sensing modalities with different fields of reception and spatial resolution. This study investigates the perceived coherence of audiovisual object events presented either centrally or peripherally with horizontally aligned/misaligned sound. Various object events were…

    Submitted 14 March, 2020; originally announced March 2020.

    Comments: Two-pages poster abstract

    Journal ref: IEEE VR 2020

  15. Modeling the Comb Filter Effect and Interaural Coherence for Binaural Source Separation

    Authors: Luca Remaggi, Philip J. B. Jackson, Wenwu Wang

    Abstract: Typical methods for binaural source separation consider only the direct sound as the target signal in a mixture. However, in most scenarios, this assumption limits the source separation performance. It is well known that the early reflections interact with the direct sound, producing acoustic effects at the listening position, e.g. the so-called comb filter effect. In this article, we propose a no…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: IEEE Copyright. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019

  16. arXiv:1906.07552  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks

    Authors: Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

    Abstract: Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available. Both individual sources and mixing filters need to be estimated. In addition, a mixture may contain non-stationary noise which is unseen in the training set. We propose a synt…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: 7 pages. Accepted by IJCAI 2019

    Journal ref: International Joint Conference on Artificial Intelligence (IJCAI), 2019, pp. 2747-2753