Showing 1–26 of 26 results for author: Jackson, P

Searching in archive cs.
  1. arXiv:2406.06187  [pdf, other]

    cs.CV

    An Effective-Efficient Approach for Dense Multi-Label Action Detection

    Authors: Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

    Abstract: Unlike the sparse label action detection task, where a single action occurs in each timestamp of a video, in a dense multi-label scenario, actions can overlap. To address this challenging task, it is necessary to simultaneously learn (i) temporal dependencies and (ii) co-occurrence action relationships. Recent approaches model temporal information by extracting multi-scale features through hierarc…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 14 pages. arXiv admin note: substantial text overlap with arXiv:2308.05051

  2. arXiv:2406.00495  [pdf, other]

    eess.AS cs.CV cs.SD

    Audio-Visual Talker Localization in Video for Spatial Sound Reproduction

    Authors: Davide Berghi, Philip J. B. Jackson

    Abstract: Object-based audio production requires the positional metadata to be defined for each point-source object, including the key elements in the foreground of the sound scene. In many media production use cases, both cameras and microphones are employed to make recordings, and the human voice is often a key element. In this research, we detect and locate the active speaker in the video, facilitating t…

    Submitted 1 June, 2024; originally announced June 2024.

  3. arXiv:2405.10690  [pdf, other]

    cs.CV

    CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video Parsing

    Authors: Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

    Abstract: Weakly supervised audio-visual video parsing (AVVP) methods aim to detect audible-only, visible-only, and audible-visible events using only video-level labels. Existing approaches tackle this by leveraging unimodal and cross-modal contexts. However, we argue that while cross-modal learning is beneficial for detecting audible-visible events, in the weakly supervised scenario, it negatively impacts…

    Submitted 20 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  4. arXiv:2401.00128  [pdf]

    cs.LG cs.CV math.OC

    Quantifying intra-tumoral genetic heterogeneity of glioblastoma toward precision medicine using MRI and a data-inclusive machine learning algorithm

    Authors: Lujia Wang, Hairong Wang, Fulvio D'Angelo, Lee Curtin, Christopher P. Sereduk, Gustavo De Leon, Kyle W. Singleton, Javier Urcuyo, Andrea Hawkins-Daarud, Pamela R. Jackson, Chandan Krishna, Richard S. Zimmerman, Devi P. Patra, Bernard R. Bendok, Kris A. Smith, Peter Nakaji, Kliment Donev, Leslie C. Baxter, Maciej M. Mrugała, Michele Ceccarelli, Antonio Iavarone, Kristin R. Swanson, Nhan L. Tran, Leland S. Hu, Jing Li

    Abstract: Glioblastoma (GBM) is one of the most aggressive and lethal human cancers. Intra-tumoral genetic heterogeneity poses a significant challenge for treatment. Biopsy is invasive, which motivates the development of non-invasive, MRI-based machine learning (ML) models to quantify intra-tumoral genetic heterogeneity for each patient. This capability holds great promise for enabling better therapeutic se…

    Submitted 29 December, 2023; originally announced January 2024.

    Comments: 36 pages, 8 figures, 3 tables

  5. arXiv:2312.14021  [pdf, other]

    eess.AS cs.LG cs.SD eess.IV eess.SP

    Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization

    Authors: Davide Berghi, Philip J. B. Jackson

    Abstract: Conventional audio-visual approaches for active speaker detection (ASD) typically rely on visually pre-extracted face tracks and the corresponding single-channel audio to find the speaker in a video. Therefore, they tend to fail every time the face of the speaker is not visible. We demonstrate that a simple audio convolutional recurrent neural network (CRNN) trained with spatial input features ext…

    Submitted 21 December, 2023; originally announced December 2023.

  6. arXiv:2312.09034  [pdf, other]

    eess.AS cs.SD eess.IV

    Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection

    Authors: Davide Berghi, Peipei Wu, Jinzheng Zhao, Wenwu Wang, Philip J. B. Jackson

    Abstract: Sound event localization and detection (SELD) combines two subtasks: sound event detection (SED) and direction of arrival (DOA) estimation. SELD is usually tackled as an audio-only problem, but visual information has been recently included. Few audio-visual (AV)-SELD works have been published and most employ vision via face/object bounding boxes, or human pose keypoints. In contrast, we explore th…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: ICASSP 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  7. arXiv:2310.14778  [pdf, other]

    cs.MM cs.SD eess.AS

    Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

    Authors: Jinzheng Zhao, Yong Xu, Xinyuan Qian, Davide Berghi, Peipei Wu, Meng Cui, Jianyuan Sun, Philip J. B. Jackson, Wenwu Wang

    Abstract: Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic values and wide application. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visual information, the Bayesian-based filter can solve the problem of data association, audio-visual fusion and track management. In this paper, we condu…

    Submitted 17 December, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  8. arXiv:2308.05051  [pdf, other]

    cs.CV

    PAT: Position-Aware Transformer for Dense Multi-Label Action Detection

    Authors: Faegheh Sardari, Armin Mustafa, Philip J. B. Jackson, Adrian Hilton

    Abstract: We present PAT, a transformer-based network that learns complex temporal co-occurrence action dependencies in a video by exploiting multi-scale temporal features. In existing methods, the self-attention mechanism in transformers loses the temporal positional information, which is essential for robust action detection. To address this issue, we (i) embed relative positional encoding in the self-att…

    Submitted 9 August, 2023; originally announced August 2023.

  9. arXiv:2212.01892  [pdf, other]

    eess.AS cs.MM cs.SD

    Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research

    Authors: Davide Berghi, Marco Volino, Philip J. B. Jackson

    Abstract: 3D audio-visual production aims to deliver immersive and interactive experiences to the consumer. Yet, faithfully reproducing real-world 3D scenes remains a challenging task. This is partly due to the lack of available datasets enabling audio-visual research in this direction. In most of the existing multi-view datasets, the accompanying audio is neglected. Similarly, datasets for spatial audio re…

    Submitted 4 December, 2022; originally announced December 2022.

  10. arXiv:2203.03291  [pdf, other]

    eess.AS cs.SD eess.IV

    Visually Supervised Speaker Detection and Localization via Microphone Array

    Authors: Davide Berghi, Adrian Hilton, Philip J. B. Jackson

    Abstract: Active speaker detection (ASD) is a multi-modal task that aims to identify who, if anyone, is speaking from a set of candidates. Current audio-visual approaches for ASD typically rely on visually pre-extracted face tracks (sequences of consecutive face crops) and the respective monaural audio. However, their recall rate is often low as only the visible faces are included in the set of candidates.…

    Submitted 7 March, 2022; originally announced March 2022.

    Comments: Erratum: Due to a bug in the evaluation script, the correct average distance (aD) metric is here reported in yellow. The analysis remains unchanged from the original paper as the trend between the old and new measures are perfectly monotonic. The bug was caused by an incorrect normalization factor

    Journal ref: IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), 2021

  11. arXiv:2105.00641  [pdf, other]

    cs.MM cs.SD eess.AS

    Naturalistic audio-visual volumetric sequences dataset of sounding actions for six degree-of-freedom interaction

    Authors: Hanne Stenzel, Davide Berghi, Marco Volino, Philip J. B. Jackson

    Abstract: As audio-visual systems increasingly bring immersive and interactive capabilities into our work and leisure activities, so the need for naturalistic test material grows. New volumetric datasets have captured high-quality 3D video, but accompanying audio is often neglected, making it hard to test an integrated bimodal experience. Designed to cover diverse sound types and features, the presented vol…

    Submitted 3 May, 2021; originally announced May 2021.

    Comments: for dataset visit cvssp.org/data/navvs; accepted as poster in IEEE VR 2021

  12. arXiv:2010.12635  [pdf, other]

    cs.LG cs.PF

    Not Half Bad: Exploring Half-Precision in Graph Convolutional Neural Networks

    Authors: John Brennan, Stephen Bonner, Amir Atapour-Abarghouei, Philip T Jackson, Boguslaw Obara, Andrew Stephen McGough

    Abstract: With the growing significance of graphs as an effective representation of data in numerous applications, efficient graph analysis using modern machine learning is receiving a growing level of attention. Deep learning approaches often operate over the entire adjacency matrix -- as the input and intermediate network layers are all designed in proportion to the size of the adjacency matrix -- leading…

    Submitted 23 October, 2020; originally announced October 2020.

  13. arXiv:2007.08574  [pdf, other]

    cs.CV

    Camera Bias in a Fine Grained Classification Task

    Authors: Philip T. Jackson, Stephen Bonner, Ning Jia, Christopher Holder, Jon Stonehouse, Boguslaw Obara

    Abstract: We show that correlations between the camera used to acquire an image and the class label of that image can be exploited by convolutional neural networks (CNN), resulting in a model that "cheats" at an image classification task by recognizing which camera took the image and inferring the class label from the camera. We show that models trained on a dataset with camera / label correlations do not g…

    Submitted 16 July, 2020; originally announced July 2020.

  14. arXiv:2003.06656  [pdf, other]

    eess.AS cs.SD eess.IV

    Audio-Visual Spatial Aligment Requirements of Central and Peripheral Object Events

    Authors: Davide Berghi, Hanne Stenzel, Marco Volino, Adrian Hilton, Philip J. B. Jackson

    Abstract: Immersive audio-visual perception relies on the spatial integration of both auditory and visual information which are heterogeneous sensing modalities with different fields of reception and spatial resolution. This study investigates the perceived coherence of audiovisual object events presented either centrally or peripherally with horizontally aligned/misaligned sound. Various object events were…

    Submitted 14 March, 2020; originally announced March 2020.

    Comments: Two-pages poster abstract

    Journal ref: IEEE VR 2020

  15. Modeling the Comb Filter Effect and Interaural Coherence for Binaural Source Separation

    Authors: Luca Remaggi, Philip J. B. Jackson, Wenwu Wang

    Abstract: Typical methods for binaural source separation consider only the direct sound as the target signal in a mixture. However, in most scenarios, this assumption limits the source separation performance. It is well known that the early reflections interact with the direct sound, producing acoustic effects at the listening position, e.g. the so-called comb filter effect. In this article, we propose a no…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: IEEE Copyright. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019

  16. arXiv:1908.08402  [pdf, other]

    cs.SI

    Temporal Neighbourhood Aggregation: Predicting Future Links in Temporal Graphs via Recurrent Variational Graph Convolutions

    Authors: Stephen Bonner, Amir Atapour-Abarghouei, Philip T Jackson, John Brennan, Ibad Kureshi, Georgios Theodoropoulos, Andrew Stephen McGough, Boguslaw Obara

    Abstract: Graphs have become a crucial way to represent large, complex and often temporal datasets across a wide range of scientific disciplines. However, when graphs are used as input to machine learning models, this rich temporal information is frequently disregarded during the learning process, resulting in suboptimal performance on certain temporal inference tasks. To combat this, we introduce Temporal N…

    Submitted 21 November, 2019; v1 submitted 21 August, 2019; originally announced August 2019.

    Comments: IEEE International Conference on Big Data 2019

  17. arXiv:1906.08826  [pdf, other]

    astro-ph.EP astro-ph.IM cs.CV cs.LG

    Automated crater shape retrieval using weakly-supervised deep learning

    Authors: Mohamad Ali-Dib, Kristen Menou, Alan P. Jackson, Chenchong Zhu, Noah Hammond

    Abstract: Crater ellipticity determination is a complex and time consuming task that so far has evaded successful automation. We train a state of the art computer vision algorithm to identify craters in Lunar digital elevation maps and retrieve their sizes and 2D shapes. The computational backbone of the model is MaskRCNN, an "instance segmentation" general framework that detects craters in an image while s…

    Submitted 11 March, 2020; v1 submitted 20 June, 2019; originally announced June 2019.

    Comments: 59 pages, 13 figures, Accepted for publication in Icarus

  18. arXiv:1906.07552  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Single-Channel Signal Separation and Deconvolution with Generative Adversarial Networks

    Authors: Qiuqiang Kong, Yong Xu, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

    Abstract: Single-channel signal separation and deconvolution aims to separate and deconvolve individual sources from a single-channel mixture and is a challenging problem in which no prior knowledge of the mixing filters is available. Both individual sources and mixing filters need to be estimated. In addition, a mixture may contain non-stationary noise which is unseen in the training set. We propose a synt…

    Submitted 14 June, 2019; originally announced June 2019.

    Comments: 7 pages. Accepted by IJCAI 2019

    Journal ref: International Joint Conference on Artificial Intelligence (IJCAI), 2019, pp. 2747-2753

  19. arXiv:1903.06516  [pdf, other]

    cs.CV

    Phenotypic Profiling of High Throughput Imaging Screens with Generic Deep Convolutional Features

    Authors: Philip T. Jackson, Yinhai Wang, Sinead Knight, Hongming Chen, Thierry Dorval, Martin Brown, Claus Bendtsen, Boguslaw Obara

    Abstract: While deep learning has seen many recent applications to drug discovery, most have focused on predicting activity or toxicity directly from chemical structure. Phenotypic changes exhibited in cellular images are also indications of the mechanism of action (MoA) of chemical compounds. In this paper, we show how pre-trained convolutional image features can be used to assist scientists in discovering…

    Submitted 15 March, 2019; originally announced March 2019.

  20. arXiv:1809.05375  [pdf, other]

    cs.CV

    Style Augmentation: Data Augmentation via Style Randomization

    Authors: Philip T. Jackson, Amir Atapour-Abarghouei, Stephen Bonner, Toby Breckon, Boguslaw Obara

    Abstract: We introduce style augmentation, a new form of data augmentation based on random style transfer, for improving the robustness of convolutional neural networks (CNN) over both classification and regression based tasks. During training, our style augmentation randomizes texture, contrast and color, while preserving shape and semantic content. This is accomplished by adapting an arbitrary style trans…

    Submitted 12 April, 2019; v1 submitted 14 September, 2018; originally announced September 2018.

  21. arXiv:1708.07218  [pdf]

    cs.SD

    Object-Based Audio Rendering

    Authors: Philip Jackson, Filippo Fazi, Frank Melchior, Trevor Cox, Adrian Hilton, Chris Pike, Jon Francombe, Andreas Franck, Philip Coleman, Dylan Menzies-Gow, James Woodcock, Yan Tang, Qingju Liu, Rick Hughes, Marcos Simon Galvez, Teo de Campos, Hansung Kim, Hanne Stenzel

    Abstract: Apparatus and methods are disclosed for performing object-based audio rendering on a plurality of audio objects which define a sound scene, each audio object comprising at least one audio signal and associated metadata. The apparatus comprises: a plurality of renderers each capable of rendering one or more of the audio objects to output rendered audio data; and object adapting means for adapting o…

    Submitted 23 August, 2017; originally announced August 2017.

    Comments: This is a transcript of GB Patent Application No: GB1609316.3, filed in the UK by the University of Surrey on 23 May 2016. It describes an intelligent system for customising, personalising and perceptually monitoring the rendering of an object-based audio stream for an arbitrary connected system of loudspeakers to optimize the listening experience as the producer intended. 30 pages, 5 figures

  22. arXiv:1705.08262  [pdf, other]

    cs.LO

    Verification of a lazy cache coherence protocol against a weak memory model

    Authors: Christopher J. Banks, Marco Elver, Ruth Hoffmann, Susmit Sarkar, Paul Jackson, Vijay Nagarajan

    Abstract: In this paper we verify a modern lazy cache coherence protocol, TSO-CC, against the memory consistency model it was designed for, TSO. We achieve this by first showing a weak simulation relation between TSO-CC (with a fixed number of processors) and a novel finite-state operational model which exhibits the laziness of TSO-CC and satisfies TSO. We then extend this by an existing parameterisation te…

    Submitted 18 May, 2017; originally announced May 2017.

    Comments: 10 pages

    ACM Class: B.1.4; B.3.2; B.3.3

  23. Acoustic Reflector Localization: Novel Image Source Reversion and Direct Localization Methods

    Authors: Luca Remaggi, Philip J. B. Jackson, Philip Coleman, Wenwu Wang

    Abstract: Acoustic reflector localization is an important issue in audio signal processing, with direct applications in spatial audio, scene reconstruction, and source separation. Several methods have recently been proposed to estimate the 3D positions of acoustic reflectors given room impulse responses (RIRs). In this article, we categorize these methods as "image-source reversion", which localizes the ima…

    Submitted 5 January, 2017; v1 submitted 18 October, 2016; originally announced October 2016.

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 2, pp. 296-309, February 2017

  24. Unsupervised Feature Learning Based on Deep Models for Environmental Audio Tagging

    Authors: Yong Xu, Qiang Huang, Wenwu Wang, Peter Foster, Siddharth Sigtia, Philip J. B. Jackson, Mark D. Plumbley

    Abstract: Environmental audio tagging aims to predict only the presence or absence of certain acoustic events in the interested acoustic scene. In this paper we make contributions to audio tagging in two parts, respectively, acoustic modeling and feature learning. We propose to use a shrinking deep neural network (DNN) framework incorporating unsupervised feature learning to handle the multi-label classific…

    Submitted 29 November, 2016; v1 submitted 13 July, 2016; originally announced July 2016.

    Comments: 10 pages, dcase 2016 challenge

    Journal ref: IEEE/ACM Transactions on Audio, Speech and Language Processing 25(6):1230-1241, Jun 2017

  25. arXiv:1606.07695  [pdf, other]

    cs.CV cs.AI

    Fully DNN-based Multi-label regression for audio tagging

    Authors: Yong Xu, Qiang Huang, Wenwu Wang, Philip J. B. Jackson, Mark D. Plumbley

    Abstract: Acoustic event detection for content analysis in most cases relies on lots of labeled data. However, manually annotating data is a time-consuming task, which thus makes few annotated resources available so far. Unlike audio event detection, automatic audio tagging, a multi-label acoustic event classification task, only relies on weakly labeled data. This is highly desirable to some practical appli…

    Submitted 13 August, 2016; v1 submitted 24 June, 2016; originally announced June 2016.

    Comments: Submitted to DCASE2016 Workshop which is as a satellite event to the 2016 European Signal Processing Conference (EUSIPCO)

  26. arXiv:1407.0325  [pdf]

    cs.CY

    A Computational Model of Crowds for Collective Intelligence

    Authors: John Prpic, Piper Jackson, Thai Nguyen

    Abstract: In this work, we present a high-level computational model of IT-mediated crowds for collective intelligence. We introduce the Crowd Capital perspective as an organizational-level model of collective intelligence generation from IT-mediated crowds, and specify a computational system including agents, forms of IT, and organizational knowledge.

    Submitted 30 June, 2014; originally announced July 2014.

    Report number: ci-2014/94