Search | arXiv e-print repository

Memories in the Making: Predicting Video Memorability with Encoding Phase EEG

Authors: Lorin Sweeney, Graham Healy, Alan F. Smeaton

Abstract: In a world of ephemeral moments, our brain diligently sieves through a cascade of experiences, like a skilled gold prospector searching for precious nuggets amidst the river's relentless flow. This study delves into the elusive "moment of memorability" -- a fleeting, yet vital instant where experiences are prioritised for consolidation in our memory. By transforming subjects' encoding phase electr… ▽ More In a world of ephemeral moments, our brain diligently sieves through a cascade of experiences, like a skilled gold prospector searching for precious nuggets amidst the river's relentless flow. This study delves into the elusive "moment of memorability" -- a fleeting, yet vital instant where experiences are prioritised for consolidation in our memory. By transforming subjects' encoding phase electroencephalography (EEG) signals into the visual domain using scaleograms and leveraging deep learning techniques, we investigate the neural signatures that underpin this moment, with the aim of predicting subject-specific recognition of video. Our findings not only support the involvement of theta band (4-8Hz) oscillations over the right temporal lobe in the encoding of declarative memory, but also support the existence of a distinct moment of memorability, akin to the gold nuggets that define our personal river of experiences. △ Less

Submitted 16 August, 2023; originally announced September 2023.

Comments: Content-Based Multimedia Indexing, CBMI, September 20-22, Orleans, France, 2023

arXiv:2309.11891 [pdf, other]

Heart Rate Detection Using an Event Camera

Authors: Aniket Jagtap, RamaKrishna Venkatesh Saripalli, Joe Lemley, Waseem Shariff, Alan F. Smeaton

Abstract: Event cameras, also known as neuromorphic cameras, are an emerging technology that offer advantages over traditional shutter and frame-based cameras, including high temporal resolution, low power consumption, and selective data acquisition. In this study, we propose to harnesses the capabilities of event-based cameras to capture subtle changes in the surface of the skin caused by the pulsatile flo… ▽ More Event cameras, also known as neuromorphic cameras, are an emerging technology that offer advantages over traditional shutter and frame-based cameras, including high temporal resolution, low power consumption, and selective data acquisition. In this study, we propose to harnesses the capabilities of event-based cameras to capture subtle changes in the surface of the skin caused by the pulsatile flow of blood in the wrist region. We investigate whether an event camera could be used for continuous noninvasive monitoring of heart rate (HR). Event camera video data from 25 participants, comprising varying age groups and skin colours, was collected and analysed. Ground-truth HR measurements obtained using conventional methods were used to evaluate of the accuracy of automatic detection of HR from event camera data. Our experimental results and comparison to the performance of other non-contact HR measurement methods demonstrate the feasibility of using event cameras for pulse detection. We also acknowledge the challenges and limitations of our method, such as light-induced flickering and the sub-conscious but naturally-occurring tremors of an individual during data capture. △ Less

Submitted 21 September, 2023; originally announced September 2023.

Comments: Dataset available at https://doi.org/10.6084/m9.figshare.24039501.v1

arXiv:2305.05780 [pdf, other]

Enhancing Gappy Speech Audio Signals with Generative Adversarial Networks

Authors: Deniss Strods, Alan F. Smeaton

Abstract: Gaps, dropouts and short clips of corrupted audio are a common problem and particularly annoying when they occur in speech. This paper uses machine learning to regenerate gaps of up to 320ms in an audio speech signal. Audio regeneration is translated into image regeneration by transforming audio into a Mel-spectrogram and using image in-painting to regenerate the gaps. The full Mel-spectrogram is… ▽ More Gaps, dropouts and short clips of corrupted audio are a common problem and particularly annoying when they occur in speech. This paper uses machine learning to regenerate gaps of up to 320ms in an audio speech signal. Audio regeneration is translated into image regeneration by transforming audio into a Mel-spectrogram and using image in-painting to regenerate the gaps. The full Mel-spectrogram is then transferred back to audio using the Parallel-WaveGAN vocoder and integrated into the audio stream. Using a sample of 1300 spoken audio clips of between 1 and 10 seconds taken from the publicly-available LJSpeech dataset our results show regeneration of audio gaps in close to real time using GANs with a GPU equipped system. As expected, the smaller the gap in the audio, the better the quality of the filled gaps. On a gap of 240ms the average mean opinion score (MOS) for the best performing models was 3.737, on a scale of 1 (worst) to 5 (best) which is sufficient for a human to perceive as close to uninterrupted human speech. △ Less

Submitted 9 May, 2023; originally announced May 2023.

Comments: 7 pages, 4 figures, 4 tables. 34th Irish Signals and Systems Conferences, 13-14 June 2023

arXiv:2106.08936 [pdf, other]

doi 10.1109/OJSP.2021.3089439

Improved CNN-based Learning of Interpolation Filters for Low-Complexity Inter Prediction in Video Coding

Authors: Luka Murn, Saverio Blasi, Alan F. Smeaton, Marta Mrak

Abstract: The versatility of recent machine learning approaches makes them ideal for improvement of next generation video compression solutions. Unfortunately, these approaches typically bring significant increases in computational complexity and are difficult to interpret into explainable models, affecting their potential for implementation within practical video coding applications. This paper introduces… ▽ More The versatility of recent machine learning approaches makes them ideal for improvement of next generation video compression solutions. Unfortunately, these approaches typically bring significant increases in computational complexity and are difficult to interpret into explainable models, affecting their potential for implementation within practical video coding applications. This paper introduces a novel explainable neural network-based inter-prediction scheme, to improve the interpolation of reference samples needed for fractional precision motion compensation. The approach requires a single neural network to be trained from which a full quarter-pixel interpolation filter set is derived, as the network is easily interpretable due to its linear structure. A novel training framework enables each network branch to resemble a specific fractional shift. This practical solution makes it very efficient to use alongside conventional video coding schemes. When implemented in the context of the state-of-the-art Versatile Video Coding (VVC) test model, 0.77%, 1.27% and 2.25% BD-rate savings can be achieved on average for lower resolution sequences under the random access, low-delay B and low-delay P configurations, respectively, while the complexity of the learned interpolation schemes is significantly reduced compared to the interpolation with full CNNs. △ Less

Submitted 16 June, 2021; originally announced June 2021.

Comments: IEEE Open Journal of Signal Processing Special Issue on Applied AI and Machine Learning for Video Coding and Streaming, June 2021

arXiv:2102.04993 [pdf, other]

doi 10.1109/JSTSP.2020.3044482

Attention-Based Neural Networks for Chroma Intra Prediction in Video Coding

Authors: Marc Górriz, Saverio Blasi, Alan F. Smeaton, Noel E. O'Connor, Marta Mrak

Abstract: Neural networks can be successfully used to improve several modules of advanced video coding schemes. In particular, compression of colour components was shown to greatly benefit from usage of machine learning models, thanks to the design of appropriate attention-based architectures that allow the prediction to exploit specific samples in the reference region. However, such architectures tend to b… ▽ More Neural networks can be successfully used to improve several modules of advanced video coding schemes. In particular, compression of colour components was shown to greatly benefit from usage of machine learning models, thanks to the design of appropriate attention-based architectures that allow the prediction to exploit specific samples in the reference region. However, such architectures tend to be complex and computationally intense, and may be difficult to deploy in a practical video coding pipeline. This work focuses on reducing the complexity of such methodologies, to design a set of simplified and cost-effective attention-based architectures for chroma intra-prediction. A novel size-agnostic multi-model approach is proposed to reduce the complexity of the inference process. The resulting simplified architecture is still capable of outperforming state-of-the-art methods. Moreover, a collection of simplifications is presented in this paper, to further reduce the complexity overhead of the proposed prediction architecture. Thanks to these simplifications, a reduction in the number of parameters of around 90% is achieved with respect to the original attention-based methodologies. Simplifications include a framework for reducing the overhead of the convolutional operations, a simplified cross-component processing model integrated into the original architecture, and a methodology to perform integer-precision approximations with the aim to obtain fast and hardware-aware implementations. The proposed schemes are integrated into the Versatile Video Coding (VVC) prediction pipeline, retaining compression efficiency of state-of-the-art chroma intra-prediction methods based on neural networks, while offering different directions for significantly reducing coding complexity. △ Less

Submitted 9 February, 2021; originally announced February 2021.

Journal ref: IEEE Journal of Selected Topics in Signal Processing, 2020

arXiv:2010.01947 [pdf, other]

A Comparative Study of Existing and New Deep Learning Methods for Detecting Knee Injuries using the MRNet Dataset

Authors: David Azcona, Kevin McGuinness, Alan F. Smeaton

Abstract: This work presents a comparative study of existing and new techniques to detect knee injuries by leveraging Stanford's MRNet Dataset. All approaches are based on deep learning and we explore the comparative performances of transfer learning and a deep residual network trained from scratch. We also exploit some characteristics of Magnetic Resonance Imaging (MRI) data by, for example, using a fixed… ▽ More This work presents a comparative study of existing and new techniques to detect knee injuries by leveraging Stanford's MRNet Dataset. All approaches are based on deep learning and we explore the comparative performances of transfer learning and a deep residual network trained from scratch. We also exploit some characteristics of Magnetic Resonance Imaging (MRI) data by, for example, using a fixed number of slices or 2D images from each of the axial, coronal and sagittal planes as well as combining the three planes into one multi-plane network. Overall we achieved a performance of 93.4% AUC on the validation data by using the more recent deep learning architectures and data augmentation strategies. More flexible architectures are also proposed that might help with the development and training of models that process MRIs. We found that transfer learning and a carefully tuned data augmentation strategy were the crucial factors in determining best performance. △ Less

Submitted 5 October, 2020; originally announced October 2020.

arXiv:2006.15349 [pdf, other]

Chroma Intra Prediction with attention-based CNN architectures

Authors: Marc Górriz, Saverio Blasi, Alan F. Smeaton, Noel E. O'Connor, Marta Mrak

Abstract: Neural networks can be used in video coding to improve chroma intra-prediction. In particular, usage of fully-connected networks has enabled better cross-component prediction with respect to traditional linear models. Nonetheless, state-of-the-art architectures tend to disregard the location of individual reference samples in the prediction process. This paper proposes a new neural network archite… ▽ More Neural networks can be used in video coding to improve chroma intra-prediction. In particular, usage of fully-connected networks has enabled better cross-component prediction with respect to traditional linear models. Nonetheless, state-of-the-art architectures tend to disregard the location of individual reference samples in the prediction process. This paper proposes a new neural network architecture for cross-component intra-prediction. The network uses a novel attention module to model spatial relations between reference and predicted samples. The proposed approach is integrated into the Versatile Video Coding (VVC) prediction pipeline. Experimental results demonstrate compression gains over the latest VVC anchor compared with state-of-the-art chroma intra-prediction methods based on neural networks. △ Less

Submitted 27 June, 2020; originally announced June 2020.

Comments: 27th IEEE International Conference on Image Processing, 25-28 Oct 2020, Abu Dhabi, United Arab Emirates

arXiv:2006.06392 [pdf, other]

doi 10.1109/ICIP40778.2020.9191193

Interpreting CNN for Low Complexity Learned Sub-pixel Motion Compensation in Video Coding

Authors: Luka Murn, Saverio Blasi, Alan F. Smeaton, Noel E. O'Connor, Marta Mrak

Abstract: Deep learning has shown great potential in image and video compression tasks. However, it brings bit savings at the cost of significant increases in coding complexity, which limits its potential for implementation within practical applications. In this paper, a novel neural network-based tool is presented which improves the interpolation of reference samples needed for fractional precision motion… ▽ More Deep learning has shown great potential in image and video compression tasks. However, it brings bit savings at the cost of significant increases in coding complexity, which limits its potential for implementation within practical applications. In this paper, a novel neural network-based tool is presented which improves the interpolation of reference samples needed for fractional precision motion compensation. Contrary to previous efforts, the proposed approach focuses on complexity reduction achieved by interpreting the interpolation filters learned by the networks. When the approach is implemented in the Versatile Video Coding (VVC) test model, up to 4.5% BD-rate saving for individual sequences is achieved compared with the baseline VVC, while the complexity of learned interpolation is significantly reduced compared to the application of full neural network. △ Less

Submitted 11 June, 2020; originally announced June 2020.

Comments: 27th IEEE International Conference on Image Processing, 25-28 Oct 2020, Abu Dhabi, United Arab Emirates

Journal ref: 2020 IEEE International Conference on Image Processing (ICIP), 2020, pp. 798-802

arXiv:2004.00505 [pdf, other]

Optimised Convolutional Neural Networks for Heart Rate Estimation and Human Activity Recognition in Wrist Worn Sensing Applications

Authors: Eoin Brophy, Willie Muehlhausen, Alan F. Smeaton, Tomas E. Ward

Abstract: Wrist-worn smart devices are providing increased insights into human health, behaviour and performance through sophisticated analytics. However, battery life, device cost and sensor performance in the face of movement-related artefact present challenges which must be further addressed to see effective applications and wider adoption through commoditisation of the technology. We address these chall… ▽ More Wrist-worn smart devices are providing increased insights into human health, behaviour and performance through sophisticated analytics. However, battery life, device cost and sensor performance in the face of movement-related artefact present challenges which must be further addressed to see effective applications and wider adoption through commoditisation of the technology. We address these challenges by demonstrating, through using a simple optical measurement, photoplethysmography (PPG) used conventionally for heart rate detection in wrist-worn sensors, that we can provide improved heart rate and human activity recognition (HAR) simultaneously at low sample rates, without an inertial measurement unit. This simplifies hardware design and reduces costs and power budgets. We apply two deep learning pipelines, one for human activity recognition and one for heart rate estimation. HAR is achieved through the application of a visual classification approach, capable of robust performance at low sample rates. Here, transfer learning is leveraged to retrain a convolutional neural network (CNN) to distinguish characteristics of the PPG during different human activities. For heart rate estimation we use a CNN adopted for regression which maps noisy optical signals to heart rate estimates. In both cases, comparisons are made with leading conventional approaches. Our results demonstrate a low sampling frequency can achieve good performance without significant degradation of accuracy. 5 Hz and 10 Hz were shown to have 80.2% and 83.0% classification accuracy for HAR respectively. These same sampling frequencies also yielded a robust heart rate estimation which was comparative with that achieved at the more energy-intensive rate of 256 Hz. △ Less

Submitted 30 March, 2020; originally announced April 2020.

arXiv:2003.03193 [pdf, other]

A Neuro-AI Interface for Evaluating Generative Adversarial Networks

Authors: Zhengwei Wang, Qi She, Alan F. Smeaton, Tomas E. Ward, Graham Healy

Abstract: Generative adversarial networks (GANs) are increasingly attracting attention in the computer vision, natural language processing, speech synthesis and similar domains. However, evaluating the performance of GANs is still an open and challenging problem. Existing evaluation metrics primarily measure the dissimilarity between real and generated images using automated statistical methods. They often… ▽ More Generative adversarial networks (GANs) are increasingly attracting attention in the computer vision, natural language processing, speech synthesis and similar domains. However, evaluating the performance of GANs is still an open and challenging problem. Existing evaluation metrics primarily measure the dissimilarity between real and generated images using automated statistical methods. They often require large sample sizes for evaluation and do not directly reflect human perception of image quality. In this work, we introduce an evaluation metric called Neuroscore, for evaluating the performance of GANs, that more directly reflects psychoperceptual image quality through the utilization of brain signals. Our results show that Neuroscore has superior performance to the current evaluation metrics in that: (1) It is more consistent with human judgment; (2) The evaluation process needs much smaller numbers of samples; and (3) It is able to rank the quality of images on a per GAN basis. A convolutional neural network (CNN) based neuro-AI interface is proposed to predict Neuroscore from GAN-generated images directly without the need for neural responses. Importantly, we show that including neural responses during the training phase of the network can significantly improve the prediction capability of the proposed model. Codes and data can be referred at this link: https://github.com/villawang/Neuro-AI-Interface. △ Less

Submitted 6 April, 2020; v1 submitted 5 March, 2020; originally announced March 2020.

Comments: Accepted by ICLR 2020 Workshop Bridging AI and Cognitive Science (BAICS). arXiv admin note: substantial text overlap with arXiv:1905.04243

arXiv:1908.09873 [pdf, other]

End-to-End Conditional GAN-based Architectures for Image Colourisation

Authors: Marc Górriz, Marta Mrak, Alan F. Smeaton, Noel E. O'Connor

Abstract: In this work recent advances in conditional adversarial networks are investigated to develop an end-to-end architecture based on Convolutional Neural Networks (CNNs) to directly map realistic colours to an input greyscale image. Observing that existing colourisation methods sometimes exhibit a lack of colourfulness, this paper proposes a method to improve colourisation results. In particular, the… ▽ More In this work recent advances in conditional adversarial networks are investigated to develop an end-to-end architecture based on Convolutional Neural Networks (CNNs) to directly map realistic colours to an input greyscale image. Observing that existing colourisation methods sometimes exhibit a lack of colourfulness, this paper proposes a method to improve colourisation results. In particular, the method uses Generative Adversarial Neural Networks (GANs) and focuses on improvement of training stability to enable better generalisation in large multi-class image datasets. Additionally, the integration of instance and batch normalisation layers in both generator and discriminator is introduced to the popular U-Net architecture, boosting the network capabilities to generalise the style changes of the content. The method has been tested using the ILSVRC 2012 dataset, achieving improved automatic colourisation results compared to other methods based on GANs. △ Less

Submitted 5 September, 2019; v1 submitted 26 August, 2019; originally announced August 2019.

Comments: IEEE 21st International Workshop on Multimedia Signal Processing, 27-29 Sept 2019, Kuala Lumpur, Malaysia

arXiv:1905.05025 [pdf]

doi 10.1145/3365245.3365246

Tracking Human Behavioural Consistency by Analysing Periodicity of Household Water Consumption

Authors: Seán Quinn, Noel Murphy, Alan F. Smeaton

Abstract: People are living longer than ever due to advances in healthcare, and this has prompted many healthcare providers to look towards remote patient care as a means to meet the needs of the future. It is now a priority to enable people to reside in their own homes rather than in overburdened facilities whenever possible. The increasing maturity of IoT technologies and the falling costs of connected se… ▽ More People are living longer than ever due to advances in healthcare, and this has prompted many healthcare providers to look towards remote patient care as a means to meet the needs of the future. It is now a priority to enable people to reside in their own homes rather than in overburdened facilities whenever possible. The increasing maturity of IoT technologies and the falling costs of connected sensors has made the deployment of remote healthcare at scale an increasingly attractive prospect. In this work we demonstrate that we can measure the consistency and regularity of the behaviour of a household using sensor readings generated from interaction with the home environment. We show that we can track changes in this behaviour regularity longitudinally and detect changes that may be related to significant life events or trends that may be medically significant. We achieve this using periodicity analysis on water usage readings sampled from the main household water meter every 15 minutes for over 8 months. We utilise an IoT Application Enablement Platform in conjunction with low cost LoRa-enabled sensors and a Low Power Wide Area Network in order to validate a data collection methodology that could be deployed at large scale in future. We envision the statistical methods described here being applied to data streams from the homes of elderly and at-risk groups, both as a means of early illness detection and for monitoring the well-being of those with known illnesses. △ Less

Submitted 16 October, 2019; v1 submitted 13 May, 2019; originally announced May 2019.

Comments: 2019 2nd International Conference on Sensors, Signal and Image Processing

arXiv:1905.04243 [pdf, other]

Synthetic-Neuroscore: Using A Neuro-AI Interface for Evaluating Generative Adversarial Networks

Authors: Zhengwei Wang, Qi She, Alan F. Smeaton, Tomas E. Ward, Graham Healy

Abstract: Generative adversarial networks (GANs) are increasingly attracting attention in the computer vision, natural language processing, speech synthesis and similar domains. Arguably the most striking results have been in the area of image synthesis. However, evaluating the performance of GANs is still an open and challenging problem. Existing evaluation metrics primarily measure the dissimilarity betwe… ▽ More Generative adversarial networks (GANs) are increasingly attracting attention in the computer vision, natural language processing, speech synthesis and similar domains. Arguably the most striking results have been in the area of image synthesis. However, evaluating the performance of GANs is still an open and challenging problem. Existing evaluation metrics primarily measure the dissimilarity between real and generated images using automated statistical methods. They often require large sample sizes for evaluation and do not directly reflect human perception of image quality. In this work, we describe an evaluation metric we call Neuroscore, for evaluating the performance of GANs, that more directly reflects psychoperceptual image quality through the utilization of brain signals. Our results show that Neuroscore has superior performance to the current evaluation metrics in that: (1) It is more consistent with human judgment; (2) The evaluation process needs much smaller numbers of samples; and (3) It is able to rank the quality of images on a per GAN basis. A convolutional neural network (CNN) based neuro-AI interface is proposed to predict Neuroscore from GAN-generated images directly without the need for neural responses. Importantly, we show that including neural responses during the training phase of the network can significantly improve the prediction capability of the proposed model. Materials related to this work are provided at https://github.com/villawang/Neuro-AI-Interface. △ Less

Submitted 2 February, 2020; v1 submitted 10 May, 2019; originally announced May 2019.

arXiv:1901.04618 [pdf, other]

doi 10.1080/2326263X.2019.1568821

Spatial Filtering Pipeline Evaluation of Cortically Coupled Computer Vision System for Rapid Serial Visual Presentation

Authors: Zhengwei Wang, Graham Healy, Alan F. Smeaton, Tomas E. Ward

Abstract: Rapid Serial Visual Presentation (RSVP) is a paradigm that supports the application of cortically coupled computer vision to rapid image search. In RSVP, images are presented to participants in a rapid serial sequence which can evoke Event-related Potentials (ERPs) detectable in their Electroencephalogram (EEG). The contemporary approach to this problem involves supervised spatial filtering techni… ▽ More Rapid Serial Visual Presentation (RSVP) is a paradigm that supports the application of cortically coupled computer vision to rapid image search. In RSVP, images are presented to participants in a rapid serial sequence which can evoke Event-related Potentials (ERPs) detectable in their Electroencephalogram (EEG). The contemporary approach to this problem involves supervised spatial filtering techniques which are applied for the purposes of enhancing the discriminative information in the EEG data. In this paper we make two primary contributions to that field: 1) We propose a novel spatial filtering method which we call the Multiple Time Window LDA Beamformer (MTWLB) method; 2) we provide a comprehensive comparison of nine spatial filtering pipelines using three spatial filtering schemes namely, MTWLB, xDAWN, Common Spatial Pattern (CSP) and three linear classification methods Linear Discriminant Analysis (LDA), Bayesian Linear Regression (BLR) and Logistic Regression (LR). Three pipelines without spatial filtering are used as baseline comparison. The Area Under Curve (AUC) is used as an evaluation metric in this paper. The results reveal that MTWLB and xDAWN spatial filtering techniques enhance the classification performance of the pipeline but CSP does not. The results also support the conclusion that LR can be effective for RSVP based BCI if discriminative features are available. △ Less

Submitted 14 January, 2019; originally announced January 2019.

Showing 1–14 of 14 results for author: Smeaton, A F