Search | arXiv e-print repository

arXiv:2303.00795 [pdf, other]

Improved Segmentation of Deep Sulci in Cortical Gray Matter Using a Deep Learning Framework Incorporating Laplace's Equation

Authors: Sadhana Ravikumar, Ranjit Ittyerah, Sydney Lim, Long Xie, Sandhitsu Das, Pulkit Khandelwal, Laura E. M. Wisse, Madigan L. Bedard, John L. Robinson, Terry Schuck, Murray Grossman, John Q. Trojanowski, Edward B. Lee, M. Dylan Tisdall, Karthik Prabhakaran, John A. Detre, David J. Irwin, Winifred Trotman, Gabor Mizsei, Emilio Artacho-Pérula, Maria Mercedes Iñiguez de Onzono Martin, Maria del Mar Arroyo Jiménez, Monica Muñoz, Francisco Javier Molina Romero, Maria del Pilar Marcos Rabal, et al. (7 additional authors not shown)

Abstract: When developing tools for automated cortical segmentation, the ability to produce topologically correct segmentations is important in order to compute geometrically valid morphometry measures. In practice, accurate cortical segmentation is challenged by image artifacts and the highly convoluted anatomy of the cortex itself. To address this, we propose a novel deep learning-based cortical segmentation method in which prior knowledge about the geometry of the cortex is incorporated into the network during the training process. We design a loss function that uses the theory of Laplace's equation applied to the cortex to locally penalize unresolved boundaries between tightly folded sulci. Using an ex vivo MRI dataset of human medial temporal lobe specimens, we demonstrate that our approach outperforms baseline segmentation networks, both quantitatively and qualitatively.

Submitted 3 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: Accepted at the 28th biennial international conference on Information Processing in Medical Imaging (IPMI 2023)
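The abstract describes a loss built on Laplace's equation applied to the cortex but does not give its form. As background only, the sketch below computes the standard Laplace potential used in cortical-thickness work. It solves ∇²u = 0 inside gray matter, with u = 0 on the white-matter side and u = 1 outside the brain. It assumes a 2D NumPy label map with hypothetical label codes (0 outside, 1 gray matter, 2 white matter) and uses plain Jacobi iteration. It is not the authors' loss function.

```python
import numpy as np

# Hypothetical label codes for a 2D label map (an assumption, not from the paper).
OUTSIDE, GRAY, WHITE = 0, 1, 2


def laplace_potential(labels: np.ndarray, n_iter: int = 2000) -> np.ndarray:
    """Solve Laplace's equation inside gray matter by Jacobi iteration.

    Dirichlet boundary conditions: u = 0 in white matter, u = 1 outside the
    brain. Each gray-matter pixel is repeatedly replaced by the mean of its
    four neighbours until the field settles.
    """
    u = np.where(labels == OUTSIDE, 1.0, 0.0)
    gray = labels == GRAY
    u[gray] = 0.5  # initial guess inside the cortex
    for _ in range(n_iter):
        # Edge padding acts as a zero-flux condition at the image border.
        p = np.pad(u, 1, mode="edge")
        avg = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])
        u[gray] = avg[gray]  # white-matter and outside pixels keep fixed values
    return u
```

In a straight slab of gray matter between white matter and background, the potential rises linearly from 0 to 1. One plausible intuition, not a statement of the paper's method, is that fused sulcal banks lack a nearby u = 1 boundary. The field then stays lower than expected deep in the fold.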

arXiv:2301.02130 [pdf]

A deep learning approach to using wearable seismocardiography (SCG) for diagnosing aortic valve stenosis and predicting aortic hemodynamics obtained by 4D flow MRI

Authors: Mahmoud E. Khani, Ethan M. I. Johnson, Aparna Sodhi, Joshua Robinson, Cynthia K. Rigsby, Bradly D. Allen, Michael Markl

Abstract: In this paper, we explored the use of deep learning with wearable seismocardiography (SCG) devices to predict aortic flow metrics obtained by 4D flow MRI. 4D flow MRI provides a comprehensive assessment of cardiovascular hemodynamics, but it is costly and time-consuming. We hypothesized that deep learning could be used to identify pathological changes in blood flow from SCG signals, such as elevated peak systolic velocity (Vmax) in patients with heart valve diseases. We also investigated the ability of this deep learning technique to differentiate between patients diagnosed with aortic valve stenosis (AS), non-AS patients with a bicuspid aortic valve (BAV), non-AS patients with a mechanical aortic valve (MAV), and healthy subjects with a normal tricuspid aortic valve (TAV). In a study of 77 subjects who underwent same-day 4D flow MRI and SCG, we found that the Vmax values obtained using deep learning and SCG were in good agreement with those obtained by 4D flow MRI. Additionally, subjects with TAV, BAV, MAV, and AS could be classified with ROC-AUC values of 92%, 95%, 81%, and 83%, respectively. This suggests that SCG obtained using low-cost wearable electronics may be used as a supplement to 4D flow MRI exams or as a screening tool for aortic valve disease.

Submitted 5 January, 2023; originally announced January 2023.

Comments: 16 pages, 4 figures

arXiv:2112.15552 [pdf, other]

doi 10.1109/JSSC.2021.3129993

Magnetoelectric Bio-Implants Powered and Programmed by a Single Transmitter for Coordinated Multisite Stimulation

Authors: Zhanghao Yu, Joshua C. Chen, Yan He, Fatima T. Alrashdan, Benjamin W. Avants, Amanda Singer, Jacob T. Robinson, Kaiyuan Yang

Abstract: This article presents a hardware platform of stimulating implants that are wirelessly powered and controlled by a shared transmitter (TX) for coordinated leadless multisite stimulation. The adopted novel single-TX, multiple-implant structure can flexibly deploy stimuli, improve system efficiency, easily scale the number of stimulating channels, and reduce the effort needed for device synchronization. In the proposed system, a wireless link leveraging the magnetoelectric (ME) effect is co-designed with a robust and efficient system-on-chip (SoC) to enable reliable operation and individual programming of every implant. Each implant integrates a 0.8-mm² chip, a 6-mm² ME film, and an energy storage capacitor within a 6.2-mm³ volume. ME power transfer can safely deliver milliwatt-level power to devices placed several centimeters away from the TX coil, maintains good efficiency under size constraints, and tolerates 60° angular and 1.5-cm lateral misalignment. The SoC operates robustly under 2-V source amplitude variations, which span a 40-mm change in TX-implant distance. It realizes individual addressability through physical unclonable function (PUF) IDs, and achieves 90% efficiency for 1.5-3.5-V stimulation with fully programmable stimulation parameters.

Submitted 31 December, 2021; originally announced December 2021.

Comments: This paper has been published in IEEE Journal of Solid-State Circuits, 2021

Journal ref: IEEE Journal of Solid-State Circuits, 2021

arXiv:2110.07711 [pdf, other]

Gray Matter Segmentation in Ultra High Resolution 7 Tesla ex vivo T2w MRI of Human Brain Hemispheres

Authors: Pulkit Khandelwal, Shokufeh Sadaghiani, Michael Tran Duong, Sadhana Ravikumar, Sydney Lim, Sanaz Arezoumandan, Claire Peterson, Eunice Chung, Madigan Bedard, Noah Capp, Ranjit Ittyerah, Elyse Migdal, Grace Choi, Emily Kopp, Bridget Loja, Eusha Hasan, Jiacheng Li, Karthik Prabhakaran, Gabor Mizsei, Marianna Gabrielyan, Theresa Schuck, John Robinson, Daniel Ohm, Edward Lee, John Q. Trojanowski, et al. (8 additional authors not shown)

Abstract: Ex vivo MRI of the brain provides remarkable advantages over in vivo MRI for visualizing and characterizing detailed neuroanatomy. However, automated cortical segmentation methods in ex vivo MRI are not well developed, primarily due to the limited availability of labeled datasets and heterogeneity in scanner hardware and acquisition protocols. In this work, we present a high-resolution 7 Tesla dataset of 32 ex vivo human brain specimens. We benchmark the cortical mantle segmentation performance of nine neural network architectures, trained and evaluated using manually segmented 3D patches sampled from specific cortical regions. The networks show excellent generalization across whole brain hemispheres in different specimens. They also generalize to unseen images acquired at different magnetic field strengths and with different imaging sequences. Finally, we provide cortical thickness measurements across key regions in 3D ex vivo human brain images. Our code and processed datasets are publicly available at https://github.com/Pulkit-Khandelwal/picsl-ex-vivo-segmentation.

Submitted 3 March, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: Ex vivo analysis framework (work in progress 2022 at the University of Pennsylvania)

arXiv:2107.02995 [pdf, other]

doi 10.1109/TBCAS.2020.3037862

MagNI: A Magnetoelectrically Powered and Controlled Wireless Neurostimulating Implant

Authors: Zhanghao Yu, Joshua C. Chen, Fatima T. Alrashdan, Benjamin W. Avants, Yan He, Amanda Singer, Jacob T. Robinson, Kaiyuan Yang

Abstract: This paper presents the first wireless and programmable neural stimulator leveraging magnetoelectric (ME) effects for power and data transfer. Thanks to low tissue absorption, low misalignment sensitivity, and high power transfer efficiency, the ME effect enables safe delivery of high power levels (a few milliwatts) at low resonant frequencies (~250 kHz) to mm-sized implants deep inside the body (30-mm depth). The presented MagNI (Magnetoelectric Neural Implant) consists of a 1.5-mm² 180-nm CMOS chip, an in-house built 4x2-mm ME film, an energy storage capacitor, and on-board electrodes on a flexible polyimide substrate, with a total volume of 8.2 mm³. The chip, which consumes 23.7 μW, includes system control and data recovery mechanisms that remain robust under source amplitude variations (1-V variation tolerance). The system delivers fully programmable, biphasic, current-controlled stimulation with patterns covering 0.05-to-1.5-mA amplitude, 64-to-512-μs pulse width, and 0-to-200-Hz repetition frequency for neurostimulation.

Submitted 6 July, 2021; originally announced July 2021.

Comments: This work has been accepted for publication in IEEE Transactions on Biomedical Circuits and Systems (TBioCAS), 2020

Journal ref: IEEE Transactions on Biomedical Circuits and Systems (TBioCAS), Volume: 14, Issue: 6, Pages: 1241-1252, Dec. 2020
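The MagNI abstract quotes its programmable stimulation ranges. The sketch below is a hypothetical host-side helper, not the chip's actual programming interface, which the abstract does not describe. It checks a requested pulse against those ranges and computes the charge per phase as Q = I × t. At the largest setting this gives 1.5 mA × 512 μs = 768 nC. Three things are assumptions: the class name, the symmetric charge-balanced phases, and ignoring the interphase gap.

```python
from dataclasses import dataclass

# Parameter ranges quoted in the MagNI abstract.
AMP_RANGE_MA = (0.05, 1.5)
PW_RANGE_US = (64, 512)
FREQ_RANGE_HZ = (0, 200)


@dataclass(frozen=True)
class BiphasicPulse:
    """Hypothetical symmetric, charge-balanced biphasic pulse description."""
    amplitude_ma: float
    pulse_width_us: float  # width of each of the two phases
    frequency_hz: float    # repetition rate (0 = stimulation off)

    def validate(self) -> None:
        """Raise ValueError if any parameter lies outside the quoted ranges."""
        checks = [
            ("amplitude_ma", self.amplitude_ma, AMP_RANGE_MA),
            ("pulse_width_us", self.pulse_width_us, PW_RANGE_US),
            ("frequency_hz", self.frequency_hz, FREQ_RANGE_HZ),
        ]
        for name, value, (lo, hi) in checks:
            if not lo <= value <= hi:
                raise ValueError(f"{name}={value} outside [{lo}, {hi}]")

    def charge_per_phase_nc(self) -> float:
        """Q = I * t; milliamps times microseconds gives nanocoulombs."""
        return self.amplitude_ma * self.pulse_width_us

    def duty_cycle(self) -> float:
        """Fraction of each period occupied by both phases (gap ignored)."""
        return 2 * self.pulse_width_us * 1e-6 * self.frequency_hz
```

Even at the extreme settings (512 μs phases at 200 Hz), the two phases occupy about 20% of each 5 ms period. A charge-balanced biphasic pattern therefore always fits within the quoted ranges.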

arXiv:2105.14409 [pdf, other]

A Matrix Autoencoder Framework to Align the Functional and Structural Connectivity Manifolds as Guided by Behavioral Phenotypes

Authors: Niharika Shimona D'Souza, Mary Beth Nebel, Deana Crocetti, Nicholas Wymbs, Joshua Robinson, Stewart Mostofsky, Archana Venkataraman

Abstract: We propose a novel matrix autoencoder to map functional connectomes from resting state fMRI (rs-fMRI) to structural connectomes from Diffusion Tensor Imaging (DTI), as guided by subject-level phenotypic measures. Our specialized autoencoder infers a low dimensional manifold embedding for the rs-fMRI correlation matrices that mimics a canonical outer-product decomposition. The embedding is simultaneously used to reconstruct DTI tractography matrices via a second manifold alignment decoder and to predict inter-subject phenotypic variability via an artificial neural network. We validate our framework on a dataset of 275 healthy individuals from the Human Connectome Project database and on a second clinical dataset consisting of 57 subjects with Autism Spectrum Disorder. We demonstrate that the model reliably recovers structural connectivity patterns across individuals, while robustly extracting predictive and interpretable brain biomarkers in a cross-validated setting. Finally, our framework outperforms several baselines at predicting behavioral phenotypes in both real-world datasets.

Submitted 9 July, 2021; v1 submitted 29 May, 2021; originally announced May 2021.
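The matrix autoencoder is said to mimic a canonical outer-product decomposition of each rs-fMRI correlation matrix. This sketch assumes the usual form Γ_n ≈ Σ_k c_nk b_k b_k^T = B diag(c_n) B^T, where the basis B is shared across subjects and the weights c_n are subject-specific. The NumPy code covers that decomposition only. It reconstructs Γ_n from (B, c_n), and recovers c_n by least squares when B is held fixed. It is not the authors' encoder, and the function names are hypothetical.

```python
import numpy as np


def reconstruct(B: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Return sum_k c[k] * outer(B[:, k], B[:, k]) = B @ diag(c) @ B.T.

    B: (P, K) basis vectors as columns; c: (K,) non-shared weights.
    """
    return (B * c) @ B.T  # scales column k of B by c[k] before the product


def fit_coefficients(B: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """Least-squares weights c such that B diag(c) B^T approximates gamma."""
    # Each design column is one vectorised rank-one term b_k b_k^T.
    design = np.stack([np.outer(b, b).ravel() for b in B.T], axis=1)
    c, *_ = np.linalg.lstsq(design, gamma.ravel(), rcond=None)
    return c
```

Holding B fixed makes the weight fit a small linear problem with one unknown per basis network. In the actual model, B and the embedding would be learned jointly.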

arXiv:2101.11680 [pdf, other]

doi 10.1109/TPAMI.2021.3075366

High Resolution, Deep Imaging Using Confocal Time-of-flight Diffuse Optical Tomography

Authors: Yongyi Zhao, Ankit Raghuram, Hyun K. Kim, Andreas H. Hielscher, Jacob T. Robinson, Ashok Veeraraghavan

Abstract: Light scattering by tissue severely limits how deep beneath the surface one can image and the spatial resolution one can obtain from these images. Diffuse optical tomography (DOT) is one of the most powerful techniques for imaging deep within tissue, well beyond the conventional ~10-15 mean scattering lengths tolerated by ballistic imaging techniques such as confocal and two-photon microscopy. Unfortunately, existing DOT systems are limited, achieving only centimeter-scale resolution. Furthermore, they suffer from slow acquisition times and slow reconstruction speeds, making real-time imaging infeasible. We show that time-of-flight diffuse optical tomography (ToF-DOT) and its confocal variant (CToF-DOT) exploit photon travel-time information to achieve millimeter spatial resolution in the highly scattered diffusion regime (> 50 mean free paths). We also demonstrate two further innovations, focusing on confocal measurements and multiplexing the illumination sources, which together significantly reduce the measurement acquisition time. Finally, we rely on a novel convolutional approximation that allows us to develop a fast reconstruction algorithm, achieving a 100× speedup in reconstruction time compared to traditional DOT reconstruction techniques. We believe that, together, these technical advances serve as the first step towards real-time, millimeter-resolution, deep tissue imaging using DOT.

Submitted 27 May, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

Comments: The updated version includes edits made to our paper in response to suggestions from reviewers. These changes include: updated 3D image reconstruction results, additional comments on prior work, and further explanations of the linear model. In addition, we made a correction to figure 9, relabeling the x-axis to the correct scale. Finally, we also updated our acknowledgements
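The abstract credits a convolutional approximation of the forward model with the 100× reconstruction speedup, but does not spell that approximation out. As a generic illustration only, this sketch assumes the measurements are a circular 2D convolution of the scene with a known, shift-invariant kernel. Inversion then becomes element-wise division in the Fourier domain. With the FFT this costs O(N log N), instead of solving a dense N × N system. Two assumptions are worth naming plainly. The Gaussian kernel is a hypothetical stand-in for a diffuse point-spread function. Wiener deconvolution is a swapped-in textbook inverse, not the paper's algorithm.

```python
import numpy as np


def gaussian_kernel_ft(shape, sigma):
    """FFT of a normalised Gaussian blur kernel centred at pixel (0, 0).

    Distances wrap around the array edges, matching circular convolution.
    """
    ny, nx = shape
    dy = np.minimum(np.arange(ny), ny - np.arange(ny))
    dx = np.minimum(np.arange(nx), nx - np.arange(nx))
    k = np.exp(-(dy[:, None] ** 2 + dx[None, :] ** 2) / (2.0 * sigma ** 2))
    return np.fft.fft2(k / k.sum())


def forward(x, kernel_ft):
    """Forward model: circular 2D convolution of the scene with the kernel."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * kernel_ft))


def wiener_deconvolve(m, kernel_ft, snr=1e3):
    """Wiener-filter estimate of x from m = k * x (+ noise), via the FFT."""
    filt = np.conj(kernel_ft) / (np.abs(kernel_ft) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(np.fft.fft2(m) * filt))
```

Applied to a blurred point target, the estimate peaks at the same location and is markedly sharper than the measurement. The `snr` parameter trades sharpness against noise amplification.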

arXiv:2008.12410 [pdf, other]

Deep sr-DDL: Deep Structurally Regularized Dynamic Dictionary Learning to Integrate Multimodal and Dynamic Functional Connectomics data for Multidimensional Clinical Characterizations

Authors: Niharika Shimona D'Souza, Mary Beth Nebel, Deana Crocetti, Nicholas Wymbs, Joshua Robinson, Stewart H. Mostofsky, Archana Venkataraman

Abstract: We propose a novel integrated framework that jointly models complementary information from resting-state functional MRI (rs-fMRI) connectivity and diffusion tensor imaging (DTI) tractography to extract biomarkers of brain connectivity predictive of behavior. Our framework couples a generative model of the connectomics data with a deep network that predicts behavioral scores. The generative component is a structurally-regularized Dynamic Dictionary Learning (sr-DDL) model that decomposes the dynamic rs-fMRI correlation matrices into a collection of shared basis networks and time-varying subject-specific loadings. We use the DTI tractography to regularize this matrix factorization and learn anatomically informed functional connectivity profiles. The deep component of our framework is an LSTM-ANN block, which uses the temporal evolution of the subject-specific sr-DDL loadings to predict multidimensional clinical characterizations. Our joint optimization strategy collectively estimates the basis networks, the subject-specific time-varying loadings, and the neural network weights. We validate our framework in two settings, both in a five-fold cross-validation setting. The first is a dataset of neurotypical individuals from the Human Connectome Project (HCP) database, used to map to cognition. The second is a separate multi-score prediction task on individuals diagnosed with Autism Spectrum Disorder (ASD). Our hybrid model outperforms several state-of-the-art approaches at clinical outcome prediction and learns interpretable multimodal neural signatures of brain organization.

Submitted 27 August, 2020; originally announced August 2020.
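The sr-DDL model decomposes dynamic rs-fMRI correlation matrices into shared basis networks and time-varying loadings. The abstract does not say how the dynamic matrices are computed. A common choice, assumed here, is a sliding-window Pearson correlation over region-of-interest (ROI) time series. The sketch builds those matrices. For a given basis B and loading sequence C, it also forms the time-varying reconstruction Γ_t ≈ B diag(c_t) B^T. The window length, step, and function names are hypothetical. The DTI regularization and the LSTM-ANN are not modeled.

```python
import numpy as np


def sliding_window_correlations(ts: np.ndarray, window: int, step: int) -> np.ndarray:
    """Pearson correlation matrices over sliding windows.

    ts: (T, P) array of T time points for P regions of interest.
    Returns an array of shape (n_windows, P, P).
    """
    T = ts.shape[0]
    starts = range(0, T - window + 1, step)
    # np.corrcoef treats rows as variables, so transpose each (window, P) slice.
    return np.stack([np.corrcoef(ts[s:s + window].T) for s in starts])


def reconstruct_dynamic(B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Gamma_t = B diag(C[t]) B^T for every window t.

    B: (P, K) shared basis networks; C: (n_windows, K) time-varying loadings.
    """
    return np.einsum("pk,tk,qk->tpq", B, C, B)
```

Keeping B shared and letting only C vary over time is what lets the loadings act as a low-dimensional summary of connectivity dynamics. A sequence model such as an LSTM could consume that summary.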

arXiv:2007.01931 [pdf, other]

A Deep-Generative Hybrid Model to Integrate Multimodal and Dynamic Connectivity for Predicting Spectrum-Level Deficits in Autism

Authors: Niharika Shimona D'Souza, Mary Beth Nebel, Deana Crocetti, Nicholas Wymbs, Joshua Robinson, Stewart Mostofsky, Archana Venkataraman

Abstract: We propose an integrated deep-generative framework that jointly models complementary information from resting-state functional MRI (rs-fMRI) connectivity and diffusion tensor imaging (DTI) tractography to extract predictive biomarkers of a disease. The generative part of our framework is a structurally-regularized Dynamic Dictionary Learning (sr-DDL) model that decomposes the dynamic rs-fMRI correlation matrices into a collection of shared basis networks and time-varying patient-specific loadings. This matrix factorization is guided by the DTI tractography matrices to learn anatomically informed connectivity profiles. The deep part of our framework is an LSTM-ANN block, which models the temporal evolution of the patient sr-DDL loadings to predict multidimensional clinical severity. Our coupled optimization procedure collectively estimates the basis networks, the patient-specific dynamic loadings, and the neural network weights. We validate our framework on a multi-score prediction task in 57 patients diagnosed with Autism Spectrum Disorder (ASD). Our hybrid model outperforms state-of-the-art baselines in a five-fold cross-validated setting and extracts interpretable multimodal neural signatures of brain dysfunction in ASD.

Submitted 3 July, 2020; originally announced July 2020.

arXiv:1805.10004 [pdf]

doi 10.1007/978-3-319-71078-5_2

Masked Conditional Neural Networks for Environmental Sound Classification

Authors: Fady Medhat, David Chesmore, John Robinson

Abstract: The ConditionaL Neural Network (CLNN) exploits the nature of the temporal sequencing of the sound signal represented in a spectrogram. Its variant, the Masked ConditionaL Neural Network (MCLNN), induces the network to learn in frequency bands by embedding a filterbank-like sparseness over the network's links using a binary mask. Additionally, the masking automates the concurrent exploration of different feature combinations, analogous to handcrafting the optimum combination of features for a recognition task. We have evaluated the MCLNN performance using the Urbansound8k dataset of environmental sounds. Additionally, we present YorNoise, a collection of manually recorded sounds of rail and road traffic, to investigate the confusion rates among machine-generated sounds possessing low-frequency components. MCLNN achieved competitive results on Urbansound8k without augmentation, using 12% of the trainable parameters of an equivalent model based on state-of-the-art Convolutional Neural Networks. We extended the Urbansound8k dataset with YorNoise, and experiments showed that common tonal properties affect the classification performance.

Submitted 27 April, 2019; v1 submitted 25 May, 2018; originally announced May 2018.

Comments: Conditional Neural Networks, CLNN, Masked Conditional Neural Networks, MCLNN, Restricted Boltzmann Machine, RBM, Conditional Restricted Boltzmann Machine, CRBM, Deep Belief Nets, Environmental Sound Recognition, ESR, YorNoise

Journal ref: Artificial Intelligence XXXIV. SGAI 2017
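Several of the MCLNN abstracts describe a binary mask that imposes filterbank-like sparseness, so each hidden unit learns from a band of frequency bins rather than from all of them. The sketch builds one such banded mask under an assumed layout. Each hidden unit sees `bandwidth` consecutive bins, successive units shift by `bandwidth - overlap` bins, and bands wrap around the feature axis. The mask is then applied to a dense layer by element-wise multiplication. The papers define their own mask parameterization; this layout and the function names are illustrative assumptions.

```python
import numpy as np


def band_mask(n_features: int, n_hidden: int, bandwidth: int, overlap: int) -> np.ndarray:
    """Binary (n_features, n_hidden) mask with one frequency band per unit.

    Hidden unit j is connected to `bandwidth` consecutive feature bins starting
    at j * (bandwidth - overlap), wrapping around the feature axis.
    """
    if not 0 <= overlap < bandwidth <= n_features:
        raise ValueError("need 0 <= overlap < bandwidth <= n_features")
    mask = np.zeros((n_features, n_hidden), dtype=np.float32)
    shift = bandwidth - overlap
    for j in range(n_hidden):
        rows = (j * shift + np.arange(bandwidth)) % n_features
        mask[rows, j] = 1.0
    return mask


def masked_dense(x: np.ndarray, W: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Dense layer whose weights are zeroed outside each unit's band."""
    return x @ (W * mask)
```

Because masked-out weights are multiplied by zero, they receive zero gradient during training. This is what keeps each unit tied to its band.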

arXiv:1804.02665 [pdf]

doi 10.1007/978-3-319-69179-4_26

Environmental Sound Recognition using Masked Conditional Neural Networks

Authors: Fady Medhat, David Chesmore, John Robinson

Abstract: Neural-network-based architectures used for sound recognition are usually adapted from other application domains and may not harness sound-related properties. The ConditionaL Neural Network (CLNN) is designed to consider the relational properties across frames in a temporal signal. Its extension, the Masked ConditionaL Neural Network (MCLNN), embeds a filterbank behavior within the network, which forces the network to learn in frequency bands rather than bins. Additionally, it automates the exploration of different feature combinations, analogous to handcrafting the optimum combination of features for a recognition task. We applied the MCLNN to the environmental sounds of the ESC-10 dataset. The MCLNN achieved competitive accuracies compared to state-of-the-art convolutional neural networks and hand-crafted attempts.

Submitted 11 April, 2019; v1 submitted 8 April, 2018; originally announced April 2018.

Comments: Boltzmann Machine, RBM, Conditional RBM, CRBM, Deep Neural Network, DNN, Conditional Neural Network, CLNN, Masked Conditional Neural Network, MCLNN, Environmental Sound Recognition, ESR, Advanced Data Mining and Applications (ADMA) Year: 2017

arXiv:1803.02421 [pdf, other]

doi 10.1007/978-3-319-68612-7_40

Masked Conditional Neural Networks for Audio Classification

Authors: Fady Medhat, David Chesmore, John Robinson

Abstract: We present the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN), designed for temporal signal recognition. The CLNN takes into consideration the temporal nature of the sound signal. The MCLNN extends the CLNN through a binary mask that preserves the spatial locality of the features. The mask also allows an automated exploration of feature combinations, analogous to hand-crafting the most relevant features for the recognition task. MCLNN has achieved competitive recognition accuracies on the GTZAN and ISMIR2004 music datasets, surpassing several state-of-the-art neural-network-based architectures and hand-crafted methods applied to both datasets.

Submitted 23 March, 2019; v1 submitted 6 March, 2018; originally announced March 2018.

Comments: Restricted Boltzmann Machine, RBM, Conditional Restricted Boltzmann Machine, CRBM, Music Information Retrieval, MIR, Conditional Neural Network, CLNN, Masked Conditional Neural Network, MCLNN, Deep Neural Network

Journal ref: International Conference on Artificial Neural Networks (ICANN) Year: 2017

arXiv:1802.05792 [pdf]

doi 10.1109/DSAA.2017.43

Masked Conditional Neural Networks for Automatic Sound Events Recognition

Authors: Fady Medhat, David Chesmore, John Robinson

Abstract: Deep neural network architectures designed for application domains other than sound, especially image recognition, may not optimally harness the time-frequency representation when adapted to the sound recognition problem. In this work, we explore the ConditionaL Neural Network (CLNN) and the Masked ConditionaL Neural Network (MCLNN) for multi-dimensional temporal signal recognition. The CLNN considers the inter-frame relationship. The MCLNN enforces a systematic sparseness over the network's links to enable learning in frequency bands rather than bins. This allows the network to be frequency-shift invariant, mimicking a filterbank. The mask also allows several combinations of features to be considered concurrently, a process usually handcrafted through exhaustive manual search. We applied the MCLNN to the environmental sound recognition problem using the ESC-10 and ESC-50 datasets. MCLNN achieved competitive performance compared to state-of-the-art Convolutional Neural Networks, using 12% of the parameters and without augmentation.

Submitted 28 April, 2019; v1 submitted 15 February, 2018; originally announced February 2018.

Comments: Restricted Boltzmann Machine, RBM, Conditional RBM, CRBM, Deep Belief Net, DBN, Conditional Neural Network, CLNN, Masked Conditional Neural Network, MCLNN, Environmental Sound Recognition, ESR

Journal ref: IEEE International Conference on Data Science and Advanced Analytics (DSAA) Year: 2017, Pages: 389 - 394

arXiv:1802.02617 [pdf]

doi 10.1109/ICMLA.2017.0-158

Recognition of Acoustic Events Using Masked Conditional Neural Networks

Authors: Fady Medhat, David Chesmore, John Robinson

Abstract: Automatic feature extraction using neural networks has accomplished remarkable success for images. For sound recognition, however, these models are usually modified to fit the multi-dimensional temporal representation of the audio signal in spectrograms. This may not efficiently harness the time-frequency representation of the signal. The ConditionaL Neural Network (CLNN) takes into consideration the interrelation between temporal frames. The Masked ConditionaL Neural Network (MCLNN) extends the CLNN by enforcing a systematic sparseness over the network's weights using a binary mask. The masking allows the network to learn about frequency bands rather than bins, mimicking a filterbank used in signal transformations such as MFCC. Additionally, the mask is designed to consider various combinations of features, which automates the feature hand-crafting process. We applied the MCLNN to the Environmental Sound Recognition problem using the Urbansound8k, YorNoise, ESC-10, and ESC-50 datasets. The MCLNN achieved competitive performance compared to state-of-the-art Convolutional Neural Networks and hand-crafted attempts.

Submitted 28 April, 2019; v1 submitted 7 February, 2018; originally announced February 2018.

Comments: Restricted Boltzmann Machine, RBM, Conditional Restricted Boltzmann Machine, CRBM, Conditional Neural Networks, CLNN, Masked Conditional Neural Networks, MCLNN, Deep Neural Network, Environmental Sound Recognition, ESR

Journal ref: IEEE International Conference on Machine Learning and Applications (ICMLA) Year: 2017 Pages: 199 - 206

arXiv:1801.05504 [pdf]

doi 10.1109/ICDM.2017.125

Automatic Classification of Music Genre using Masked Conditional Neural Networks

Authors: Fady Medhat, David Chesmore, John Robinson

Abstract: Neural-network-based architectures used for sound recognition are usually adapted from other application domains, such as image recognition, and may not harness the time-frequency representation of a signal. The ConditionaL Neural Network (CLNN) and its extension, the Masked ConditionaL Neural Network (MCLNN), are designed for multidimensional temporal signal recognition. The CLNN is trained over a window of frames to preserve the inter-frame relation. The MCLNN enforces a systematic sparseness over the network's links that mimics a filterbank-like behavior. The masking operation induces the network to learn in frequency bands, which decreases the network's susceptibility to frequency shifts in time-frequency representations. Additionally, the mask allows a range of feature combinations to be explored concurrently, analogous to manually handcrafting the optimum collection of features for a recognition task. The MCLNN achieved competitive performance on the Ballroom music dataset compared to several hand-crafted attempts and outperformed models based on state-of-the-art Convolutional Neural Networks.

Submitted 28 April, 2019; v1 submitted 16 January, 2018; originally announced January 2018.

Comments: Restricted Boltzmann Machine; RBM; Conditional RBM; CRBM; Deep Belief Net; DBN; Conditional Neural Network; CLNN; Masked Conditional Neural Network; MCLNN; Music Information Retrieval; MIR. IEEE International Conference on Data Mining (ICDM), 2017

Journal ref: IEEE International Conference on Data Mining (ICDM) Year: 2017 Pages: 979 - 984
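The music-genre abstract notes that the CLNN is trained over a window of frames to preserve the inter-frame relation. This sketch assumes a window of 2n + 1 frames centred on the frame being processed, with a separate weight matrix for each frame offset. The NumPy code extracts the windows from a (time × frequency) spectrogram and computes one such conditional layer. Three things are illustrative assumptions rather than the authors' exact formulation: the offset-specific weights, the ReLU activation, and the function names.

```python
import numpy as np


def frame_windows(spec: np.ndarray, n: int) -> np.ndarray:
    """Cut a (T, F) spectrogram into overlapping windows of 2n + 1 frames.

    Returns an array of shape (T - 2n, 2n + 1, F); window t is centred on
    frame t + n, so the first and last n frames are never window centres.
    """
    T = spec.shape[0]
    return np.stack([spec[t:t + 2 * n + 1] for t in range(T - 2 * n)])


def conditional_layer(windows: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One conditional layer: frame offset u has its own weights W[u] (F x H).

    windows: (N, 2n + 1, F), W: (2n + 1, F, H), b: (H,) -> output (N, H).
    Each output row sums the contributions of all 2n + 1 frames in its window.
    """
    z = np.einsum("nuf,ufh->nh", windows, W) + b
    return np.maximum(z, 0.0)  # ReLU, chosen only for illustration
```

Giving each offset its own weight matrix lets the layer weigh neighbouring frames differently from the centre frame. A single shared matrix would treat the window as an unordered bag of frames.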

Showing 1–15 of 15 results for author: Robinson, J