-
Gradient-Adjusted Neuron Activation Profiles for Comprehensive Introspection of Convolutional Speech Recognition Models
Authors:
Andreas Krug,
Sebastian Stober
Abstract:
Deep Learning based Automatic Speech Recognition (ASR) models are very successful, but hard to interpret. To gain a better understanding of how Artificial Neural Networks (ANNs) accomplish their tasks, introspection methods have been proposed. Adapting such techniques from computer vision to speech recognition is not straightforward, because speech data is more complex and less interpretable than image data. In this work, we introduce Gradient-adjusted Neuron Activation Profiles (GradNAPs) as a means to interpret features and representations in Deep Neural Networks. GradNAPs are characteristic responses of ANNs to particular groups of inputs, which incorporate the relevance of neurons for prediction. We show how to utilize GradNAPs to gain insight into how data is processed in ANNs. This includes different ways of visualizing features and clustering of GradNAPs to compare embeddings of different groups of inputs in any layer of a given network. We demonstrate our proposed techniques using a fully-convolutional ASR model.
Submitted 19 February, 2020;
originally announced February 2020.
-
Systematic Misestimation of Machine Learning Performance in Neuroimaging Studies of Depression
Authors:
Claas Flint,
Micah Cearns,
Nils Opel,
Ronny Redlich,
David M. A. Mehler,
Daniel Emden,
Nils R. Winter,
Ramona Leenings,
Simon B. Eickhoff,
Tilo Kircher,
Axel Krug,
Igor Nenadic,
Volker Arolt,
Scott Clark,
Bernhard T. Baune,
Xiaoyi Jiang,
Udo Dannlowski,
Tim Hahn
Abstract:
We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from major depressive disorder (MDD) and healthy controls (HC) based on neuroimaging data. Drawing upon structural magnetic resonance imaging (MRI) data from a balanced sample of $N = 1,868$ MDD patients and HC from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset which yielded an accuracy of $61\,\%$. Next, we mimicked the process by which researchers would draw samples of various sizes ($N = 4$ to $N = 150$) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes ($N = 20$), we observe accuracies of up to $95\,\%$. For medium sample sizes ($N = 100$), accuracies of up to $75\,\%$ were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases.
Submitted 3 May, 2021; v1 submitted 13 December, 2019;
originally announced December 2019.
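The point about test-set size in the abstract above can be illustrated with a small simulation. This is a minimal sketch, not the authors' procedure: it assumes a fixed classifier whose true accuracy is 0.61 (the full-dataset figure quoted in the abstract) and models only the sampling noise of the test set, ignoring any effect of training on small samples. The function names and the chosen test sizes are hypothetical.

```python
import random
import statistics


def simulate_test_accuracies(true_accuracy, test_size, n_repeats, seed=0):
    """Observed accuracies of one fixed classifier on repeated random test sets.

    Assumption: each test prediction is correct independently with
    probability `true_accuracy` (a plain binomial model).
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_repeats):
        correct = sum(rng.random() < true_accuracy for _ in range(test_size))
        accuracies.append(correct / test_size)
    return accuracies


def summarize(true_accuracy=0.61, test_sizes=(20, 100, 1000), n_repeats=2000):
    """Mean, spread, and best-case observed accuracy for each test-set size."""
    summary = {}
    for n in test_sizes:
        accs = simulate_test_accuracies(true_accuracy, n, n_repeats, seed=n)
        summary[n] = {
            "mean": statistics.mean(accs),
            "stdev": statistics.pstdev(accs),
            "max": max(accs),
        }
    return summary


if __name__ == "__main__":
    for n, stats in summarize().items():
        print(
            f"test size {n:5d}: mean={stats['mean']:.3f} "
            f"stdev={stats['stdev']:.3f} max={stats['max']:.3f}"
        )
```

Under this simplified model the mean observed accuracy stays close to the true value for every test size, while the spread and the best-case result shrink as the test set grows. That is consistent with the abstract's claim that larger test sets guard against overly optimistic estimates.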
-
Visualizing Deep Neural Networks for Speech Recognition with Learned Topographic Filter Maps
Authors:
Andreas Krug,
Sebastian Stober
Abstract:
The uninformative ordering of artificial neurons in Deep Neural Networks complicates visualizing activations in deeper layers. This is one reason why the internal structure of such models is very unintuitive. In neuroscience, activity of real brains can be visualized by highlighting active regions. Inspired by those techniques, we train a convolutional speech recognition model, where filters are arranged in a 2D grid and neighboring filters are similar to each other. We show how those topographic filter maps visualize artificial neuron activations more intuitively. Moreover, we investigate whether this causes phoneme-responsive neurons to be grouped in certain regions of the topographic map.
Submitted 6 December, 2019;
originally announced December 2019.
-
Biological sex classification with structural MRI data shows increased misclassification in transgender women
Authors:
Claas Flint,
Katharina Förster,
Sophie A. Koser,
Carsten Konrad,
Pienie Zwitserlood,
Klaus Berger,
Marco Hermesdorf,
Tilo Kircher,
Igor Nenadic,
Axel Krug,
Bernhard T. Baune,
Katharina Dohm,
Ronny Redlich,
Nils Opel,
Volker Arolt,
Tim Hahn,
Xiaoyi Jiang,
Udo Dannlowski,
Dominik Grotegerd
Abstract:
Transgender individuals (TIs) show brain structural alterations that differ from their biological sex as well as their perceived gender. To substantiate evidence that the brain structure of TIs differs from male and female, we use a combined multivariate and univariate approach. Gray matter segments resulting from voxel-based morphometry preprocessing of $N = 1753$ cisgender (CG) healthy participants were used to train ($N=1402$) and validate (20 % hold-out; $N = 351$) a support-vector machine classifying the biological sex. As a second validation, we classified $N = 1104$ patients with depression. A third validation was performed using the matched CG sample of the transgender women (TWs) application-sample. Subsequently, the classifier was applied to $N = 26$ TWs. Finally, we compared brain volumes of CG-men, women and TW-pre/post treatment (cross-sex hormone treatment) in a univariate analysis controlling for sexual orientation, age and total brain volume. The application of our biological sex classifier to the transgender sample resulted in a significantly lower true positive rate (TPR) (TPR-male = 56.0 %). The TPR did not differ between CG-individuals with (TPR-male = 86.9 %) and without depression (TPR-male = 88.5 %). The univariate analysis of the transgender application-sample revealed that TW-pre/post treatment show brain structural differences from CG-women and CG-men in the putamen and insula, as well as the whole-brain analysis. Our results support the hypothesis that brain structure in TW differs from brain structure of their biological sex (male) as well as their perceived gender (female). This finding substantiates evidence that TIs show specific brain structural alterations leading to a different pattern of brain structure than CG-individuals.
Submitted 22 April, 2020; v1 submitted 24 November, 2019;
originally announced November 2019.
-
Transfer Learning for Speech Recognition on a Budget
Authors:
Julius Kunze,
Louis Kirsch,
Ilia Kurenkov,
Andreas Krug,
Jens Johannsmeier,
Sebastian Stober
Abstract:
End-to-end training of automated speech recognition (ASR) systems requires massive data and compute resources. We explore transfer learning based on model adaptation as an approach for training ASR models under constrained GPU memory, throughput and training data. We conduct several systematic experiments adapting a Wav2Letter convolutional neural network originally trained for English ASR to the German language. We show that this technique allows faster training on consumer-grade resources while requiring less training data in order to achieve the same accuracy, thereby lowering the cost of training ASR models in other languages. Model introspection revealed that small adaptations to the network's weights were sufficient for good performance, especially for inner layers.
Submitted 1 June, 2017;
originally announced June 2017.