Search | arXiv e-print repository

Derivative-based regularization for regression

Authors: Enrico Lopedoto, Maksim Shekhunov, Vitaly Aksenov, Kizito Salako, Tillman Weyde

Abstract: In this work, we introduce a novel approach to regularization in multivariable regression problems. Our regularizer, called DLoss, penalises differences between the model's derivatives and derivatives of the data generating function as estimated from the training data. We call these estimated derivatives data derivatives. The goal of our method is to align the model to the data, not only in terms of target values but also in terms of the derivatives involved. To estimate data derivatives, we select (from the training data) 2-tuples of input-value pairs, using either nearest neighbour or random selection. On synthetic and real datasets, we evaluate the effectiveness of adding DLoss, with different weights, to the standard mean squared error loss. The experimental results show that with DLoss (using nearest neighbour selection) we obtain, on average, the best rank with respect to MSE on validation data sets, compared to no regularization, L2 regularization, and Dropout.

Submitted 1 May, 2024; originally announced May 2024.
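
The idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact estimator, so the difference quotient used here for both the data derivatives and the model derivatives is an assumption, and the names `nearest_neighbour_pairs`, `dloss`, `total_loss` and the default `weight` are hypothetical.

```python
import numpy as np

def nearest_neighbour_pairs(X):
    """For each row i of X, return the index of its nearest other row."""
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    return np.argmin(dist, axis=1)

def dloss(model, X, y, eps=1e-12):
    """Mean squared gap between model and data difference quotients.

    Assumption: derivatives are approximated along the line joining each
    point to its nearest neighbour; the paper may use another estimator.
    """
    j = nearest_neighbour_pairs(X)
    step = np.linalg.norm(X[j] - X, axis=1) + eps
    data_deriv = (y[j] - y) / step                  # estimated from the data
    model_deriv = (model(X[j]) - model(X)) / step   # same pairs, model outputs
    return np.mean((model_deriv - data_deriv) ** 2)

def total_loss(model, X, y, weight=0.1):
    """Standard MSE plus the weighted derivative penalty."""
    mse = np.mean((model(X) - y) ** 2)
    return mse + weight * dloss(model, X, y)
```

With data drawn from a line of slope 2, a model with the same slope incurs no penalty, while a model with slope 3 is penalised even at points where its values happen to be close to the targets.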

arXiv:2311.16004 [pdf, other]

Improved Data Generation for Enhanced Asset Allocation: A Synthetic Dataset Approach for the Fixed Income Universe

Authors: Szymon Kubiak, Tillman Weyde, Oleksandr Galkin, Dan Philps, Ram Gopal

Abstract: We present a novel process for generating synthetic datasets tailored to assess asset allocation methods and construct portfolios within the fixed income universe. Our approach begins by enhancing the CorrGAN model to generate synthetic correlation matrices. Subsequently, we propose an Encoder-Decoder model that samples additional data conditioned on a given correlation matrix. The resulting synthetic dataset facilitates in-depth analyses of asset allocation methods across diverse asset universes. Additionally, we provide a case study that exemplifies the use of the synthetic dataset to improve portfolios constructed within a simulation-based asset allocation process.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2305.13258 [pdf, other]

NeSy4VRD: A Multifaceted Resource for Neurosymbolic AI Research using Knowledge Graphs in Visual Relationship Detection

Authors: David Herron, Ernesto Jiménez-Ruiz, Giacomo Tarroni, Tillman Weyde

Abstract: NeSy4VRD is a multifaceted resource designed to support the development of neurosymbolic AI (NeSy) research. NeSy4VRD re-establishes public access to the images of the VRD dataset and couples them with an extensively revised, quality-improved version of the VRD visual relationship annotations. Crucially, NeSy4VRD provides a well-aligned, companion OWL ontology that describes the dataset domain. It comes with open source infrastructure that provides comprehensive support for extensibility of the annotations (which, in turn, facilitates extensibility of the ontology), and open source code for loading the annotations to/from a knowledge graph. We are contributing NeSy4VRD to the computer vision, NeSy and Semantic Web communities to help foster more NeSy research using OWL-based knowledge graphs.

Submitted 22 May, 2023; originally announced May 2023.

arXiv:2304.03639 [pdf, other]

Theoretical Conditions and Empirical Failure of Bracket Counting on Long Sequences with Linear Recurrent Networks

Authors: Nadine El-Naggar, Pranava Madhyastha, Tillman Weyde

Abstract: Previous work has established that RNNs with an unbounded activation function have the capacity to count exactly. However, it has also been shown that RNNs are challenging to train effectively and generally do not learn exact counting behaviour. In this paper, we focus on this problem by studying the simplest possible RNN, a linear single-cell network. We conduct a theoretical analysis of linear RNNs and identify conditions for the models to exhibit exact counting behaviour. We provide a formal proof that these conditions are necessary and sufficient. We also conduct an empirical analysis using tasks involving a Dyck-1-like Balanced Bracket language under two different settings. We observe that linear RNNs generally do not meet the necessary and sufficient conditions for counting behaviour when trained with the standard approach. We investigate how varying the length of training sequences and utilising different target classes impacts model behaviour during training and the ability of linear RNN models to effectively approximate the indicator conditions.

Submitted 7 April, 2023; originally announced April 2023.

Comments: 17th Conference of the European Chapter of the Association for Computational Linguistics Student Research Workshop (EACL 2023 SRW)
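
A toy example may help show why exact counting is fragile in a linear single-cell network. The sketch below is illustrative only: the abstract does not state the paper's conditions, so the weight choice (recurrent weight 1, input weights +1 and -1) is simply one well-known setting that counts exactly, and the function name `linear_rnn_count` is hypothetical.

```python
def linear_rnn_count(seq, w=1.0, u_open=1.0, u_close=-1.0):
    """Run h_t = w * h_{t-1} + u(x_t) over a bracket string; return the final h.

    Assumption: "(" contributes u_open and every other symbol contributes
    u_close. With w = 1 and u_open = -u_close the state equals the number
    of currently unclosed brackets.
    """
    h = 0.0
    for ch in seq:
        h = w * h + (u_open if ch == "(" else u_close)
    return h

balanced = "(" * 1000 + ")" * 1000
exact = linear_rnn_count(balanced)             # recurrent weight exactly 1
drifted = linear_rnn_count(balanced, w=0.999)  # slightly perturbed weight
print(exact, drifted)
```

With the exact weights the final state of a balanced sequence is zero. A recurrent weight that is off by only 0.001 leaves a large residual after 2,000 steps, which is consistent with the abstract's observation that trained models tend to fail on long sequences.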

arXiv:2301.10799 [pdf, other]

Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering

Authors: Chenxi Whitehouse, Tillman Weyde, Pranava Madhyastha

Abstract: The field of visual question answering (VQA) has recently seen a surge in research focused on providing explanations for predicted answers. However, current systems mostly rely on separate models to predict answers and generate explanations, leading to less grounded and frequently inconsistent results. To address this, we propose a multitask learning approach towards a Unified Model for Answer and Explanation generation (UMAE). Our approach involves the addition of artificial prompt tokens to training data and fine-tuning a multimodal encoder-decoder model on a variety of VQA-related tasks. In our experiments, UMAE models surpass the prior state-of-the-art answer accuracy on A-OKVQA by 10 to 15%, show competitive results on OK-VQA, achieve new state-of-the-art explanation scores on A-OKVQA and VCR, and demonstrate promising out-of-domain performance on VQA-X.

Submitted 13 February, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

Comments: Findings of EACL 2023

arXiv:2211.16429 [pdf, other]

Exploring the Long-Term Generalization of Counting Behavior in RNNs

Authors: Nadine El-Naggar, Pranava Madhyastha, Tillman Weyde

Abstract: In this study, we investigate the generalization of LSTM, ReLU and GRU models on counting tasks over long sequences. Previous theoretical work has established that RNNs with ReLU activation and LSTMs have the capacity for counting with suitable configuration, while GRUs have limitations that prevent correct counting over longer sequences. Despite this and some positive empirical results for LSTMs on Dyck-1 languages, our experimental results show that LSTMs fail to learn correct counting behavior for sequences that are significantly longer than in the training data. ReLUs show much larger variance in behavior and in most cases worse generalization. The long sequence generalization is empirically related to validation loss, but reliable long sequence generalization seems not practically achievable through backpropagation with current techniques. We demonstrate different failure modes for LSTMs, GRUs and ReLUs. In particular, we observe that the saturation of activation functions in LSTMs and the correct weight setting for ReLUs to generalize counting behavior are not achieved in standard training regimens. In summary, learning generalizable counting behavior is still an open problem and we discuss potential approaches for further research.

Submitted 29 November, 2022; originally announced November 2022.

Comments: Published in I Can't Believe It's Not Better: Understanding Deep Learning Through Empirical Falsification Workshop at NeurIPS 2022

arXiv:2208.00792 [pdf, other]

Jazz Contrafact Detection

Authors: C. Bunks, T. Weyde

Abstract: In jazz, a contrafact is a new melody composed over an existing, but often reharmonized chord progression. Because reharmonization can introduce a wide range of variations, detecting contrafacts is a challenging task. This paper develops a novel vector-space model to represent chord progressions, and uses it for contrafact detection. The process applies principles from music theory to reduce the dimensionality of chord space, determine a common key signature representation, and compute a chordal co-occurrence matrix. The rows of the matrix form a basis for the vector space in which chord progressions are represented as piecewise linear functions, and harmonic similarity is evaluated by computing the membrane area, a novel distance metric. To illustrate our method's effectiveness, we apply it to the Impro-Visor corpus of 2,612 chord progressions, and present examples demonstrating its ability to account for reharmonizations and find contrafacts.

Submitted 1 August, 2022; originally announced August 2022.

Comments: 8 pages, 6 figures, 4 tables

arXiv:2206.03224 [pdf]

The Beyond the Fence Musical and Computer Says Show Documentary

Authors: Simon Colton, Maria Teresa Llano, Rose Hepworth, John Charnley, Catherine V. Gale, Archie Baron, Francois Pachet, Pierre Roy, Pablo Gervas, Nick Collins, Bob Sturm, Tillman Weyde, Daniel Wolff, James Robert Lloyd

Abstract: During 2015 and early 2016, the cultural application of Computational Creativity research and practice took a big leap forward, with a project where multiple computational systems were used to provide advice and material for a new musical theatre production. Billed as the world's first 'computer musical... conceived by computer and substantially crafted by computer', Beyond The Fence was staged in the Arts Theatre in London's West End during February and March of 2016. Various computational approaches to analytical and generative sub-projects were used to bring about the musical, and these efforts were recorded in two 1-hour documentary films made by Wingspan Productions, which were aired on SkyArts under the title Computer Says Show. We provide details here of the project conception and execution, including details of the systems which took on some of the creative responsibility in writing the musical, and the contributions they made. We also provide details of the impact of the project, including a perspective from the two (human) writers with overall control of the creative aspects of the musical.

Submitted 11 May, 2022; originally announced June 2022.

Journal ref: The Seventh International Conference on Computational Creativity, ICCC 2016

arXiv:2204.02385 [pdf, other]

doi 10.1109/TASLP.2023.3250840

Learning Speech Emotion Representations in the Quaternion Domain

Authors: Eric Guizzo, Tillman Weyde, Simone Scardapane, Danilo Comminiello

Abstract: The modeling of human emotion expression in speech signals is an important, yet challenging task. The high resource demand of speech emotion recognition models, combined with the general scarcity of emotion-labelled data, are obstacles to the development and application of effective solutions in this field. In this paper, we present an approach to jointly circumvent these difficulties. Our method, named RH-emo, is a novel semi-supervised architecture aimed at extracting quaternion embeddings from real-valued monoaural spectrograms, enabling the use of quaternion-valued networks for speech emotion recognition tasks. RH-emo is a hybrid real/quaternion autoencoder network that consists of a real-valued encoder in parallel to a real-valued emotion classifier and a quaternion-valued decoder. On the one hand, the classifier makes it possible to optimize each latent axis of the embeddings for the classification of a specific emotion-related characteristic: valence, arousal, dominance and overall emotion. On the other hand, the quaternion reconstruction enables the latent dimension to develop intra-channel correlations that are required for an effective representation as a quaternion entity. We test our approach on speech emotion recognition tasks using four popular datasets: Iemocap, Ravdess, EmoDb and Tess, comparing the performance of three well-established real-valued CNN architectures (AlexNet, ResNet-50, VGG) and their quaternion-valued equivalent fed with the embeddings created with RH-emo. We obtain a consistent improvement in the test accuracy for all datasets, while drastically reducing the resource demand of the models. Moreover, we performed additional experiments and ablation studies that confirm the effectiveness of our approach. The RH-emo repository is available at: https://github.com/ispamm/rhemo.

Submitted 3 March, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: Accepted for Publication in IEEE/ACM Transactions on Audio, Speech and Language Processing

arXiv:2204.00458 [pdf, other]

Evaluation of Fake News Detection with Knowledge-Enhanced Language Models

Authors: Chenxi Whitehouse, Tillman Weyde, Pranava Madhyastha, Nikos Komninos

Abstract: Recent advances in fake news detection have exploited the success of large-scale pre-trained language models (PLMs). The predominant state-of-the-art approaches are based on fine-tuning PLMs on labelled fake news datasets. However, large-scale PLMs are generally not trained on structured factual data and hence may not possess priors that are grounded in factually accurate knowledge. The use of existing knowledge bases (KBs) with rich human-curated factual information has thus the potential to make fake news detection more effective and robust. In this paper, we investigate the impact of knowledge integration into PLMs for fake news detection. We study several state-of-the-art approaches for knowledge integration, mostly using Wikidata as KB, on two popular fake news datasets - LIAR, a politics-based dataset, and COVID-19, a dataset of messages posted on social media relating to the COVID-19 pandemic. Our experiments show that knowledge-enhanced models can significantly improve fake news detection on LIAR where the KB is relevant and up-to-date. The mixed results on COVID-19 highlight the reliance on stylistic features and the importance of domain-specific and current KBs.

Submitted 13 February, 2023; v1 submitted 1 April, 2022; originally announced April 2022.

Comments: Proceedings of AAAI-ICWSM 2022

Journal ref: Proceedings of the International AAAI Conference on Web and Social Media 16 (2022) 1425-1429

arXiv:2103.12864 [pdf, other]

Learned complex masks for multi-instrument source separation

Authors: Andreas Jansson, Rachel M. Bittner, Nicola Montecchio, Tillman Weyde

Abstract: Music source separation in the time-frequency domain is commonly achieved by applying a soft or binary mask to the magnitude component of (complex) spectrograms. The phase component is usually not estimated, but instead copied from the mixture and applied to the magnitudes of the estimated isolated sources. While this method has several practical advantages, it imposes an upper bound on the performance of the system, where the estimated isolated sources inherently exhibit audible "phase artifacts". In this paper we address these shortcomings by directly estimating masks in the complex domain, extending recent work from the speech enhancement literature. The method is particularly well suited for multi-instrument musical source separation since residual phase artifacts are more pronounced for spectrally overlapping instrument sources, a common scenario in music. We show that complex masks result in better separation than masks that operate solely on the magnitude component.

Submitted 23 March, 2021; originally announced March 2021.
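
The upper bound mentioned in the abstract above can be made concrete with oracle masks on random toy spectrograms. This is a sketch under stated assumptions, not the paper's system: the paper estimates masks with a network, whereas here both masks are computed directly from a known source, and the function names and toy shapes are hypothetical.

```python
import numpy as np

def magnitude_mask_estimate(source, mixture):
    """Oracle magnitude mask; the phase is copied from the mixture."""
    mask = np.abs(source) / np.abs(mixture)
    return mask * np.abs(mixture) * np.exp(1j * np.angle(mixture))

def complex_mask_estimate(source, mixture):
    """Oracle complex mask; magnitude and phase are both corrected."""
    mask = source / mixture
    return mask * mixture

def toy_spectrograms(seed=0, shape=(4, 6)):
    """Random complex 'spectrograms' (frequency bins x time frames)."""
    rng = np.random.default_rng(seed)
    source = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    other = rng.normal(size=shape) + 1j * rng.normal(size=shape)
    return source, source + other

source, mixture = toy_spectrograms()
err_mag = np.mean(np.abs(magnitude_mask_estimate(source, mixture) - source) ** 2)
err_cplx = np.mean(np.abs(complex_mask_estimate(source, mixture) - source) ** 2)
print(f"magnitude mask error: {err_mag:.4f}, complex mask error: {err_cplx:.2e}")
```

Even with a perfect magnitude mask, the estimate keeps the mixture phase and so differs from the source wherever another source overlaps. The complex mask recovers the source up to rounding error.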

arXiv:2103.06198 [pdf, other]

Relational Weight Priors in Neural Networks for Abstract Pattern Learning and Language Modelling

Authors: Radha Kopparti, Tillman Weyde

Abstract: Deep neural networks have become the dominant approach in natural language processing (NLP). However, in recent years, it has become apparent that there are shortcomings in systematicity that limit the performance and data efficiency of deep learning in NLP. These shortcomings can be clearly shown in lower-level artificial tasks, mostly on synthetic data. Abstract patterns are the best known examples of a hard problem for neural networks in terms of generalisation to unseen data. They are defined by relations between items, such as equality, rather than their values. It has been argued that these low-level problems demonstrate the inability of neural networks to learn systematically. In this study, we propose Embedded Relation Based Patterns (ERBP) as a novel way to create a relational inductive bias that encourages learning equality and distance-based relations for abstract patterns. ERBP is based on Relation Based Patterns (RBP), but modelled as a Bayesian prior on network weights and implemented as a regularisation term in otherwise standard network learning. ERBP is easy to integrate into standard neural networks and does not affect their learning capacity. In our experiments, ERBP priors lead to almost perfect generalisation when learning abstract patterns from synthetic noise-free sequences. ERBP also improves natural language models on the word and character level and pitch prediction in melodies with RNN, GRU and LSTM networks. We also find improvements in the more complex tasks of learning graph edit distance and compositional sentence entailment. ERBP consistently improves over RBP and over standard networks, showing that it enables abstract pattern learning which contributes to performance in natural language tasks.

Submitted 10 March, 2021; originally announced March 2021.

Comments: 29 pages

arXiv:2006.06494 [pdf, other]

Anti-Transfer Learning for Task Invariance in Convolutional Neural Networks for Speech Processing

Authors: Eric Guizzo, Tillman Weyde, Giacomo Tarroni

Abstract: We introduce the novel concept of anti-transfer learning for speech processing with convolutional neural networks. While transfer learning assumes that the learning process for a target task will benefit from re-using representations learned for another task, anti-transfer avoids the learning of representations that have been learned for an orthogonal task, i.e., one that is not relevant and potentially misleading for the target task, such as speaker identity for speech recognition or speech content for emotion recognition. In anti-transfer learning, we penalize similarity between activations of a network being trained and another one previously trained on an orthogonal task, which yields more suitable representations. This leads to better generalization and provides a degree of control over correlations that are spurious or undesirable, e.g. to avoid social bias. We have implemented anti-transfer for convolutional neural networks in different configurations with several similarity metrics and aggregation functions, which we evaluate and analyze with several speech and audio tasks and settings, using six datasets. We show that anti-transfer actually leads to the intended invariance to the orthogonal task and to more appropriate features for the target task at hand. Anti-transfer learning consistently improves classification accuracy in all test cases. While anti-transfer creates computation and memory cost at training time, there is relatively little computation cost when using pre-trained models for orthogonal tasks. Anti-transfer is widely applicable and particularly useful where a specific invariance is desirable or where trained models are available and labeled data for orthogonal tasks are difficult to obtain.

Submitted 13 January, 2021; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: Neural Networks Journal

arXiv:2003.03375 [pdf, other]

Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals

Authors: Eric Guizzo, Tillman Weyde, Jack Barnett Leveson

Abstract: Robustness against temporal variations is important for emotion recognition from speech audio, since emotion is expressed through complex spectral patterns that can exhibit significant local dilation and compression on the time axis depending on speaker and context. To address this and potentially other tasks, we introduce the multi-time-scale (MTS) method to create flexibility towards temporal variations when analyzing time-frequency representations of audio data. MTS extends convolutional neural networks with convolution kernels that are scaled and re-sampled along the time axis, to increase temporal flexibility without increasing the number of trainable parameters compared to standard convolutional layers. We evaluate MTS and standard convolutional layers in different architectures for emotion recognition from speech audio, using 4 datasets of different sizes. The results show that the use of MTS layers consistently improves the generalization of networks of different capacity and depth, compared to standard convolution, especially on smaller datasets.

Submitted 6 March, 2020; originally announced March 2020.

arXiv:2003.03125 [pdf, other]

Weight Priors for Learning Identity Relations

Authors: Radha Kopparti, Tillman Weyde

Abstract: Learning abstract and systematic relations has been an open issue in neural network learning for over 30 years. It has been shown recently that neural networks do not learn relations based on identity and are unable to generalize well to unseen data. The Relation Based Pattern (RBP) approach has been proposed as a solution for this problem. In this work, we extend RBP by realizing it as a Bayesian prior on network weights to model the identity relations. This weight prior leads to a modified regularization term in otherwise standard network learning. In our experiments, we show that the Bayesian weight priors lead to perfect generalization when learning identity based relations and do not impede general neural network learning. We believe that the approach of creating an inductive bias with weight priors can be extended easily to other forms of relations and will be beneficial for many other learning tasks.

Submitted 19 May, 2020; v1 submitted 6 March, 2020; originally announced March 2020.

Comments: Proceedings of KR2ML @ NeurIPS 2019, Vancouver, Canada

Journal ref: Proceedings of KR2ML @ NeurIPS 2019, Vancouver, Canada

arXiv:1911.04489 [pdf, other]

Making Good on LSTMs' Unfulfilled Promise

Authors: Daniel Philps, Artur d'Avila Garcez, Tillman Weyde

Abstract: LSTMs promise much to financial time-series analysis, temporal and cross-sectional inference, but we find that they do not deliver in a real-world financial management task. We examine an alternative called Continual Learning (CL), a memory-augmented approach, which can provide transparent explanations, i.e. which memory did what and when. This work has implications for many financial applications including credit, time-varying fairness in decision making and more. We make three important new observations. Firstly, as well as being more explainable, time-series CL approaches outperform LSTMs as well as a simple sliding window learner using feed-forward neural networks (FFNN). Secondly, we show that CL based on a sliding window learner (FFNN) is more effective than CL based on a sequential learner (LSTM). Thirdly, we examine how real-world, time-series noise impacts several similarity approaches used in CL memory addressing. We provide these insights using an approach called Continual Learning Augmentation (CLA) tested on a complex real-world problem, emerging market equities investment decision making. CLA provides a test-bed as it can be based on different types of time-series learners, allowing testing of LSTM and FFNN learners side by side. CLA is also used to test several distance approaches used in a memory recall-gate: Euclidean distance (ED), dynamic time warping (DTW), auto-encoders (AE) and a novel hybrid approach, warp-AE. We find that ED under-performs DTW and AE but warp-AE shows the best overall performance in a real-world financial task.

Submitted 8 December, 2019; v1 submitted 11 November, 2019; originally announced November 2019.

Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. arXiv admin note: text overlap with arXiv:1812.02340

arXiv:1910.10071 [pdf, other]

Improving singing voice separation with the Wave-U-Net using Minimum Hyperspherical Energy

Authors: Joaquin Perez-Lapillo, Oleksandr Galkin, Tillman Weyde

Abstract: In recent years, deep learning has surpassed traditional approaches to the problem of singing voice separation. The Wave-U-Net is a recent deep network architecture that operates directly on the time domain. The standard Wave-U-Net is trained with data augmentation and early stopping to prevent overfitting. Minimum hyperspherical energy (MHE) regularization has recently proven to increase generalization in image classification problems by encouraging a diversified filter configuration. In this work, we apply MHE regularization to the 1D filters of the Wave-U-Net. We evaluated this approach for separating the vocal part from mixed music audio recordings on the MUSDB18 dataset. We found that adding MHE regularization to the loss function consistently improves singing voice separation, as measured in the Signal to Distortion Ratio on test recordings, leading to the current best time-domain system for singing voice extraction.

Submitted 22 October, 2019; originally announced October 2019.

Comments: Paper submitted to ICASSP 2020 conference

arXiv:1906.08362 [pdf, other]

Trepan Reloaded: A Knowledge-driven Approach to Explaining Artificial Neural Networks

Authors: Roberto Confalonieri, Tillman Weyde, Tarek R. Besold, Fermín Moscoso del Prado Martín

Abstract: Explainability in Artificial Intelligence has been revived as a topic of active research by the need of conveying safety and trust to users in the 'how' and 'why' of automated decision-making. Whilst a plethora of approaches have been developed for post-hoc explainability, only a few focus on how to use domain knowledge, and how this influences the understandability of global explanations from the users' perspective. In this paper, we show how ontologies help the understandability of global post-hoc explanations, presented in the form of symbolic models. In particular, we build on Trepan, an algorithm that explains artificial neural networks by means of decision trees, and we extend it to include ontologies modeling domain knowledge in the process of generating explanations. We present the results of a user study that measures the understandability of decision trees using a syntactic complexity measure, and through time and accuracy of responses as well as reported user confidence and understandability. The user study considers domains where explanations are critical, namely, in finance and medicine. The results show that decision trees generated with our algorithm, taking into account domain knowledge, are more understandable than those generated by standard Trepan without the use of ontologies.

Submitted 21 November, 2019; v1 submitted 19 June, 2019; originally announced June 2019.

arXiv:1906.05449 [pdf, other]

Factors for the Generalisation of Identity Relations by Neural Networks

Authors: Radha Kopparti, Tillman Weyde

Abstract: Many researchers implicitly assume that neural networks learn relations and generalise them to new unseen data. It has been shown recently, however, that the generalisation of feed-forward networks fails for identity relations. The proposed solution for this problem is to create an inductive bias with Differential Rectifier (DR) units. In this work we explore whether various factors in the neural network architecture and learning process make a difference to the generalisation on equality detection of neural networks without and with DR units in early and mid fusion architectures. In experiments with synthetic data, we find effects of the number of hidden layers, the activation function and the data representation. The training set size in relation to the total possible set of vectors also makes a difference. However, the accuracy never exceeds 61% without DR units at 50% chance level. DR units improve generalisation in all tasks and lead to almost perfect test accuracy in the Mid Fusion setting. Thus, DR units seem to be a promising approach for creating generalisation abilities that standard networks lack.

Submitted 12 June, 2019; originally announced June 2019.

Comments: ICML 2019 Workshop on Understanding and Improving Generalization in Deep Learning, Long Beach, California, 2019

Journal ref: ICML 2019 Workshop on Understanding and Improving Generalization in Deep Learning, Long Beach, California, 201

arXiv:1812.02616 [pdf, other]

Modelling Identity Rules with Neural Networks

Authors: Tillman Weyde, Radha Manisha Kopparti

Abstract: In this paper, we show that standard feed-forward and recurrent neural networks fail to learn abstract patterns based on identity rules. We propose Relation Based Pattern (RBP) extensions to neural network structures that solve this problem and answer, as well as raise, questions about integrating structures for inductive bias into neural networks. Examples of abstract patterns are the sequence patterns ABA and ABB where A or B can be any object. These were introduced by Marcus et al. (1999), who also found that 7-month-old infants recognise these patterns in sequences that use an unfamiliar vocabulary while simple recurrent neural networks do not. This result has been contested in the literature but it is confirmed by our experiments. We also show that the inability to generalise extends to different, previously untested, settings. We propose a new approach to modify standard neural network architectures, called Relation Based Patterns (RBP) with different variants for classification and prediction. Our experiments show that neural networks with the appropriate RBP structure achieve perfect classification and prediction performance on synthetic data, including mixed concrete and abstract patterns. RBP also improves neural network performance in experiments with real-world sequence prediction tasks. We discuss these findings in terms of challenges for neural network models and identify consequences from this result in terms of developing inductive biases for neural network learning.

Submitted 20 May, 2019; v1 submitted 6 December, 2018; originally announced December 2018.

Comments: To be Published in Journal of Applied Logic

MSC Class: 62M45 ACM Class: I.2.6

Journal ref: Journal of applied logics (Online) ISSN 2631-9829 Volume 6, Number 4, June 2019

arXiv:1812.02340 [pdf, other]

Continual Learning Augmented Investment Decisions

Authors: Daniel Philps, Tillman Weyde, Artur d'Avila Garcez, Roy Batchelor

Abstract: Investment decisions can benefit from incorporating an accumulated knowledge of the past to drive future decision making. We introduce Continual Learning Augmentation (CLA) which is based on an explicit memory structure and a feed forward neural network (FFNN) base model and used to drive long term financial investment decisions. We demonstrate that our approach improves accuracy in investment decision making while memory is addressed in an explainable way. Our approach introduces novel remember cues, consisting of empirically learned change points in the absolute error series of the FFNN. Memory recall is also novel, with contextual similarity assessed over time by sampling distances using dynamic time warping (DTW). We demonstrate the benefits of our approach by using it in an expected return forecasting task to drive investment decisions. In an investment simulation in a broad international equity universe between 2003 and 2017, our approach significantly outperforms FFNN base models. We also illustrate how CLA's memory addressing works in practice, using a worked example to demonstrate the explainability of our approach.

Submitted 25 January, 2019; v1 submitted 5 December, 2018; originally announced December 2018.

Comments: NeurIPS 2018 Workshop on Challenges and Opportunities for AI in Financial Services: the Impact of Fairness, Explainability, Accuracy, and Privacy, Montreal, Canada. This is a non-archival publication - the authors may submit revisions and extensions of this paper to other publication venues

arXiv:1812.01662 [pdf, other]

Feed-Forward Neural Networks Need Inductive Bias to Learn Equality Relations

Authors: Tillman Weyde, Radha Manisha Kopparti

Abstract: Basic binary relations such as equality and inequality are fundamental to relational data structures. Neural networks should learn such relations and generalise to new unseen data. We show in this study, however, that this generalisation fails with standard feed-forward networks on binary vectors. Even when trained with maximal training data, standard networks do not reliably detect equality. We introduce differential rectifier (DR) units that we add to the network in different configurations. The DR units create an inductive bias in the networks, so that they do learn to generalise, even from small numbers of examples, and we have not found any negative effect of their inclusion in the network. Given the fundamental nature of these relations, we hypothesize that feed-forward neural network learning benefits from inductive bias in other relations as well. Consequently, the further development of suitable inductive biases will be beneficial to many tasks in relational learning with neural networks.

Submitted 4 December, 2018; originally announced December 2018.

Comments: Relational Representation Learning Workshop, NeurIPS 2018

Journal ref: Relational Representation Learning (R2L) Workshop, NeurIPS 2018

arXiv:1811.11307 [pdf, other]

Improved Speech Enhancement with the Wave-U-Net

Authors: Craig Macartney, Tillman Weyde

Abstract: We study the use of the Wave-U-Net architecture for speech enhancement, a model introduced by Stoller et al. for the separation of music vocals and accompaniment. This end-to-end learning method for audio source separation operates directly in the time domain, permitting the integrated modelling of phase information and being able to take large temporal contexts into account. Our experiments show that the proposed method improves several metrics, namely PESQ, CSIG, CBAK, COVL and SSNR, over the state-of-the-art with respect to the speech enhancement task on the Voice Bank corpus (VCTK) dataset. We find that a reduced number of hidden layers is sufficient for speech enhancement in comparison to the original system designed for singing voice separation in music. We see this initial result as an encouraging signal to further explore speech enhancement in the time-domain, both as an end in itself and as a pre-processing step to speech recognition systems.

Submitted 27 November, 2018; originally announced November 2018.

Comments: 5 pages (including 1 for References), 1 figure, 2 tables

arXiv:1811.07738 [pdf, other]

M2U-Net: Effective and Efficient Retinal Vessel Segmentation for Resource-Constrained Environments

Authors: Tim Laibacher, Tillman Weyde, Sepehr Jalali

Abstract: In this paper, we present a novel neural network architecture for retinal vessel segmentation that improves over the state of the art on two benchmark datasets, is the first to run in real time on high resolution images, and its small memory and processing requirements make it deployable in mobile and embedded systems. The M2U-Net has a new encoder-decoder architecture that is inspired by the U-Net. It adds pretrained components of MobileNetV2 in the encoder part and novel contractive bottleneck blocks in the decoder part that, combined with bilinear upsampling, drastically reduce the parameter count to 0.55M compared to 31.03M in the original U-Net. We have evaluated its performance against a wide body of previously published results on three public datasets. On two of them, the M2U-Net achieves new state-of-the-art performance by a considerable margin. When implemented on a GPU, our method is the first to achieve real-time inference speeds on high-resolution fundus images. We also implemented our proposed network on an ARM-based embedded system where it segments images in between 0.6 and 15 sec, depending on the resolution. Thus, the M2U-Net enables a number of applications of retinal vessel structure extraction, such as early diagnosis of eye diseases, retinal biometric authentication systems, and robot assisted microsurgery.

Submitted 23 April, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

arXiv:1710.02245 [pdf, other]

Linear-Time Sequence Classification using Restricted Boltzmann Machines

Authors: Son N. Tran, Srikanth Cherla, Artur Garcez, Tillman Weyde

Abstract: Classification of sequence data is the topic of interest for dynamic Bayesian models and Recurrent Neural Networks (RNNs). While the former can explicitly model the temporal dependencies between class variables, the latter have a capability of learning representations. Several attempts have been made to improve performance by combining these two approaches or increasing the processing capability of the hidden units in RNNs. This often results in complex models with a large number of learning parameters. In this paper, a compact model is proposed which offers both representation learning and temporal inference of class variables by rolling Restricted Boltzmann Machines (RBMs) and class variables over time. We address the key issue of intractability in this variant of RBMs by optimising a conditional distribution, instead of a joint distribution. Experiments reported in the paper on melody modelling and optical character recognition show that the proposed model can outperform the state-of-the-art. Also, the experimental results on optical character recognition, part-of-speech tagging and text chunking demonstrate that our model is comparable to recurrent neural networks with complex memory gates while requiring far fewer parameters.

Submitted 8 March, 2018; v1 submitted 5 October, 2017; originally announced October 2017.

arXiv:1604.01806 [pdf, ps, other]

Generalising the Discriminative Restricted Boltzmann Machine

Authors: Srikanth Cherla, Son N Tran, Tillman Weyde, Artur d'Avila Garcez

Abstract: We present a novel theoretical result that generalises the Discriminative Restricted Boltzmann Machine (DRBM). While originally the DRBM was defined assuming the {0, 1}-Bernoulli distribution in each of its hidden units, this result makes it possible to derive cost functions for variants of the DRBM that utilise other distributions, including some that are often encountered in the literature. This is illustrated with the Binomial and {-1, +1}-Bernoulli distributions here. We evaluate these two DRBM variants and compare them with the original one on three benchmark datasets, namely the MNIST and USPS digit classification datasets, and the 20 Newsgroups document classification dataset. Results show that each of the three compared models outperforms the remaining two in one of the three datasets, thus indicating that the proposed theoretical generalisation of the DRBM may be valuable in practice.

Submitted 6 April, 2016; originally announced April 2016.

Comments: Submitted to ECML 2016 conference track

arXiv:1411.1623 [pdf, ps, other]

A Hybrid Recurrent Neural Network For Music Transcription

Authors: Siddharth Sigtia, Emmanouil Benetos, Nicolas Boulanger-Lewandowski, Tillman Weyde, Artur S. d'Avila Garcez, Simon Dixon

Abstract: We investigate the problem of incorporating higher-level symbolic score-like information into Automatic Music Transcription (AMT) systems to improve their performance. We use recurrent neural networks (RNNs) and their variants as music language models (MLMs) and present a generative architecture for combining these models with predictions from a frame level acoustic classifier. We also compare different neural network architectures for acoustic modeling. The proposed model computes a distribution over possible output sequences given the acoustic input signal and we present an algorithm for performing a global search for good candidate transcriptions. The performance of the proposed model is evaluated on piano music from the MAPS dataset and we observe that the proposed model consistently outperforms existing transcription methods.

Submitted 6 November, 2014; originally announced November 2014.

Showing 1–27 of 27 results for author: Weyde, T