Showing 1–44 of 44 results for author: Schwab, D

Searching in archive cs.
  1. arXiv:2403.13369

    cs.CL cs.AI cs.LG

    Clinical information extraction for Low-resource languages with Few-shot learning using Pre-trained language models and Prompting

    Authors: Phillip Richter-Pechanski, Philipp Wiesenbach, Dominic M. Schwab, Christina Kiriakou, Nicolas Geis, Christoph Dieterich, Anette Frank

    Abstract: Automatic extraction of medical information from clinical documents poses several challenges: high costs of required clinical expertise, limited interpretability of model predictions, restricted computational resources and privacy regulations. Recent advances in domain-adaptation and prompting methods showed promising results with minimal training data using lightweight masked language models, whi…

    Submitted 20 March, 2024; originally announced March 2024.

  2. arXiv:2309.14047

    cond-mat.dis-nn cond-mat.stat-mech cs.CR cs.IT

    Random-Energy Secret Sharing via Extreme Synergy

    Authors: Vudtiwat Ngampruetikorn, David J. Schwab

    Abstract: The random-energy model (REM), a solvable spin-glass model, has impacted an incredibly diverse set of problems, from protein folding to combinatorial optimization to many-body localization. Here, we explore a new connection to secret sharing. We formulate a secret-sharing scheme, based on the REM, and analyze its information-theoretic properties. Our analyses reveal that the correlations between s…

    Submitted 25 September, 2023; originally announced September 2023.

    Comments: 6 pages, 5 figures

  3. arXiv:2309.05472

    cs.CL cs.AI cs.SD eess.AS

    LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

    Authors: Titouan Parcollet, Ha Nguyen, Solene Evain, Marcely Zanon Boito, Adrien Pupier, Salima Mdhaffar, Hang Le, Sina Alisamir, Natalia Tomashenko, Marco Dinarelli, Shucong Zhang, Alexandre Allauzen, Maximin Coavoux, Yannick Esteve, Mickael Rouvier, Jerome Goulian, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0 an open-source framework for assessing and building SSL-…

    Submitted 18 March, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Published in Computer Speech and Language. Preprint allowed

  4. arXiv:2307.11170

    cs.CL

    UMLS-KGI-BERT: Data-Centric Knowledge Integration in Transformers for Biomedical Entity Recognition

    Authors: Aidan Mannion, Thierry Chevalier, Didier Schwab, Lorraine Geouriot

    Abstract: Pre-trained transformer language models (LMs) have in recent years become the dominant paradigm in applied NLP. These models have achieved state-of-the-art performance on tasks such as information extraction, question answering, sentiment analysis, document classification and many others. In the biomedical domain, significant progress has been made in adapting this paradigm to NLP tasks that requi…

    Submitted 20 July, 2023; originally announced July 2023.

  5. arXiv:2303.17762

    cs.IT cond-mat.stat-mech cs.LG physics.data-an q-bio.QM

    Generalized Information Bottleneck for Gaussian Variables

    Authors: Vudtiwat Ngampruetikorn, David J. Schwab

    Abstract: The information bottleneck (IB) method offers an attractive framework for understanding representation learning, however its applications are often limited by its computational intractability. Analytical characterization of the IB method is not only of practical interest, but it can also lead to new insights into learning phenomena. Here we consider a generalized IB problem, in which the mutual in…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: 7 pages, 3 figures

  6. arXiv:2301.11716

    cs.CL cs.LG cs.SD eess.AS

    Pre-training for Speech Translation: CTC Meets Optimal Transport

    Authors: Phuong-Hang Le, Hongyu Gong, Changhan Wang, Juan Pino, Benjamin Lecouteux, Didier Schwab

    Abstract: The gap between speech and text modalities is a major challenge in speech-to-text translation (ST). Different methods have been proposed to reduce this gap, but most of them require architectural changes in ST training. In this work, we propose to mitigate this issue at the pre-training stage, requiring no change in the ST model. First, we show that the connectionist temporal classification (CTC)…

    Submitted 5 June, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: ICML 2023 (oral presentation). This version fixed URLs, updated affiliations & acknowledgements, and improved formatting

  7. arXiv:2208.03848

    cs.IT cond-mat.stat-mech cs.LG physics.data-an stat.ML

    Information bottleneck theory of high-dimensional regression: relevancy, efficiency and optimality

    Authors: Vudtiwat Ngampruetikorn, David J. Schwab

    Abstract: Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. This puzzling contradiction necessitates new approaches to the study of overfitting. Here we quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data. Information efficient learning algorithms minimize resi…

    Submitted 11 October, 2022; v1 submitted 7 August, 2022; originally announced August 2022.

    Comments: NeurIPS 2022

    ACM Class: H.1.1; I.2.6

  8. arXiv:2109.12801

    cs.CV

    Effect Of Personalized Calibration On Gaze Estimation Using Deep-Learning

    Authors: Nairit Bandyopadhyay, Sébastien Riou, Didier Schwab

    Abstract: With the increase in computation power and the development of new state-of-the-art deep learning algorithms, appearance-based gaze estimation is becoming more and more popular. It is believed to work well with curated laboratory data sets, however it faces several challenges when deployed in real world scenario. One such challenge is to estimate the gaze of a person about which the Deep Learning m…

    Submitted 27 September, 2021; originally announced September 2021.

  9. arXiv:2106.01463

    cs.CL

    Lightweight Adapter Tuning for Multilingual Speech Translation

    Authors: Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

    Abstract: Adapter modules were recently introduced as an efficient alternative to fine-tuning in NLP. Adapter tuning consists in freezing pretrained parameters of a model and injecting lightweight modules between layers, resulting in the addition of only a small number of task-specific trainable parameters. While adapter tuning was investigated for multilingual neural machine translation, this paper propose…

    Submitted 12 July, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: Accepted at ACL-IJCNLP 2021

  10. arXiv:2105.14940

    cs.CL cs.AI cs.LG

    Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?

    Authors: Zae Myung Kim, Laurent Besacier, Vassilina Nikoulina, Didier Schwab

    Abstract: Recent studies on the analysis of the multilingual representations focus on identifying whether there is an emergence of language-independent representations, or whether a multilingual model partitions its weights among different languages. While most of such work has been conducted in a "black-box" manner, this paper aims to analyze individual components of a multilingual neural translation (NMT)…

    Submitted 31 May, 2021; originally announced May 2021.

    Comments: 10 pages, accepted at Findings of ACL 2021 (short)

  11. arXiv:2105.13977

    cs.LG cond-mat.dis-nn cond-mat.stat-mech cs.IT physics.data-an

    Perturbation Theory for the Information Bottleneck

    Authors: Vudtiwat Ngampruetikorn, David J. Schwab

    Abstract: Extracting relevant information from data is crucial for all forms of learning. The information bottleneck (IB) method formalizes this, offering a mathematically precise and conceptually appealing framework for understanding learning phenomena. However the nonlinearity of the IB problem makes it computationally expensive and analytically intractable in general. Here we derive a perturbation theory…

    Submitted 25 October, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: NeurIPS 2021

  12. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

    Authors: Solene Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, Sina Alisamir, Ziyi Tong, Natalia Tomashenko, Marco Dinarelli, Titouan Parcollet, Alexandre Allauzen, Yannick Esteve, Benjamin Lecouteux, Francois Portet, Solange Rossato, Fabien Ringeval, Didier Schwab, Laurent Besacier

    Abstract: Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. They were notably successful to improve performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient spee…

    Submitted 10 June, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

    Comments: Will be presented at Interspeech 2021

    Journal ref: Proc. Interspeech 2021

  13. arXiv:2103.12719

    cs.CV cs.AI

    Characterizing and Improving the Robustness of Self-Supervised Learning through Background Augmentations

    Authors: Chaitanya K. Ryali, David J. Schwab, Ari S. Morcos

    Abstract: Recent progress in self-supervised learning has demonstrated promising results in multiple visual tasks. An important ingredient in high-performing self-supervised methods is the use of data augmentation by training models to place different augmented views of the same image nearby in embedding space. However, commonly used augmentation pipelines treat images holistically, ignoring the semantic re…

    Submitted 12 November, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: Technical Report; Additional Results

  14. arXiv:2011.00747

    cs.CL cs.SD eess.AS

    Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation

    Authors: Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

    Abstract: We introduce dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST). Our models are based on the original Transformer architecture (Vaswani et al., 2017) but consist of two decoders, each responsible for one task (ASR or ST). Our major contribution lies in how these decoders interact with each other: one…

    Submitted 1 November, 2020; originally announced November 2020.

    Comments: Accepted at COLING 2020 (Oral)

    Journal ref: The 28th International Conference on Computational Linguistics (COLING 2020)

  15. arXiv:2010.06682

    cs.CV cs.LG eess.IV

    Are all negatives created equal in contrastive instance discrimination?

    Authors: Tiffany Tianhui Cai, Jonathan Frankle, David J. Schwab, Ari S. Morcos

    Abstract: Self-supervised learning has recently begun to rival supervised learning on computer vision tasks. Many of the recent approaches have been based on contrastive instance discrimination (CID), in which the network is trained to recognize two augmented versions of the same instance (a query and positive) while discriminating against a pool of other instances (negatives). The learned representation is…

    Submitted 25 October, 2020; v1 submitted 13 October, 2020; originally announced October 2020.

    Comments: Fixed author name error

  16. arXiv:2009.12789

    cs.LG cs.IT stat.ML

    Learning Optimal Representations with the Decodable Information Bottleneck

    Authors: Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam

    Abstract: We address the question of characterizing and finding optimal representations for supervised learning. Traditionally, this question has been tackled using the Information Bottleneck, which compresses the inputs while retaining information about the targets, in a decoder-agnostic fashion. In machine learning, however, our goal is not compression but rather generalization, which is intimately linked…

    Submitted 16 July, 2021; v1 submitted 27 September, 2020; originally announced September 2020.

    Comments: Accepted at NeurIPS 2020

  17. arXiv:2007.14823

    cond-mat.dis-nn cond-mat.stat-mech cs.LG nlin.CD q-bio.NC

    Theory of gating in recurrent neural networks

    Authors: Kamesh Krishnamurthy, Tankut Can, David J. Schwab

    Abstract: Recurrent neural networks (RNNs) are powerful dynamical models, widely used in machine learning (ML) and neuroscience. Prior theoretical work has focused on RNNs with additive interactions. However, gating - i.e. multiplicative - interactions are ubiquitous in real neurons and also the central feature of the best-performing RNNs in ML. Here, we show that gating offers flexible control of two salie…

    Submitted 1 December, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

    Comments: 13 figures

  18. arXiv:2004.11759

    cs.IR

    Learning Term Discrimination

    Authors: Jibril Frej, Phillipe Mulhem, Didier Schwab, Jean-Pierre Chevallet

    Abstract: Document indexing is a key component for efficient information retrieval (IR). After preprocessing steps such as stemming and stop-word removal, document indexes usually store term-frequencies (tf). Along with tf (that only reflects the importance of a term in a document), traditional IR models use term discrimination values (TDVs) such as inverse document frequency (idf) to favor discriminative t…

    Submitted 28 April, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

    Comments: Accepted to ACM SIGIR 2020

  19. arXiv:2003.00152

    cs.LG cs.AI cs.NE stat.ML

    Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs

    Authors: Jonathan Frankle, David J. Schwab, Ari S. Morcos

    Abstract: A wide variety of deep learning techniques from style transfer to multitask learning rely on training affine transformations of features. Most prominent among these is the popular feature normalization technique BatchNorm, which normalizes activations and then subsequently applies a learned affine transform. In this paper, we aim to understand the role and expressive power of affine parameters use…

    Submitted 21 March, 2021; v1 submitted 28 February, 2020; originally announced March 2020.

    Comments: Published in ICLR 2021

  20. arXiv:2002.10365

    cs.LG cs.NE stat.ML

    The Early Phase of Neural Network Training

    Authors: Jonathan Frankle, David J. Schwab, Ari S. Morcos

    Abstract: Recent studies have shown that many important aspects of neural network learning take place within the very earliest iterations or epochs of training. For example, sparse, trainable sub-networks emerge (Frankle et al., 2019), gradient descent moves into a small subspace (Gur-Ari et al., 2018), and the network undergoes a critical period (Achille et al., 2019). Here, we examine the changes that dee…

    Submitted 24 February, 2020; originally announced February 2020.

    Comments: ICLR 2020 Camera Ready. Available on OpenReview at https://openreview.net/forum?id=Hkl1iRNFwS

  21. arXiv:2002.00025

    cs.LG cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Gating creates slow modes and controls phase-space complexity in GRUs and LSTMs

    Authors: Tankut Can, Kamesh Krishnamurthy, David J. Schwab

    Abstract: Recurrent neural networks (RNNs) are powerful dynamical models for data with complex temporal structure. However, training RNNs has traditionally proved challenging due to exploding or vanishing of gradients. RNN models such as LSTMs and GRUs (and their variants) significantly mitigate these issues associated with training by introducing various types of gating units into the architecture. While t…

    Submitted 15 June, 2020; v1 submitted 31 January, 2020; originally announced February 2020.

    Comments: 18+18 pages, 4 figures, to appear in Proceedings of Machine Learning Research Vol. 107, 2020, 1st Annual Conference on Mathematical and Scientific Machine Learning

  22. arXiv:1912.05372

    cs.CL cs.LG

    FlauBERT: Unsupervised Language Model Pre-training for French

    Authors: Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab

    Abstract: Language models have become a key step to achieve state-of-the-art results in many different Natural Language Processing (NLP) tasks. Leveraging the huge amount of unlabeled texts nowadays available, they provide an efficient way to pre-train continuous word representations that can be fine-tuned for a downstream task, along with their contextualization at the sentence level. This has been widely…

    Submitted 12 March, 2020; v1 submitted 11 December, 2019; originally announced December 2019.

    Comments: Accepted to LREC 2020

  23. arXiv:1912.01901

    cs.IR

    WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset

    Authors: Jibril Frej, Didier Schwab, Jean-Pierre Chevallet

    Abstract: Over the past years, deep learning methods allowed for new state-of-the-art results in ad-hoc information retrieval. However such methods usually require large amounts of annotated data to be effective. Since most standard ad-hoc information retrieval datasets publicly available for academic research (e.g. Robust04, ClueWeb09) have at most 250 annotated queries, the recent deep learning models for…

    Submitted 17 March, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: Accepted at LREC 2020

    MSC Class: H.3.3

    ACM Class: H.3.3

  24. arXiv:1911.02898

    cs.CL

    The LIG system for the English-Czech Text Translation Task of IWSLT 2019

    Authors: Loïc Vial, Benjamin Lecouteux, Didier Schwab, Hang Le, Laurent Besacier

    Abstract: In this paper, we present our submission for the English to Czech Text Translation Task of IWSLT 2019. Our system aims to study how pre-trained language models, used as input embeddings, can improve a specialized machine translation system trained on few data. Therefore, we implemented a Transformer-based encoder-decoder neural system which is able to use the output of a pre-trained language model…

    Submitted 7 November, 2019; originally announced November 2019.

    Comments: IWSLT 2019

  25. arXiv:1910.00195

    cs.LG stat.ML

    How noise affects the Hessian spectrum in overparameterized neural networks

    Authors: Mingwei Wei, David J Schwab

    Abstract: Stochastic gradient descent (SGD) forms the core optimization method for deep neural networks. While some theoretical progress has been made, it still remains unclear why SGD leads the learning dynamics in overparameterized networks to solutions that generalize well. Here we show that for overparameterized networks with a degenerate valley in their loss landscape, SGD on average decreases the trac…

    Submitted 29 October, 2019; v1 submitted 1 October, 2019; originally announced October 2019.

  26. arXiv:1905.05677

    cs.CL

    Sense Vocabulary Compression through the Semantic Knowledge of WordNet for Neural Word Sense Disambiguation

    Authors: Loïc Vial, Benjamin Lecouteux, Didier Schwab

    Abstract: In this article, we tackle the issue of the limited quantity of manually sense annotated corpora for the task of word sense disambiguation, by exploiting the semantic relationships between senses such as synonymy, hypernymy and hyponymy, in order to compress the sense vocabulary of Princeton WordNet, and thus reduce the number of different sense tags that must be observed to disambiguate all words…

    Submitted 27 August, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: In proceedings of the 10th Global WordNet Conference - GWC 2019. arXiv admin note: text overlap with arXiv:1811.00960

  27. arXiv:1903.02606

    cs.LG cond-mat.dis-nn stat.ML

    Mean-field Analysis of Batch Normalization

    Authors: Mingwei Wei, James Stokes, David J Schwab

    Abstract: Batch Normalization (BatchNorm) is an extremely useful component of modern neural network architectures, enabling optimization using higher learning rates and achieving faster convergence. In this paper, we use mean-field theory to analytically quantify the impact of BatchNorm on the geometry of the loss landscape for multi-layer networks consisting of fully-connected and convolutional layers. We…

    Submitted 6 March, 2019; originally announced March 2019.

  28. arXiv:1902.04706

    cs.LG cs.RO stat.ML

    Simultaneously Learning Vision and Feature-based Control Policies for Real-world Ball-in-a-Cup

    Authors: Devin Schwab, Tobias Springenberg, Murilo F. Martins, Thomas Lampe, Michael Neunert, Abbas Abdolmaleki, Tim Hertweck, Roland Hafner, Francesco Nori, Martin Riedmiller

    Abstract: We present a method for fast training of vision based control policies on real robots. The key idea behind our method is to perform multi-task Reinforcement Learning with auxiliary tasks that differ not only in the reward to be optimized but also in the state-space in which they operate. In particular, we allow auxiliary task policies to utilize task features that are available only at training-ti…

    Submitted 18 February, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

    Comments: Videos can be found at https://sites.google.com/view/rss-2019-sawyer-bic/

  29. arXiv:1811.00960

    cs.CL

    Improving the Coverage and the Generalization Ability of Neural Word Sense Disambiguation through Hypernymy and Hyponymy Relationships

    Authors: Loïc Vial, Benjamin Lecouteux, Didier Schwab

    Abstract: In Word Sense Disambiguation (WSD), the predominant approach generally involves a supervised system trained on sense annotated corpora. The limited quantity of such corpora however restricts the coverage and the performance of these systems. In this article, we propose a new method that solves these issues by taking advantage of the knowledge present in WordNet, and especially the hypernymy and hy…

    Submitted 2 November, 2018; originally announced November 2018.

  30. arXiv:1808.02093

    cs.AI cs.IT cs.LG cs.MA stat.ML

    Learning to Share and Hide Intentions using Information Regularization

    Authors: DJ Strouse, Max Kleiman-Weiner, Josh Tenenbaum, Matt Botvinick, David Schwab

    Abstract: Learning to cooperate with friends and compete with foes is a key component of multi-agent reinforcement learning. Typically to do so, one requires access to either a model of or interaction with the other agent(s). Here we show how to learn effective strategies for cooperation and competition in an asymmetric information game with no such model or interaction. Our approach is to encourage an agen…

    Submitted 1 January, 2019; v1 submitted 6 August, 2018; originally announced August 2018.

    Comments: Presented at the 32nd Conference on Neural Information Processing Systems (NIPS 2018)

  31. arXiv:1803.08823

    physics.comp-ph cond-mat.stat-mech cs.LG stat.ML

    A high-bias, low-variance introduction to Machine Learning for physicists

    Authors: Pankaj Mehta, Marin Bukov, Ching-Hao Wang, Alexandre G. R. Day, Clint Richardson, Charles K. Fisher, David J. Schwab

    Abstract: Machine Learning (ML) is one of the most exciting and dynamic areas of modern research and application. The purpose of this review is to provide an introduction to the core concepts and tools of machine learning in a manner easily understood and intuitive to physicists. The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, r…

    Submitted 27 May, 2019; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: Notebooks have been updated. 122 pages, 78 figures, 20 Python notebooks

    Journal ref: Physics Reports 810 (2019) 1-124

  32. arXiv:1802.02053

    cs.CL cs.LG

    Système de traduction automatique statistique Anglais-Arabe (English-Arabic statistical machine translation system)

    Authors: Marwa Hadj Salah, Didier Schwab, Hervé Blanchon, Mounir Zrigui

    Abstract: Machine translation (MT) is the process of translating text written in a source language into text in a target language. In this article, we present our English-Arabic statistical machine translation system. First, we present the general process for setting up a statistical machine translation system, then we describe the tools as well as the different corpora we used to build our MT system. Our s…

    Submitted 6 February, 2018; originally announced February 2018.

    Comments: in French

  33. arXiv:1801.03844

    cs.IR

    Enhancing Translation Language Models with Word Embedding for Information Retrieval

    Authors: Jibril Frej, Jean-Pierre Chevallet, Didier Schwab

    Abstract: In this paper, we explore the usage of Word Embedding semantic resources for Information Retrieval (IR) task. This embedding, produced by a shallow neural network, has been shown to catch semantic similarities between words (Mikolov et al., 2013). Hence, our goal is to enhance IR Language Models by addressing the term mismatch problem. To do so, we applied the model presented in the paper Integra…

    Submitted 11 January, 2018; originally announced January 2018.

  34. arXiv:1712.09657

    stat.ML cs.AI cs.IT cs.LG

    The information bottleneck and geometric clustering

    Authors: DJ Strouse, David J Schwab

    Abstract: The information bottleneck (IB) approach to clustering takes a joint distribution $P\!\left(X,Y\right)$ and maps the data $X$ to cluster labels $T$ which retain maximal information about $Y$ (Tishby et al., 1999). This objective results in an algorithm that clusters data points based upon the similarity of their conditional distributions $P\!\left(Y\mid X\right)$. This is in contrast to classic "g…

    Submitted 31 May, 2020; v1 submitted 27 December, 2017; originally announced December 2017.

    Comments: Updated to final published version with more detailed relationship to GMMs/k-means

    Journal ref: Neural Computation 31 (2019) 596-612

  35. arXiv:1705.08828

    cs.CL

    Deep Investigation of Cross-Language Plagiarism Detection Methods

    Authors: Jeremy Ferrero, Laurent Besacier, Didier Schwab, Frederic Agnes

    Abstract: This paper is a deep investigation of cross-language plagiarism detection methods on a new recently introduced open dataset, which contains parallel and comparable collections of documents with multiple characteristics (different genres, languages and sizes of texts). We investigate cross-language plagiarism detection methods for 6 language pairs on 2 granularities of text units in order to draw r…

    Submitted 24 May, 2017; originally announced May 2017.

    Comments: Accepted to BUCC (10th Workshop on Building and Using Comparable Corpora) colocated with ACL 2017

  36. arXiv:1704.02293

    cs.CL

    Comparison of Global Algorithms in Word Sense Disambiguation

    Authors: Loïc Vial, Andon Tchechmedjiev, Didier Schwab

    Abstract: This article compares four probabilistic algorithms (global algorithms) for Word Sense Disambiguation (WSD) in terms of the number of scorer calls (local algorithm) and the F1 score as determined by a gold-standard scorer. Two algorithms come from the state of the art, a Simulated Annealing Algorithm (SAA) and a Genetic Algorithm (GA) as well as two algorithms that we first adapt from WSD that a…

    Submitted 7 April, 2017; originally announced April 2017.

  37. arXiv:1704.01346

    cs.CL

    CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity

    Authors: Jeremy Ferrero, Frederic Agnes, Laurent Besacier, Didier Schwab

    Abstract: We present our submitted systems for Semantic Textual Similarity (STS) Track 4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must estimate their semantic similarity by a score between 0 and 5. In our submission, we use syntax-based, dictionary-based, context-based, and MT-based methods. We also combine these methods in unsupervised and supervised way. Our best run ranked…

    Submitted 5 April, 2017; originally announced April 2017.

  38. arXiv:1702.03082

    cs.CL

    Using Word Embedding for Cross-Language Plagiarism Detection

    Authors: J. Ferrero, F. Agnes, L. Besacier, D. Schwab

    Abstract: This paper proposes to use distributed representation of words (word embeddings) in cross-language textual similarity detection. The main contributions of this paper are the following: (a) we introduce new cross-language similarity detection methods based on distributed representation of words; (b) we combine the different methods proposed to verify their complementarity and finally obtain an over…

    Submitted 10 February, 2017; originally announced February 2017.

    Comments: Accepted to EACL 2017 (short)

  39. arXiv:1609.03541

    cond-mat.dis-nn cs.LG stat.ML

    Comment on "Why does deep and cheap learning work so well?" [arXiv:1608.08225]

    Authors: David J. Schwab, Pankaj Mehta

    Abstract: In a recent paper, "Why does deep and cheap learning work so well?", Lin and Tegmark claim to show that the mapping between deep belief networks and the variational renormalization group derived in [ar… does not hold. In this comment, we show that these claims are incorrect and stem from a misunderstandin…

    Submitted 12 September, 2016; originally announced September 2016.

    Comments: Comment on arXiv:1608.08225

  40. arXiv:1605.05775

    stat.ML cond-mat.str-el cs.LG

    Supervised Learning with Quantum-Inspired Tensor Networks

    Authors: E. Miles Stoudenmire, David J. Schwab

    Abstract: Tensor networks are efficient representations of high-dimensional tensors which have been very successful for physics and mathematics applications. We demonstrate how algorithms for optimizing such networks can be adapted to supervised learning tasks by using matrix product states (tensor trains) to parameterize models for classifying images. For the MNIST data set we obtain less than 1% test set…

    Submitted 18 May, 2017; v1 submitted 18 May, 2016; originally announced May 2016.

    Comments: 11 pages, 15 figures; updated version includes corrections, links to sample codes, expanded discussion, and additional references

    Journal ref: Advances in Neural Information Processing Systems 29, 4799 (2016)

  41. arXiv:1604.00268

    q-bio.NC cond-mat.stat-mech cs.IT q-bio.QM stat.ML

    The deterministic information bottleneck

    Authors: DJ Strouse, David J Schwab

    Abstract: Lossy compression and clustering fundamentally involve a decision about what features are relevant and which are not. The information bottleneck method (IB) by Tishby, Pereira, and Bialek formalized this notion as an information-theoretic optimization problem and proposed an optimal tradeoff between throwing away as many bits as possible, and selectively keeping those that are most important. In t…

    Submitted 19 December, 2016; v1 submitted 1 April, 2016; originally announced April 2016.

    Comments: 15 pages, 4 figures

  42. arXiv:1410.3831

    stat.ML cond-mat.stat-mech cs.LG cs.NE

    An exact mapping between the Variational Renormalization Group and Deep Learning

    Authors: Pankaj Mehta, David J. Schwab

    Abstract: Deep learning is a broad set of techniques that uses multiple layers of representation to automatically learn relevant features directly from structured data. Recently, such techniques have yielded record-breaking results on a diverse set of difficult machine learning tasks in computer vision, speech recognition, and natural language processing. Despite the enormous success of deep learning, relat…

    Submitted 14 October, 2014; originally announced October 2014.

    Comments: 8 pages, 3 figures

  43. arXiv:1201.4279

    cs.HC

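    Évaluation et consolidation d'un réseau lexical via un outil pour retrouver le mot sur le bout de la langue (Evaluation and consolidation of a lexical network using a tool for finding tip-of-the-tongue words)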

    Authors: Alain Joubert, Mathieu Lafourcade, Didier Schwab, Michael Zock

    Abstract: Since September 2007, a large scale lexical network for French is under construction through methods based on some kind of popular consensus by means of games (JeuxDeMots project). Human intervention can be considered as marginal. It is limited to corrections, adjustments and validation of the senses of terms, which amounts to less than 0.5% of the relations in the network. To appreciate the qual…

    Submitted 20 January, 2012; originally announced January 2012.

    Journal ref: TALN, Montpellier, France (2011)

  44. arXiv:0906.0859

    cs.DM

    A Partial Order on Bipartite Graphs with n Vertices

    Authors: Emil Daniel Schwab

    Abstract: The paper examines a partial order on bipartite graphs $(X_1, X_2, E)$ with $n$ vertices, $X_1 \cup X_2 = \{1, 2, \ldots, n\}$. This partial order is a natural partial order of subobjects of an object in a triangular category with bipartite graphs as morphisms.

    Submitted 4 June, 2009; originally announced June 2009.

    Comments: 10 pages, presented at the 5th International Conference "Actualities and Perspectives on Hardware and Software" - APHS2009, Timisoara, Romania

    Journal ref: Ann. Univ. Tibiscus Comp. Sci. Series VII(2009),315-324