Showing 1–33 of 33 results for author: Niculae, V

Searching in archive cs.
  1. arXiv:2402.13725  [pdf, other]

    cs.LG

    Sparse and Structured Hopfield Networks

    Authors: Saul Santos, Vlad Niculae, Daniel McNamee, Andre F. T. Martins

    Abstract: Modern Hopfield networks have enjoyed recent interest due to their connection to attention in transformers. Our paper provides a unified framework for sparse Hopfield networks by establishing a link with Fenchel-Young losses. The result is a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations. We reveal a connection between loss mar…

    Submitted 4 June, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: 20 pages, 4 figures

  2. arXiv:2402.01404  [pdf, other]

    cs.CL

    On Measuring Context Utilization in Document-Level MT Systems

    Authors: Wafaa Mohammed, Vlad Niculae

    Abstract: Document-level translation models are usually evaluated using general metrics such as BLEU, which are not informative about the benefits of context. Current work on context-aware evaluation, such as contrastive methods, only measures translation accuracy on words that need context for disambiguation. Such measures cannot reveal whether the translation model uses the correct supporting context. We p…

    Submitted 2 February, 2024; originally announced February 2024.

  3. arXiv:2310.20620  [pdf, other]

    cs.CL cs.LG

    The Unreasonable Effectiveness of Random Target Embeddings for Continuous-Output Neural Machine Translation

    Authors: Evgeniia Tokarchuk, Vlad Niculae

    Abstract: Continuous-output neural machine translation (CoNMT) replaces the discrete next-word prediction problem with an embedding prediction. The semantic structure of the target embedding space (i.e., closeness of related words) is intuitively believed to be crucial. We challenge this assumption and show that completely random output embeddings can outperform laboriously pretrained ones, especially on la…

    Submitted 2 April, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

  4. arXiv:2307.12835  [pdf, other]

    cs.CL

    Joint Dropout: Improving Generalizability in Low-Resource Neural Machine Translation through Phrase Pair Variables

    Authors: Ali Araabi, Vlad Niculae, Christof Monz

    Abstract: Despite the tremendous success of Neural Machine Translation (NMT), its performance on low-resource language pairs still remains subpar, partly due to the limited ability to handle previously unseen inputs, i.e., generalization. In this paper, we propose a method called Joint Dropout, that addresses the challenge of low-resource neural machine translation by substituting phrases with variables, re…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: Accepted at MT Summit 2023

    MSC Class: 68T50 ACM Class: I.2.7

  5. arXiv:2306.13503  [pdf, other]

    stat.ML cs.LG

    Two derivations of Principal Component Analysis on datasets of distributions

    Authors: Vlad Niculae

    Abstract: In this brief note, we formulate Principal Component Analysis (PCA) over datasets consisting not of points but of distributions, characterized by their location and covariance. Just like the usual PCA on points can be equivalently derived via a variance-maximization principle and via a minimization of reconstruction error, we derive a closed-form solution for distributional PCA from both of these…

    Submitted 23 June, 2023; originally announced June 2023.

    Comments: 4 pages, 1 figure

  6. arXiv:2305.11550  [pdf, other]

    cs.CL

    Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens

    Authors: David Stap, Vlad Niculae, Christof Monz

    Abstract: We argue that translation quality alone is not a sufficient metric for measuring knowledge transfer in multilingual neural machine translation. To support this claim, we introduce Representational Transfer Potential (RTP), which measures representational similarities between languages. We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly co…

    Submitted 4 December, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023 Findings

  7. arXiv:2301.11898  [pdf, other]

    cs.LG cs.AI stat.ML

    DAG Learning on the Permutahedron

    Authors: Valentina Zantedeschi, Luca Franceschi, Jean Kaddour, Matt J. Kusner, Vlad Niculae

    Abstract: We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our approach optimizes over the polytope of permutation vectors, the so-called Permutahedron, to learn a topological ordering. Edges can be optimized jointly, or learned conditional on the ordering via a non-differentiable subroutine. Compared to existing continuous optimiz…

    Submitted 10 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

    Comments: The Eleventh International Conference on Learning Representations

  8. arXiv:2301.07473  [pdf, other]

    cs.LG stat.ML

    Discrete Latent Structure in Neural Networks

    Authors: Vlad Niculae, Caio F. Corro, Nikita Nangia, Tsvetomila Mihaylova, André F. T. Martins

    Abstract: Many types of data from fields including natural language processing, computer vision, and bioinformatics, are well represented by discrete, compositional structures such as trees, sequences, or matchings. Latent structure models are a powerful tool for learning to extract such representations, offering a way to incorporate structural bias, discover insight about the data, and interpret decisions.…

    Submitted 18 January, 2023; originally announced January 2023.

    ACM Class: I.2.6

  9. arXiv:2208.05225  [pdf, other]

    cs.CL

    How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation?

    Authors: Ali Araabi, Christof Monz, Vlad Niculae

    Abstract: Neural Machine Translation (NMT) is an open vocabulary problem. As a result, dealing with the words not occurring during training (a.k.a. out-of-vocabulary (OOV) words) has long been a fundamental challenge for NMT systems. The predominant method to tackle this problem is Byte Pair Encoding (BPE) which splits words, including OOV words, into sub-word segments. BPE has achieved impressive results…

    Submitted 17 August, 2022; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: 14 pages, 6 figures, 1 table, To be published in AMTA 2022 conference

    MSC Class: 68T50 ACM Class: I.2.7

  10. arXiv:2202.03760  [pdf, other]

    cs.LG cs.CL

    Modeling Structure with Undirected Neural Networks

    Authors: Tsvetomila Mihaylova, Vlad Niculae, André F. T. Martins

    Abstract: Neural networks are powerful function estimators, leading to their status as a paradigm of choice for modeling structured data. However, unlike other structured representations that emphasize the modularity of the problem -- e.g., factor graphs -- neural networks are usually monolithic mappings from inputs to outputs, with a fixed computation order. This limitation prevents them from capturing dif…

    Submitted 17 June, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

    Comments: ICML 2022

  11. arXiv:2108.02658  [pdf, other]

    cs.LG

    Sparse Communication via Mixed Distributions

    Authors: António Farinhas, Wilker Aziz, Vlad Niculae, André F. T. Martins

    Abstract: Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols. Reconciling these two forms of communication is desirable for generating human-readable interpretations or learning discrete latent variable models, while maintaining end-to-end differentiability. Some existing approaches (such as the Gumbel-Softmax transf…

    Submitted 11 February, 2022; v1 submitted 5 August, 2021; originally announced August 2021.

    Comments: Accepted for oral presentation at ICLR 2022

  12. arXiv:2108.01988  [pdf, other]

    cs.LG cs.AI stat.ML

    Sparse Continuous Distributions and Fenchel-Young Losses

    Authors: André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae

    Abstract: Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $α$-entmax, and fused…

    Submitted 4 August, 2022; v1 submitted 4 August, 2021; originally announced August 2021.

    Comments: JMLR 2022 camera ready version. arXiv admin note: text overlap with arXiv:2006.07214

  13. arXiv:2010.04627  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Binary Decision Trees by Argmin Differentiation

    Authors: Valentina Zantedeschi, Matt J. Kusner, Vlad Niculae

    Abstract: We address the problem of learning binary decision trees that partition data for some downstream task. We propose to learn discrete parameters (i.e., for tree traversals and node pruning) and continuous parameters (i.e., for tree split functions and prediction functions) simultaneously using argmin differentiation. We do so by sparsely relaxing a mixed-integer program for the discrete parameters,…

    Submitted 14 June, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

  14. arXiv:2010.02357  [pdf, other]

    cs.CL cs.LG

    Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning

    Authors: Tsvetomila Mihaylova, Vlad Niculae, André F. T. Martins

    Abstract: Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy t…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: EMNLP 2020

  15. arXiv:2007.01919  [pdf, other]

    cs.LG stat.ML

    Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity

    Authors: Gonçalo M. Correia, Vlad Niculae, Wilker Aziz, André F. T. Martins

    Abstract: Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically resorts to sampling-based approximations of the true marginal, requiring noisy gradient estimators (e.g., score function estimator) or continuous relaxations with l…

    Submitted 28 December, 2020; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: Accepted for spotlight presentation at NeurIPS 2020

  16. arXiv:2006.07214  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Sparse and Continuous Attention Mechanisms

    Authors: André F. T. Martins, António Farinhas, Marcos Treviso, Vlad Niculae, Pedro M. Q. Aguiar, Mário A. T. Figueiredo

    Abstract: Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, there has been recent work on sparse alternatives to softmax (e.g. sparsemax and a…

    Submitted 29 October, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

    Comments: Accepted for spotlight presentation at NeurIPS 2020

  17. arXiv:2002.05556  [pdf, other]

    cs.CL cs.CV

    Sparse and Structured Visual Attention

    Authors: Pedro Henrique Martins, Vlad Niculae, Zita Marinho, André Martins

    Abstract: Visual attention mechanisms are widely used in multimodal tasks, such as visual question answering (VQA). One drawback of softmax-based attention mechanisms is that they assign some probability mass to all image regions, regardless of their adjacency structure and of their relevance to the text. In this paper, to better link the image structure with the text, we replace the traditional softmax attentio…

    Submitted 8 July, 2021; v1 submitted 13 February, 2020; originally announced February 2020.

  18. arXiv:2001.04437  [pdf, other]

    cs.LG cs.CL stat.ML

    LP-SparseMAP: Differentiable Relaxed Optimization for Sparse Structured Prediction

    Authors: Vlad Niculae, André F. T. Martins

    Abstract: Structured prediction requires manipulating a large number of combinatorial structures, e.g., dependency trees or alignments, either as latent or output variables. Recently, the SparseMAP method has been proposed as a differentiable, sparse alternative to maximum a posteriori (MAP) and marginal inference. SparseMAP returns a combination of a small number of structures, a desirable property in some…

    Submitted 5 August, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: 34 pages, 5 tables, 4 figures. ICML 2020

  19. arXiv:1909.00015  [pdf, other]

    cs.CL stat.ML

    Adaptively Sparse Transformers

    Authors: Gonçalo M. Correia, Vlad Niculae, André F. T. Martins

    Abstract: Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduc…

    Submitted 6 September, 2019; v1 submitted 30 August, 2019; originally announced September 2019.

    Comments: Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019, Hong Kong, China

  20. arXiv:1907.10348  [pdf, ps, other]

    cs.LG stat.ML

    Notes on Latent Structure Models and SPIGOT

    Authors: André F. T. Martins, Vlad Niculae

    Abstract: These notes aim to shed light on the recently proposed structured projected intermediate gradient optimization technique (SPIGOT, Peng et al., 2018). SPIGOT is a variant of the straight-through estimator (Bengio et al., 2013) which bypasses gradients of the argmax function by back-propagating a surrogate "gradient." We provide a new interpretation to the proposed gradient and put this technique in…

    Submitted 24 July, 2019; originally announced July 2019.

    Comments: 7 pages

  21. arXiv:1905.05702  [pdf, other]

    cs.CL cs.LG

    Sparse Sequence-to-Sequence Models

    Authors: Ben Peters, Vlad Niculae, André F. T. Martins

    Abstract: Sequence-to-sequence models are a powerful workhorse of NLP. Most variants employ a softmax transformation in both their attention mechanism and output layer, leading to dense alignments and strictly positive output probabilities. This density is wasteful, making models less interpretable and assigning probability mass to many implausible outputs. In this paper, we propose sparse sequence-to-seque…

    Submitted 12 June, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

    Comments: ACL 2019 Camera Ready

  22. arXiv:1901.02324  [pdf, other]

    stat.ML cs.LG

    Learning with Fenchel-Young Losses

    Authors: Mathieu Blondel, André F. T. Martins, Vlad Niculae

    Abstract: Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction. Understanding the core principles and theoretical properties underpinning these losses is key to choosing the right loss for the right problem, as well as to creating new losses which combine their st…

    Submitted 2 March, 2020; v1 submitted 8 January, 2019; originally announced January 2019.

    Comments: In Journal of Machine Learning Research, volume 21

  23. arXiv:1809.00653  [pdf, other]

    cs.CL cs.LG stat.ML

    Towards Dynamic Computation Graphs via Sparse Latent Structure

    Authors: Vlad Niculae, André F. T. Martins, Claire Cardie

    Abstract: Deep NLP models benefit from underlying structures in the data---e.g., parse trees---typically extracted using off-the-shelf parsers. Recent attempts to jointly learn the latent structure encounter a tradeoff: either make factorization assumptions that limit expressiveness, or sacrifice end-to-end differentiability. Using the recently proposed SparseMAP inference, which retrieves a sparse distribu…

    Submitted 3 September, 2018; originally announced September 2018.

    Comments: EMNLP 2018; 9 pages (incl. appendix)

    MSC Class: 68T50 ACM Class: I.2.6; I.2.7

  24. arXiv:1805.09717  [pdf, other]

    stat.ML cs.LG

    Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms

    Authors: Mathieu Blondel, André F. T. Martins, Vlad Niculae

    Abstract: This paper studies Fenchel-Young losses, a generic way to construct convex loss functions from a regularization function. We analyze their properties in depth, showing that they unify many well-known loss functions and allow one to create useful new ones easily. Fenchel-Young losses constructed from a generalized entropy, including the Shannon and Tsallis entropies, induce predictive probability distr…

    Submitted 22 February, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

    Comments: In proceedings of AISTATS 2019

  25. arXiv:1802.04223  [pdf, other]

    stat.ML cs.CL cs.LG

    SparseMAP: Differentiable Sparse Structured Inference

    Authors: Vlad Niculae, André F. T. Martins, Mathieu Blondel, Claire Cardie

    Abstract: Structured prediction requires searching over a combinatorial number of structures. To tackle it, we introduce SparseMAP: a new method for sparse structured inference, and its natural loss function. SparseMAP automatically selects only a few global structures: it is situated between MAP inference, which picks a single structure, and marginal inference, which assigns probability mass to all structu…

    Submitted 20 June, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

    Comments: Published in ICML 2018. 14 pages, including appendix

    MSC Class: 68T50 ACM Class: I.2.6; I.2.6

  26. arXiv:1705.07704  [pdf, other]

    stat.ML cs.CL cs.LG

    A Regularized Framework for Sparse and Structured Neural Attention

    Authors: Vlad Niculae, Mathieu Blondel

    Abstract: Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input. We propose in this paper a new framework for sparse and structured attention, building upon a smoothed max operator. We show that the gradient of this operator defines a mapping from real values to probabilities, suitable as an attention mechanism. Our framework includes…

    Submitted 22 February, 2019; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: In proceedings of NeurIPS 2017; added errata

  27. arXiv:1705.07603  [pdf, other]

    stat.ML cs.LG

    Multi-output Polynomial Networks and Factorization Machines

    Authors: Mathieu Blondel, Vlad Niculae, Takuma Otsuka, Naonori Ueda

    Abstract: Factorization machines and polynomial networks are supervised polynomial models based on an efficient low-rank decomposition. We extend these models to the multi-output setting, i.e., for learning vector-valued functions, with application to multi-class or multi-task problems. We cast this as the problem of learning a 3-way tensor whose slices share a common basis and propose a convex formulation…

    Submitted 4 November, 2017; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: Published at NIPS 2017. 17 pages, including appendix

  28. arXiv:1704.06869  [pdf, other]

    cs.CL

    Argument Mining with Structured SVMs and RNNs

    Authors: Vlad Niculae, Joonsuk Park, Claire Cardie

    Abstract: We propose a novel factor graph model for argument mining, designed for settings in which the argumentative relations in a document do not necessarily form a tree structure. (This is the case in over 20% of the web comments dataset we release.) Our model jointly learns elementary unit type classification and argumentative relation prediction. Moreover, our model supports SVM and RNN parametrizatio…

    Submitted 22 April, 2017; originally announced April 2017.

    Comments: Accepted for publication at ACL 2017. 11 pages, 5 figures. Code at https://github.com/vene/marseille and data at http://joonsuk.org/

    MSC Class: 68T50 ACM Class: I.2.7

  29. arXiv:1604.07407  [pdf, other]

    cs.CL cs.AI cs.SI physics.soc-ph stat.ML

    Conversational Markers of Constructive Discussions

    Authors: Vlad Niculae, Cristian Danescu-Niculescu-Mizil

    Abstract: Group discussions are essential for organizing every aspect of modern life, from faculty meetings to senate debates, from grant review panels to papal conclaves. While costly in terms of time and organization effort, group discussions are commonly seen as a way of reaching better decisions compared to solutions that do not require coordination between the individuals (e.g. voting)---through discus…

    Submitted 25 April, 2016; originally announced April 2016.

    Comments: To appear at NAACL-HLT 2016. 11pp, 5 fig. Data and other info available at http://vene.ro/constructive/

  30. arXiv:1602.01103  [pdf, other]

    cs.SI cs.CL physics.soc-ph

    Winning Arguments: Interaction Dynamics and Persuasion Strategies in Good-faith Online Discussions

    Authors: Chenhao Tan, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, Lillian Lee

    Abstract: Changing someone's opinion is arguably one of the most important challenges of social interaction. The underlying process proves difficult to study: it is hard to know how someone's opinions are formed and whether and how someone's views shift. Fortunately, ChangeMyView, an active community on Reddit, provides a platform where users present their own opinions and reasoning, invite others to contes…

    Submitted 6 February, 2016; v1 submitted 2 February, 2016; originally announced February 2016.

    Comments: 12 pages, 10 figures, to appear in Proceedings of WWW 2016, data and more at https://chenhaot.com/pages/changemyview.html (v2 made a minor correction on submission rules in ChangeMyView.)

  31. arXiv:1506.04744  [pdf, other]

    cs.CL cs.AI cs.SI physics.soc-ph stat.ML

    Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game

    Authors: Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, Cristian Danescu-Niculescu-Mizil

    Abstract: Interpersonal relations are fickle, with close friendships often dissolving into enmity. In this work, we explore linguistic cues that presage such transitions by studying dyadic interactions in an online strategy game where players form alliances and break those alliances through betrayal. We characterize friendships that are unlikely to last and examine temporal patterns that foretell betrayal.…

    Submitted 15 June, 2015; originally announced June 2015.

    Comments: To appear at ACL 2015. 10pp, 4 fig. Data and other info available at http://vene.ro/betrayal/

  32. arXiv:1504.01383  [pdf, other]

    cs.CL cs.SI physics.soc-ph

    QUOTUS: The Structure of Political Media Coverage as Revealed by Quoting Patterns

    Authors: Vlad Niculae, Caroline Suen, Justine Zhang, Cristian Danescu-Niculescu-Mizil, Jure Leskovec

    Abstract: Given the extremely large pool of events and stories available, media outlets need to focus on a subset of issues and aspects to convey to their audience. Outlets are often accused of exhibiting a systematic bias in this selection process, with different outlets portraying different versions of reality. However, in the absence of objective measures and empirical evidence, the direction and extent…

    Submitted 6 April, 2015; originally announced April 2015.

    Comments: To appear in the Proceedings of WWW 2015. 11pp, 10 fig. Interactive visualization, data, and other info available at http://snap.stanford.edu/quotus/

  33. arXiv:1309.0238  [pdf, ps, other]

    cs.LG cs.MS

    API design for machine learning software: experiences from the scikit-learn project

    Authors: Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake Vanderplas, Arnaud Joly, Brian Holt, Gaël Varoquaux

    Abstract: Scikit-learn is an increasingly popular machine learning library. Written in Python, it is designed to be simple and efficient, accessible to non-experts, and reusable in various contexts. In this paper, we present and discuss our design choices for the application programming interface (API) of the project. In particular, we describe the simple and elegant interface shared by all learning and p…

    Submitted 1 September, 2013; originally announced September 2013.

    Journal ref: European Conference on Machine Learning and Principles and Practices of Knowledge Discovery in Databases (2013)