Showing 1–20 of 20 results for author: Irsoy, O

  1. arXiv:2305.16958  [pdf, other]

    cs.CL cs.AI cs.LG

    MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies

    Authors: Shiyue Zhang, Shijie Wu, Ozan Irsoy, Steven Lu, Mohit Bansal, Mark Dredze, David Rosenberg

    Abstract: Autoregressive language models are trained by minimizing the cross-entropy of the model distribution Q relative to the data distribution P -- that is, minimizing the forward cross-entropy, which is equivalent to maximum likelihood estimation (MLE). We have observed that models trained in this way may "over-generalize", in the sense that they produce non-human-like text. Moreover, we believe that r…

    Submitted 26 May, 2024; v1 submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023 (22 pages)

  2. arXiv:2303.17564  [pdf, other]

    cs.LG cs.AI cs.CL q-fin.GN

    BloombergGPT: A Large Language Model for Finance

    Authors: Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, Gideon Mann

    Abstract: The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion pa…

    Submitted 21 December, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: Updated to include Training Chronicles (Appendix C)

  3. arXiv:2302.05454  [pdf, other]

    cs.CL cs.IR

    Distillation of encoder-decoder transformers for sequence labelling

    Authors: Marco Farina, Duccio Pappadopulo, Anant Gupta, Leslie Huang, Ozan İrsoy, Thamar Solorio

    Abstract: Driven by encouraging results on a wide range of tasks, the field of NLP is experiencing an accelerated race to develop bigger language models. This race for bigger models has also underscored the need to continue the pursuit of practical distillation approaches that can leverage the knowledge acquired by these big models in a compute-efficient manner. Having this goal in mind, we build on recent…

    Submitted 10 February, 2023; originally announced February 2023.

    Comments: Accepted to Findings of EACL 2023

  4. arXiv:2301.10371  [pdf, other]

    cs.CL cs.AI cs.LG

    Weakly Supervised Headline Dependency Parsing

    Authors: Adrian Benton, Tianze Shi, Ozan İrsoy, Igor Malioutov

    Abstract: English news headlines form a register with unique syntactic properties that have been documented in linguistics literature since the 1930s. However, headlines have received surprisingly little attention from the NLP syntactic parsing community. We aim to bridge this gap by providing the first news headline corpus of Universal Dependencies annotated syntactic dependency trees, which enables us to…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: Findings of EMNLP 2022

    ACM Class: I.2.7

    Journal ref: In Proceedings of Findings of EMNLP 2022

  5. arXiv:2210.00319  [pdf, other]

    cs.LG

    PathFinder: Discovering Decision Pathways in Deep Neural Networks

    Authors: Ozan İrsoy, Ethem Alpaydın

    Abstract: Explainability is becoming an increasingly important topic for deep neural networks. Though the operation in convolutional layers is easier to understand, processing becomes opaque in fully-connected layers. The basic idea in our work is that each instance, as it flows through the layers, causes a different activation pattern in the hidden layers and in our Paths methodology, we cluster these acti…

    Submitted 1 October, 2022; originally announced October 2022.

  6. arXiv:2106.09024  [pdf, other]

    cs.CL cs.LG

    Disentangling Online Chats with DAG-Structured LSTMs

    Authors: Duccio Pappadopulo, Lisa Bauer, Marco Farina, Ozan İrsoy, Mohit Bansal

    Abstract: Many modern messaging systems allow fast and synchronous textual communication among many users. The resulting sequence of messages hides a more complicated structure in which independent sub-conversations are interwoven with one another. This poses a challenge for any task aiming to understand the content of the chat logs or gather information from them. The ability to disentangle these conversat…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 8 pages, 1 figure. Accepted at *SEM 2021

  7. arXiv:2104.13936  [pdf, other]

    cs.CL cs.AI cs.LG

    Diversity-Aware Batch Active Learning for Dependency Parsing

    Authors: Tianze Shi, Adrian Benton, Igor Malioutov, Ozan İrsoy

    Abstract: While the predictive performance of modern statistical dependency parsers relies heavily on the availability of expensive expert-annotated treebank data, not all annotations contribute equally to the training of the parsers. In this paper, we attempt to reduce the number of labeled examples needed to train a strong dependency parser using batch active learning (AL). In particular, we investigate w…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

    ACM Class: I.2.7

    Journal ref: In Proceedings of NAACL 2021

  8. arXiv:2104.13933  [pdf, other]

    cs.CL cs.AI

    Learning Syntax from Naturally-Occurring Bracketings

    Authors: Tianze Shi, Ozan İrsoy, Igor Malioutov, Lillian Lee

    Abstract: Naturally-occurring bracketings, such as answer fragments to natural language questions and hyperlinks on webpages, can reflect human syntactic intuition regarding phrasal boundaries. Their availability and approximate correspondence to syntax make them appealing as distant information sources to incorporate into unsupervised constituency parsing. But they are noisy and incomplete; to address this…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

    ACM Class: I.2.7

    Journal ref: In Proceedings of NAACL 2021

  9. arXiv:2012.15332  [pdf, other]

    cs.CL stat.ML

    Corrected CBOW Performs as well as Skip-gram

    Authors: Ozan İrsoy, Adrian Benton, Karl Stratos

    Abstract: Mikolov et al. (2013a) observed that continuous bag-of-words (CBOW) word embeddings tend to underperform Skip-gram (SG) embeddings, and this finding has been reported in subsequent works. We find that these observations are driven not by fundamental differences in their training objectives, but more likely on faulty negative sampling CBOW implementations in popular libraries such as the official i…

    Submitted 9 November, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: Presented at WINR at EMNLP 2021, added discussion about FastText, more discussion about findings, additional results on C4 data, wording changes

  10. arXiv:2010.11170  [pdf, other]

    cs.CL cs.AI

    Semantic Role Labeling as Syntactic Dependency Parsing

    Authors: Tianze Shi, Igor Malioutov, Ozan İrsoy

    Abstract: We reduce the task of (span-based) PropBank-style semantic role labeling (SRL) to syntactic dependency parsing. Our approach is motivated by our empirical analysis that shows three common syntactic patterns account for over 98% of the SRL annotations for both English and Chinese data. Based on this observation, we present a conversion scheme that packs SRL annotations into dependency tree represen…

    Submitted 21 October, 2020; originally announced October 2020.

    Comments: Appeared in EMNLP 2020

  11. arXiv:1908.01821  [pdf, other]

    cs.CL cs.LG stat.ML

    Dialogue Act Classification in Group Chats with DAG-LSTMs

    Authors: Ozan İrsoy, Rakesh Gosangi, Haimin Zhang, Mu-Hsin Wei, Peter Lund, Duccio Pappadopulo, Brendan Fahy, Neophytos Nephytou, Camilo Ortiz

    Abstract: Dialogue act (DA) classification has been studied for the past two decades and has several key applications such as workflow automation and conversation analytics. Researchers have used, to address this problem, various traditional machine learning models, and more recently deep neural network models such as hierarchical convolutional neural networks (CNNs) and long short-term memory (LSTM) networ…

    Submitted 2 August, 2019; originally announced August 2019.

    Comments: Appeared in SIGIR 2019 Workshop on Conversational Interaction Systems

  12. arXiv:1905.00448  [pdf, other]

    cs.LG stat.ML

    On Expected Accuracy

    Authors: Ozan İrsoy

    Abstract: We empirically investigate the (negative) expected accuracy as an alternative loss function to cross entropy (negative log likelihood) for classification tasks. Coupled with softmax activation, it has small derivatives over most of its domain, and is therefore hard to optimize. A modified, leaky version is evaluated on a variety of classification tasks, including digit recognition, image classific…

    Submitted 1 May, 2019; originally announced May 2019.

  13. arXiv:1812.10158  [pdf, other]

    cs.LG stat.ML

    Dropout Regularization in Hierarchical Mixture of Experts

    Authors: Ozan İrsoy, Ethem Alpaydın

    Abstract: Dropout is a very effective method in preventing overfitting and has become the go-to regularizer for multi-layer neural networks in recent years. Hierarchical mixture of experts is a hierarchically gated model that defines a soft decision tree where leaves correspond to experts and decision nodes correspond to gating models that softly choose between its children, and as such, the model defines a…

    Submitted 25 December, 2018; originally announced December 2018.

  14. arXiv:1804.02491  [pdf, other]

    cs.LG cs.NE stat.ML

    Continuously Constructive Deep Neural Networks

    Authors: Ozan İrsoy, Ethem Alpaydın

    Abstract: Traditionally, deep learning algorithms update the network weights whereas the network architecture is chosen manually, using a process of trial and error. In this work, we propose two novel approaches that automatically update the network structure while also learning its weights. The novelty of our approach lies in our parameterization where the depth, or additional complexity, is encapsulated c…

    Submitted 6 April, 2018; originally announced April 2018.

    Comments: 10 pages

  15. arXiv:1802.10229  [pdf, other]

    cs.CL

    Collective Entity Disambiguation with Structured Gradient Tree Boosting

    Authors: Yi Yang, Ozan Irsoy, Kazi Shefaet Rahman

    Abstract: We present a gradient-tree-boosting-based structured learning model for jointly disambiguating named entities in a document. Gradient tree boosting is a widely used machine learning algorithm that underlies many top-performing natural language processing systems. Surprisingly, most works limit the use of gradient tree boosting as a tool for regular classification or regression problems, despite th…

    Submitted 23 April, 2018; v1 submitted 27 February, 2018; originally announced February 2018.

    Comments: Accepted by NAACL 2018

  16. arXiv:1506.07285  [pdf, other]

    cs.CL cs.LG cs.NE

    Ask Me Anything: Dynamic Memory Networks for Natural Language Processing

    Authors: Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, Richard Socher

    Abstract: Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the…

    Submitted 5 March, 2016; v1 submitted 24 June, 2015; originally announced June 2015.

  17. arXiv:1412.6577  [pdf, other]

    cs.LG cs.CL stat.ML

    Modeling Compositionality with Multiplicative Recurrent Neural Networks

    Authors: Ozan İrsoy, Claire Cardie

    Abstract: We present the multiplicative recurrent neural network as a general model for compositional meaning in language, and evaluate it on the task of fine-grained sentiment analysis. We establish a connection to the previously investigated matrix-space models for compositionality, and show they are special cases of the multiplicative recurrent net. Our experiments show that these models perform comparab…

    Submitted 2 May, 2015; v1 submitted 19 December, 2014; originally announced December 2014.

    Comments: 10 pages, 2 figures, published at ICLR 2015

  18. arXiv:1412.6388  [pdf, other]

    cs.LG stat.ML

    Distributed Decision Trees

    Authors: Ozan İrsoy, Ethem Alpaydın

    Abstract: Recently proposed budding tree is a decision tree algorithm in which every node is part internal node and part leaf. This allows representing every decision tree in a continuous parameter space, and therefore a budding tree can be jointly trained with backpropagation, like a neural network. Even though this continuity allows it to be used in hierarchical representation learning, the learned repres…

    Submitted 19 December, 2014; originally announced December 2014.

  19. arXiv:1409.7461  [pdf, other]

    cs.LG stat.ML

    Autoencoder Trees

    Authors: Ozan İrsoy, Ethem Alpaydın

    Abstract: We discuss an autoencoder model in which the encoding and decoding functions are implemented by decision trees. We use the soft decision tree where internal nodes realize soft multivariate splits given by a gating function and the overall output is the average of all leaves weighted by the gating values on their path. The encoder tree takes the input and generates a lower dimensional representatio…

    Submitted 25 September, 2014; originally announced September 2014.

    Comments: 9 pages

  20. arXiv:1312.0493  [pdf, other]

    cs.LG cs.CL stat.ML

    Bidirectional Recursive Neural Networks for Token-Level Labeling with Structure

    Authors: Ozan İrsoy, Claire Cardie

    Abstract: Recently, deep architectures, such as recurrent and recursive neural networks have been successfully applied to various natural language processing tasks. Inspired by bidirectional recurrent neural networks which use representations that summarize the past and future around an instance, we propose a novel architecture that aims to capture the structural information around an input, and use it to l…

    Submitted 2 December, 2013; originally announced December 2013.

    Comments: 9 pages, 5 figures, NIPS Deep Learning Workshop 2013