
Showing 1–17 of 17 results for author: Kádár, Á

  1. arXiv:2310.15369  [pdf]

    physics.app-ph cond-mat.mtrl-sci

    Layer-by-Layer Assembled Nanowire Networks Enable Graph Theoretical Design of Multifunctional Coatings

    Authors: Wenbing Wu, Alain Kadar, Sang Hyun Lee, Bum Chul Park, Jeffery E. Raymond, Thomas K. Tsotsis, Carlos E. S. Cesnik, Sharon C. Glotzer, Valerie Goss, Nicholas A. Kotov

    Abstract: Multifunctional coatings are central for information, biomedical, transportation and energy technologies. These coatings must possess hard-to-attain properties and be scalable, adaptable, and sustainable, which makes layer-by-layer assembly (LBL) of nanomaterials uniquely suitable for these technologies. What remains largely unexplored is that LBL enables computational methodologies for structural…

    Submitted 30 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  2. arXiv:2212.09255  [pdf, other]

    cs.CL

    Multi hash embeddings in spaCy

    Authors: Lester James Miranda, Ákos Kádár, Adriane Boyd, Sofie Van Landeghem, Anders Søgaard, Matthew Honnibal

    Abstract: The distributed representation of symbols is one of the key technologies in machine learning systems today, playing a pivotal role in modern natural language processing. Traditional word embeddings associate a separate vector with each word. While this approach is simple and leads to good performance, it requires a lot of memory for representing a large vocabulary. To reduce the memory footprint,…

    Submitted 19 December, 2022; originally announced December 2022.

    ACM Class: I.2.7

  3. arXiv:2201.06384  [pdf, other]

    cs.CL cs.CY cs.SI

    Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations

    Authors: Chris Emmery, Ákos Kádár, Grzegorz Chrupała, Walter Daelemans

    Abstract: A limited amount of studies investigates the role of model-agnostic adversarial behavior in toxic content classification. As toxicity classifiers predominantly rely on lexical cues, (deliberately) creative and evolving language-use can be detrimental to the utility of current corpora and state-of-the-art models when they are deployed for content moderation. The less training data is available, the…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: Submitted to LREC 2022

  4. arXiv:2106.04559  [pdf, other]

    cs.CL

    Turing: an Accurate and Interpretable Multi-Hypothesis Cross-Domain Natural Language Database Interface

    Authors: Peng Xu, Wenjie Zi, Hamidreza Shahidi, Ákos Kádár, Keyi Tang, Wei Yang, Jawad Ateeq, Harsh Barot, Meidan Alon, Yanshuai Cao

    Abstract: A natural language database interface (NLDB) can democratize data-driven insights for non-technical users. However, existing Text-to-SQL semantic parsers cannot achieve high enough accuracy in the cross-database setting to allow good usability in practice. This work presents Turing, a NLDB system toward bridging this gap. The cross-domain semantic parser of Turing with our novel value prediction m…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: ACL 2021 demonstration track

  5. arXiv:2102.10864  [pdf, other]

    cs.CL

    Subword Pooling Makes a Difference

    Authors: Judit Ács, Ákos Kádár, András Kornai

    Abstract: Contextual word-representations became a standard in modern natural language processing systems. These models use subword tokenization to handle large vocabularies and unknown words. Word-level usage of such systems requires a way of pooling multiple subwords that correspond to a single word. In this paper we investigate how the choice of subword pooling affects the downstream performance on three…

    Submitted 29 March, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Journal ref: EACL2021

  6. arXiv:2101.11310  [pdf, other]

    cs.CL cs.CY

    Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling

    Authors: Chris Emmery, Ákos Kádár, Grzegorz Chrupała

    Abstract: Written language contains stylistic cues that can be exploited to automatically infer a variety of potentially sensitive author information. Adversarial stylometry intends to attack such models by rewriting an author's text. Our research proposes several components to facilitate deployment of these adversarial attacks in the wild, where neither data nor target models are accessible. We introduce a…

    Submitted 27 January, 2021; originally announced January 2021.

    Comments: Accepted to EACL 2021

  7. arXiv:1911.03678  [pdf, other]

    cs.CL cs.CV

    Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning

    Authors: Ákos Kádár, Grzegorz Chrupała, Afra Alishahi, Desmond Elliott

    Abstract: Recent work has highlighted the advantage of jointly learning grounded sentence representations from multiple languages. However, the data used in these studies has been limited to an aligned scenario: the same images annotated with sentences in multiple languages. We focus on the more realistic disjoint scenario in which there is no overlap between the images in multilingual image–caption datase…

    Submitted 9 November, 2019; originally announced November 2019.

    Comments: 10 pages

  8. arXiv:1903.06939  [pdf, other]

    cs.CL

    Improving Lemmatization of Non-Standard Languages with Joint Learning

    Authors: Enrique Manjavacas, Ákos Kádár, Mike Kestemont

    Abstract: Lemmatization of standard languages is concerned with (i) abstracting over morphological differences and (ii) resolving token-lemma ambiguities of inflected words in order to map them to a dictionary headword. In the present paper we aim to improve lemmatization performance on a set of non-standard historical languages in which the difficulty is increased by an additional aspect (iii): spelling va…

    Submitted 16 March, 2019; originally announced March 2019.

    Journal ref: NAACL-HLT 2019

  9. arXiv:1809.07615  [pdf, other]

    cs.CL

    Lessons learned in multilingual grounded language learning

    Authors: Ákos Kádár, Desmond Elliott, Marc-Alexandre Côté, Grzegorz Chrupała, Afra Alishahi

    Abstract: Recent work has shown how to learn better visual-semantic embeddings by leveraging image descriptions in more than one language. Here, we investigate in detail which conditions affect the performance of this type of grounded language learning model. We show that multilingual training improves over bilingual training, and that low-resource languages benefit from training with higher-resource langua…

    Submitted 20 September, 2018; originally announced September 2018.

    Comments: CoNLL 2018

  10. arXiv:1807.03595  [pdf, other]

    cs.CL

    Revisiting the Hierarchical Multiscale LSTM

    Authors: Ákos Kádár, Marc-Alexandre Côté, Grzegorz Chrupała, Afra Alishahi

    Abstract: Hierarchical Multiscale LSTM (Chung et al., 2016a) is a state-of-the-art language model that learns interpretable structure from character-level input. Such models can provide fertile ground for (cognitive) computational linguistics studies. However, the high complexity of the architecture, training procedure and implementations might hinder its applicability. We provide a detailed reproduction an…

    Submitted 10 July, 2018; originally announced July 2018.

    Comments: To appear in COLING 2018 (reproduction track)

  11. arXiv:1806.11532  [pdf, other]

    cs.LG cs.CL stat.ML

    TextWorld: A Learning Environment for Text-based Games

    Authors: Marc-Alexandre Côté, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Ruo Yu Tao, Matthew Hausknecht, Layla El Asri, Mahmoud Adada, Wendy Tay, Adam Trischler

    Abstract: We introduce TextWorld, a sandbox learning environment for the training and evaluation of RL agents on text-based games. TextWorld is a Python library that handles interactive play-through of text games, as well as backend functions like state tracking and reward assignment. It comes with a curated list of games whose features and challenges we have analyzed. More significantly, it enables users t…

    Submitted 8 November, 2019; v1 submitted 29 June, 2018; originally announced June 2018.

    Comments: Presented at the Computer Games Workshop at IJCAI 2018, Stockholm

  12. arXiv:1805.08093  [pdf, ps, other]

    cs.CL

    NeuralREG: An end-to-end approach to referring expression generation

    Authors: Thiago Castro Ferreira, Diego Moussallem, Ákos Kádár, Sander Wubben, Emiel Krahmer

    Abstract: Traditionally, Referring Expression Generation (REG) models first decide on the form and then on the content of references to discourse entities in text, typically relying on features such as salience and grammatical function. In this paper, we present a new approach (NeuralREG), relying on deep neural networks, which makes decisions about form and content in one go without explicit feature extrac…

    Submitted 21 May, 2018; originally announced May 2018.

    Comments: Accepted for presentation at ACL 2018

  13. arXiv:1803.08869  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    On the difficulty of a distributional semantics of spoken language

    Authors: Grzegorz Chrupała, Lieke Gelderloos, Ákos Kádár, Afra Alishahi

    Abstract: In the domain of unsupervised learning most work on speech has focused on discovering low-level constructs such as phoneme inventories or word-like units. In contrast, for written language, where there is a large body of work on unsupervised induction of semantic representations of words, whole sentences and longer texts. In this study we examine the challenges of adapting these approaches from wr…

    Submitted 26 October, 2018; v1 submitted 23 March, 2018; originally announced March 2018.

    Comments: Proceedings of the Society for Computation in Linguistics 2019

  14. arXiv:1710.07300  [pdf, other]

    cs.CV

    FigureQA: An Annotated Figure Dataset for Visual Reasoning

    Authors: Samira Ebrahimi Kahou, Vincent Michalski, Adam Atkinson, Akos Kadar, Adam Trischler, Yoshua Bengio

    Abstract: We introduce FigureQA, a visual reasoning corpus of over one million question-answer pairs grounded in over 100,000 images. The images are synthetic, scientific-style figures from five classes: line plots, dot-line plots, vertical and horizontal bar graphs, and pie charts. We formulate our reasoning task by generating questions from 15 templates; questions concern various relationships between plo…

    Submitted 22 February, 2018; v1 submitted 19 October, 2017; originally announced October 2017.

    Comments: workshop paper at ICLR 2018

  15. arXiv:1705.04350  [pdf, other]

    cs.CL cs.CV

    Imagination improves Multimodal Translation

    Authors: Desmond Elliott, Ákos Kádár

    Abstract: We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30…

    Submitted 7 July, 2017; v1 submitted 11 May, 2017; originally announced May 2017.

    Comments: Clarified main contributions, minor correction to Equation 8, additional comparisons in Table 2, added more related work

  16. arXiv:1602.08952  [pdf, other]

    cs.CL cs.LG

    Representation of linguistic form and function in recurrent neural networks

    Authors: Ákos Kádár, Grzegorz Chrupała, Afra Alishahi

    Abstract: We present novel methods for analyzing the activation patterns of RNNs from a linguistic point of view and explore the types of linguistic structure they learn. As a case study, we use a multi-task gated recurrent network architecture consisting of two parallel pathways with shared word embeddings trained on predicting the representations of the visual scene corresponding to an input sentence, and…

    Submitted 8 June, 2016; v1 submitted 29 February, 2016; originally announced February 2016.

  17. arXiv:1506.03694  [pdf, other]

    cs.CL

    Learning language through pictures

    Authors: Grzegorz Chrupała, Ákos Kádár, Afra Alishahi

    Abstract: We propose Imaginet, a model of learning visually grounded representations of language from coupled textual and visual input. The model consists of two Gated Recurrent Unit networks with shared word embeddings, and uses a multi-task objective by receiving a textual description of a scene and trying to concurrently predict its visual representation and the next word in the sentence. Mimicking an im…

    Submitted 19 June, 2015; v1 submitted 11 June, 2015; originally announced June 2015.

    Comments: To appear at ACL 2015