
Showing 1–32 of 32 results for author: Jastrzebski, S

  1. arXiv:2405.01616  [pdf, other]

    q-bio.BM cs.AI cs.LG

    Generative Active Learning for the Search of Small-molecule Protein Binders

    Authors: Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Kostiantyn Lapchevskyi, Daniel St-Cyr, Doris Alexandra Schuetz, Victor Ion Butoi, Jarrid Rector-Brooks, Simon Blackburn, Leo Feng, Hadi Nekoei, SaiKrishna Gottipati, Priyesh Vijayan, Prateek Gupta, Ladislav Rampášek, Sasikanth Avancha, Pierre-Luc Bacon, William L. Hamilton, Brooks Paige, Sanchit Misra, et al. (9 additional authors not shown)

    Abstract: Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecu…

    Submitted 2 May, 2024; originally announced May 2024.

  2. arXiv:2310.07313  [pdf, other]

    cs.LG stat.ML

    Molecule-Edit Templates for Efficient and Accurate Retrosynthesis Prediction

    Authors: Mikołaj Sacha, Michał Sadowski, Piotr Kozakowski, Ruard van Workum, Stanisław Jastrzębski

    Abstract: Retrosynthesis involves determining a sequence of reactions to synthesize complex molecules from simpler precursors. As this poses a challenge in organic chemistry, machine learning has offered solutions, particularly for predicting possible reaction substrates for a given target molecule. These solutions mainly fall into template-based and template-free categories. The former is efficient but rel…

    Submitted 11 October, 2023; originally announced October 2023.

    ACM Class: I.2.1; I.5.1

  3. arXiv:2210.08645  [pdf, other]

    cs.CV cs.LG eess.IV

    An efficient deep neural network to find small objects in large 3D images

    Authors: Jungkyu Park, Jakub Chłędowski, Stanisław Jastrzębski, Jan Witowski, Yanqi Xu, Linda Du, Sushma Gaddam, Eric Kim, Alana Lewin, Ujas Parikh, Anastasia Plaunova, Sardius Chen, Alexandra Millet, James Park, Kristine Pysarenko, Shalin Patel, Julia Goldberg, Melanie Wegener, Linda Moy, Laura Heacock, Beatriu Reig, Krzysztof J. Geras

    Abstract: 3D imaging enables accurate diagnosis by providing spatial information about organ anatomy. However, using 3D images to train AI models is computationally challenging because they consist of 10x or 100x more pixels than their 2D counterparts. To be trained with high-resolution 3D images, convolutional neural networks resort to downsampling them or projecting them to 2D. We propose an effective alt…

    Submitted 26 February, 2023; v1 submitted 16 October, 2022; originally announced October 2022.

  4. arXiv:2202.05306  [pdf, other]

    cs.LG cs.CV

    Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks

    Authors: Nan Wu, Stanisław Jastrzębski, Kyunghyun Cho, Krzysztof J. Geras

    Abstract: We hypothesize that due to the greedy nature of learning in multi-modal deep neural networks, these models tend to rely on just one modality while under-fitting the other modalities. Such behavior is counter-intuitive and hurts the models' generalization, as we observe empirically. To estimate the model's dependence on each modality, we compute the gain on the accuracy when the model has access to…

    Submitted 16 September, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

  5. arXiv:2110.05841  [pdf, other]

    cs.LG cs.AI

    Relative Molecule Self-Attention Transformer

    Authors: Łukasz Maziarka, Dawid Majchrowski, Tomasz Danel, Piotr Gaiński, Jacek Tabor, Igor Podolak, Paweł Morkisz, Stanisław Jastrzębski

    Abstract: Self-supervised learning holds promise to revolutionize molecule property prediction - a central task to drug discovery and many more industries - by enabling data efficient learning from scarce experimental data. Despite significant progress, non-pretrained methods can be still competitive in certain settings. We reason that architecture might be a key bottleneck. In particular, enriching the bac…

    Submitted 12 October, 2021; originally announced October 2021.

  6. arXiv:2012.14193  [pdf, other]

    cs.LG stat.ML

    Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization

    Authors: Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras

    Abstract: The early phase of training a deep neural network has a dramatic effect on the local curvature of the loss function. For instance, using a small learning rate does not guarantee stable optimization because the optimization trajectory has a tendency to steer towards regions of the loss surface with increasing local curvature. We ask whether this tendency is connected to the widely observed phenomen…

    Submitted 11 June, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Comments: The last two authors contributed equally. Accepted to the International Conference on Machine Learning 2021

  7. arXiv:2011.14036  [pdf, other]

    eess.IV cs.CV cs.CY cs.LG

    Differences between human and machine perception in medical diagnosis

    Authors: Taro Makino, Stanislaw Jastrzebski, Witold Oleszkiewicz, Celin Chacko, Robin Ehrenpreis, Naziya Samreen, Chloe Chhor, Eric Kim, Jiyon Lee, Kristine Pysarenko, Beatriu Reig, Hildegard Toth, Divya Awal, Linda Du, Alice Kim, James Park, Daniel K. Sodickson, Laura Heacock, Linda Moy, Kyunghyun Cho, Krzysztof J. Geras

    Abstract: Deep neural networks (DNNs) show promise in image-based medical diagnosis, but cannot be fully trusted since their performance can be severely degraded by dataset shifts to which human perception remains invariant. If we can better understand the differences between human and machine perception, we can potentially characterize and mitigate this effect. We therefore propose a framework for comparin…

    Submitted 27 November, 2020; originally announced November 2020.

  8. arXiv:2011.13042  [pdf]

    cs.LG

    RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design

    Authors: Cheng-Hao Liu, Maksym Korablyov, Stanisław Jastrzębski, Paweł Włodarczyk-Pruszyński, Yoshua Bengio, Marwin H. S. Segler

    Abstract: De novo molecule generation often results in chemically unfeasible molecules. A natural idea to mitigate this problem is to bias the search process towards more easily synthesizable molecules using a proxy for synthetic accessibility. However, using currently available proxies still results in highly unrealistic compounds. We investigate the feasibility of training deep graph neural networks to ap…

    Submitted 25 November, 2020; originally announced November 2020.

    Comments: Machine Learning for Molecules Workshop at NeurIPS 2020

  9. arXiv:2011.11486  [pdf, other]

    cs.LG

    Latent Adversarial Debiasing: Mitigating Collider Bias in Deep Neural Networks

    Authors: Luke Darlow, Stanisław Jastrzębski, Amos Storkey

    Abstract: Collider bias is a harmful form of sample selection bias that neural networks are ill-equipped to handle. This bias manifests itself when the underlying causal signal is strongly correlated with other confounding signals due to the training data collection procedure. In the situation where the confounding signal is easy-to-learn, deep neural networks will latch onto this and the resulting model wi…

    Submitted 19 November, 2020; originally announced November 2020.

    Comments: 10 pages, 4 figures, submitted to AISTATS 2021

  10. arXiv:2008.01774  [pdf, other]

    cs.LG cs.CV eess.IV

    An artificial intelligence system for predicting the deterioration of COVID-19 patients in the emergency department

    Authors: Farah E. Shamout, Yiqiu Shen, Nan Wu, Aakash Kaku, Jungkyu Park, Taro Makino, Stanisław Jastrzębski, Jan Witowski, Duo Wang, Ben Zhang, Siddhant Dogra, Meng Cao, Narges Razavian, David Kudlowitz, Lea Azour, William Moore, Yvonne W. Lui, Yindalon Aphinyanaphongs, Carlos Fernandez-Granda, Krzysztof J. Geras

    Abstract: During the coronavirus disease 2019 (COVID-19) pandemic, rapid and accurate triage of patients at the emergency department is critical to inform decision-making. We propose a data-driven approach for automatic prediction of deterioration risk using a deep neural network that learns from chest X-ray images and a gradient boosting model that learns from routine clinical variables. Our AI prognosis s…

    Submitted 3 November, 2020; v1 submitted 4 August, 2020; originally announced August 2020.

  11. arXiv:2006.16955  [pdf, other]

    q-bio.BM cs.LG stat.ML

    We Should at Least Be Able to Design Molecules That Dock Well

    Authors: Tobiasz Cieplinski, Tomasz Danel, Sabina Podlewska, Stanislaw Jastrzebski

    Abstract: Designing compounds with desired properties is a key element of the drug discovery process. However, measuring progress in the field has been challenging due to the lack of realistic retrospective benchmarks, and the large cost of prospective validation. To close this gap, we propose a benchmark based on docking, a popular computational method for assessing molecule binding to a protein. Concretel…

    Submitted 13 June, 2023; v1 submitted 20 June, 2020; originally announced June 2020.

    Comments: Published in Journal of Chemical Information and Modeling

  12. arXiv:2006.15426  [pdf, other]

    cs.LG physics.chem-ph stat.ML

    Molecule Edit Graph Attention Network: Modeling Chemical Reactions as Sequences of Graph Edits

    Authors: Mikołaj Sacha, Mikołaj Błaż, Piotr Byrski, Paweł Dąbrowski-Tumański, Mikołaj Chromiński, Rafał Loska, Paweł Włodarczyk-Pruszyński, Stanisław Jastrzębski

    Abstract: The central challenge in automated synthesis planning is to be able to generate and predict outcomes of a diverse set of chemical reactions. In particular, in many cases, the most likely synthesis pathway cannot be applied due to additional constraints, which requires proposing alternative chemical reactions. With this in mind, we present Molecule Edit Graph Attention Network (MEGAN), an end-to-en…

    Submitted 25 May, 2021; v1 submitted 27 June, 2020; originally announced June 2020.

  13. arXiv:2003.10041  [pdf, other]

    cs.LG cs.CV stat.ML

    Understanding the robustness of deep neural network classifiers for breast cancer screening

    Authors: Witold Oleszkiewicz, Taro Makino, Stanisław Jastrzębski, Tomasz Trzciński, Linda Moy, Kyunghyun Cho, Laura Heacock, Krzysztof J. Geras

    Abstract: Deep neural networks (DNNs) show promise in breast cancer screening, but their robustness to input perturbations must be better understood before they can be clinically implemented. There exists extensive literature on this subject in the context of natural images that can potentially be built upon. However, it cannot be assumed that conclusions about robustness will transfer from natural images t…

    Submitted 22 March, 2020; originally announced March 2020.

    Comments: Accepted as a workshop paper at AI4AH, ICLR 2020

  14. arXiv:2002.09572  [pdf, other]

    cs.LG stat.ML

    The Break-Even Point on Optimization Trajectories of Deep Neural Networks

    Authors: Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras

    Abstract: The early phase of training of deep neural networks is critical for their final performance. In this work, we study how the hyperparameters of stochastic gradient descent (SGD) used in the early phase of training affect the rest of the optimization trajectory. We argue for the existence of the "break-even" point on this trajectory, beyond which the curvature of the loss surface and noise in the gr…

    Submitted 21 February, 2020; originally announced February 2020.

    Comments: Accepted as a spotlight at ICLR 2020. The last two authors contributed equally

  15. arXiv:2002.08264  [pdf, other]

    cs.LG physics.comp-ph stat.ML

    Molecule Attention Transformer

    Authors: Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, Stanisław Jastrzębski

    Abstract: Designing a single neural network architecture that performs competitively across a range of molecule property prediction tasks remains largely an open challenge, and its solution may unlock a widespread use of deep learning in the drug discovery industry. To move towards this goal, we propose Molecule Attention Transformer (MAT). Our key innovation is to augment the attention mechanism in Transfo…

    Submitted 19 February, 2020; originally announced February 2020.

    Journal ref: Graph Representation Learning workshop and Machine Learning and the Physical Sciences workshop at NeurIPS 2019

  16. arXiv:1906.04724  [pdf, other]

    cs.LG stat.ML

    Large Scale Structure of Neural Network Loss Landscapes

    Authors: Stanislav Fort, Stanislaw Jastrzebski

    Abstract: There are many surprising and perhaps counter-intuitive properties of optimization of deep neural networks. We propose and experimentally verify a unified phenomenological model of the loss landscape that incorporates many of them. High dimensionality plays a key role in our model. Our core idea is to model the loss landscape as a set of high dimensional wedges that together form a large-sc…

    Submitted 11 June, 2019; originally announced June 2019.

    Comments: Submitted for review

  17. arXiv:1904.03515  [pdf, other]

    cs.LG stat.ML

    Split Batch Normalization: Improving Semi-Supervised Learning under Domain Shift

    Authors: Michał Zając, Konrad Zolna, Stanisław Jastrzębski

    Abstract: Recent work has shown that using unlabeled data in semi-supervised learning is not always beneficial and can even hurt generalization, especially when there is a class mismatch between the unlabeled and labeled examples. We investigate this phenomenon for image classification on the CIFAR-10 and the ImageNet datasets, and with many other forms of domain shifts applied (e.g. salt-and-pepper noise).…

    Submitted 6 April, 2019; originally announced April 2019.

    Comments: Under review for ECML PKDD 2019

  18. arXiv:1903.08297  [pdf, other]

    cs.LG cs.CV stat.ML

    Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening

    Authors: Nan Wu, Jason Phang, Jungkyu Park, Yiqiu Shen, Zhe Huang, Masha Zorin, Stanisław Jastrzębski, Thibault Févry, Joe Katsnelson, Eric Kim, Stacey Wolfson, Ujas Parikh, Sushma Gaddam, Leng Leng Young Lin, Kara Ho, Joshua D. Weinstein, Beatriu Reig, Yiming Gao, Hildegard Toth, Kristine Pysarenko, Alana Lewin, Jiyon Lee, Krystal Airola, Eralda Mema, Stephanie Chung, et al. (7 additional authors not shown)

    Abstract: We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use…

    Submitted 19 March, 2019; originally announced March 2019.

    Comments: MIDL 2019 [arXiv:1907.08612]

    Report number: MIDL/2019/ExtendedAbstract/SkxYez76FE

  19. Non-linear ICA based on Cramer-Wold metric

    Authors: Przemysław Spurek, Aleksandra Nowak, Jacek Tabor, Łukasz Maziarka, Stanisław Jastrzębski

    Abstract: Non-linear source separation is a challenging open problem with many applications. We extend a recently proposed Adversarial Non-linear ICA (ANICA) model, and introduce Cramer-Wold ICA (CW-ICA). In contrast to ANICA we use a simple, closed-form optimization target instead of a discriminator-based independence measure. Our results show that CW-ICA achieves comparable results to ANICA, while foreg…

    Submitted 1 March, 2019; originally announced March 2019.

    Journal ref: Neural Information Processing. ICONIP 2020

  20. arXiv:1902.00751  [pdf, other]

    cs.LG cs.CL stat.ML

    Parameter-Efficient Transfer Learning for NLP

    Authors: Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly

    Abstract: Fine-tuning large pre-trained models is an effective transfer mechanism in NLP. However, in the presence of many downstream tasks, fine-tuning is parameter inefficient: an entire new model is required for every task. As an alternative, we propose transfer with adapter modules. Adapter modules yield a compact and extensible model; they add only a few trainable parameters per task, and new tasks can…

    Submitted 13 June, 2019; v1 submitted 2 February, 2019; originally announced February 2019.

  21. arXiv:1901.09491  [pdf, other]

    cs.LG cs.NE stat.ML

    Stiffness: A New Perspective on Generalization in Neural Networks

    Authors: Stanislav Fort, Paweł Krzysztof Nowak, Stanislaw Jastrzebski, Srini Narayanan

    Abstract: In this paper we develop a new perspective on generalization of neural networks by proposing and investigating the concept of a neural network stiffness. We measure how stiff a network is by looking at how a small gradient step in the network's parameters on one example affects the loss on another example. Higher stiffness suggests that a network is learning features that generalize. In particular…

    Submitted 13 March, 2020; v1 submitted 27 January, 2019; originally announced January 2019.

    Comments: Submitted for review

  22. arXiv:1812.10666  [pdf, other]

    cs.LG stat.ML

    Neural Architecture Search Over a Graph Search Space

    Authors: Stanisław Jastrzębski, Quentin de Laroussilhe, Mingxing Tan, Xiao Ma, Neil Houlsby, Andrea Gesmundo

    Abstract: Neural Architecture Search (NAS) enabled the discovery of state-of-the-art architectures in many domains. However, the success of NAS depends on the definition of the search space. Current search spaces are defined as a static sequence of decisions and a set of available actions for each decision. Each possible sequence of actions defines an architecture. We propose a more expressive class of sear…

    Submitted 31 July, 2019; v1 submitted 27 December, 2018; originally announced December 2018.

  23. arXiv:1809.08848  [pdf, other]

    stat.ML cs.LG

    Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function

    Authors: Wojciech Tarnowski, Piotr Warchoł, Stanisław Jastrzębski, Jacek Tabor, Maciej A. Nowak

    Abstract: We demonstrate that in residual neural networks (ResNets) dynamical isometry is achievable irrespectively of the activation function used. We do that by deriving, with the help of Free Probability and Random Matrix Theories, a universal formula for the spectral density of the input-output Jacobian at initialization, in the large network width and depth limit. The resulting singular value spectrum…

    Submitted 4 March, 2019; v1 submitted 24 September, 2018; originally announced September 2018.

    Journal ref: AISTATS 2019

  24. arXiv:1807.05031  [pdf, other]

    stat.ML cs.LG

    On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length

    Authors: Stanisław Jastrzębski, Zachary Kenton, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

    Abstract: Stochastic Gradient Descent (SGD) based training of neural networks with a large learning rate or a small batch-size typically ends in well-generalizing, flat regions of the weight space, as indicated by small eigenvalues of the Hessian of the training loss. However, the curvature along the SGD trajectory is poorly understood. An empirical investigation shows that initially SGD visits increasingly…

    Submitted 23 December, 2019; v1 submitted 13 July, 2018; originally announced July 2018.

    Journal ref: International Conference on Learning Representations (ICLR) 2019

  25. arXiv:1805.09235  [pdf, other]

    cs.LG cs.AI stat.ML

    Cramer-Wold AutoEncoder

    Authors: Szymon Knop, Jacek Tabor, Przemysław Spurek, Igor Podolak, Marcin Mazur, Stanisław Jastrzębski

    Abstract: We propose a new generative model, Cramer-Wold Autoencoder (CWAE). Following WAE, we directly encourage normality of the latent space. Our paper uses also the recent idea from Sliced WAE (SWAE) model, which uses one-dimensional projections as a method of verifying closeness of two distributions. The crucial new ingredient is the introduction of a new (Cramer-Wold) metric in the space of densities,…

    Submitted 2 July, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

    Journal ref: Journal of Machine Learning Research, 21, 164, 1-28 2020

  26. arXiv:1804.09259  [pdf, other]

    cs.CL

    Commonsense mining as knowledge base completion? A study on the impact of novelty

    Authors: Stanisław Jastrzębski, Dzmitry Bahdanau, Seyedarian Hosseini, Michael Noukhovitch, Yoshua Bengio, Jackie Chi Kit Cheung

    Abstract: Commonsense knowledge bases such as ConceptNet represent knowledge in the form of relational triples. Inspired by the recent work by Li et al., we analyse if knowledge base completion models can be used to mine commonsense knowledge from raw text. We propose novelty of predicted triples with respect to the training set as an important factor in interpreting results. We critically analyse the diffi…

    Submitted 24 April, 2018; originally announced April 2018.

    Comments: Published in Workshop on New Forms of Generalization in Deep Learning and Natural Language Processing (NAACL 2018)

  27. arXiv:1711.04623  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Three Factors Influencing Minima in SGD

    Authors: Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey

    Abstract: We investigate the dynamical and convergent properties of stochastic gradient descent (SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between learning rate, batch size and the properties of the final minima, such as width or generalization, remains an open question. In order to tackle this problem we investigate the previously proposed approximation of SGD by a stochastic…

    Submitted 13 September, 2018; v1 submitted 13 November, 2017; originally announced November 2017.

    Comments: First two authors contributed equally. Short version accepted into ICLR workshop. Accepted to Artificial Neural Networks and Machine Learning, ICANN 2018

  28. arXiv:1710.04773  [pdf, other]

    cs.CV

    Residual Connections Encourage Iterative Inference

    Authors: Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio

    Abstract: Residual networks (Resnets) have become a prominent architecture in deep learning. However, a comprehensive understanding of Resnets is still a topic of ongoing research. A recent view argues that Resnets perform iterative refinement of features. We attempt to further expose properties of this aspect. To this end, we study Resnets both analytically and empirically. We formalize the notion of ite…

    Submitted 8 March, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

    Comments: First two authors contributed equally. Published in ICLR 2018

  29. arXiv:1706.05394  [pdf, other]

    stat.ML cs.LG

    A Closer Look at Memorization in Deep Networks

    Authors: Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, David Krueger, Emmanuel Bengio, Maxinder S. Kanwal, Tegan Maharaj, Asja Fischer, Aaron Courville, Yoshua Bengio, Simon Lacoste-Julien

    Abstract: We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. r…

    Submitted 1 July, 2017; v1 submitted 16 June, 2017; originally announced June 2017.

    Comments: Appears in Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Devansh Arpit, Stanisław Jastrzębski, Nicolas Ballas, and David Krueger contributed equally to this work

  30. arXiv:1706.00286  [pdf, other]

    cs.LG cs.CL

    Learning to Compute Word Embeddings On the Fly

    Authors: Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio

    Abstract: Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the "long tail" of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, requiring us to pre-train embeddings on external data, or treat all rare words as out-of-vocabulary words…

    Submitted 7 March, 2018; v1 submitted 1 June, 2017; originally announced June 2017.

  31. arXiv:1702.02170  [pdf, other]

    cs.CL

    How to evaluate word embeddings? On importance of data efficiency and simple supervised tasks

    Authors: Stanisław Jastrzebski, Damian Leśniak, Wojciech Marian Czarnecki

    Abstract: Maybe the single most important goal of representation learning is making subsequent learning faster. Surprisingly, this fact is not well reflected in the way embeddings are evaluated. In addition, recent practice in word embeddings points towards importance of learning specialized representations. We argue that focus of word representation evaluation should reflect those trends and shift towards…

    Submitted 7 February, 2017; originally announced February 2017.

  32. arXiv:1602.06289  [pdf, other]

    cs.CL

    Learning to SMILE(S)

    Authors: Stanisław Jastrzębski, Damian Leśniak, Wojciech Marian Czarnecki

    Abstract: This paper shows how one can directly apply natural language processing (NLP) methods to classification problems in cheminformatics. Connection between these seemingly separate fields is shown by considering standard textual representation of compound, SMILES. The problem of activity prediction against a target protein is considered, which is a crucial part of computer aided drug design process. C…

    Submitted 8 March, 2018; v1 submitted 19 February, 2016; originally announced February 2016.

    Comments: Accepted as a workshop contribution to ICLR 2016