
Showing 1–21 of 21 results for author: Riley, P

Searching in archive cs.
  1. arXiv:2404.18934

    cs.CV cs.HC

    The Visual Experience Dataset: Over 200 Recorded Hours of Integrated Eye Movement, Odometry, and Egocentric Video

    Authors: Michelle R. Greene, Benjamin J. Balas, Mark D. Lescroart, Paul R. MacNeilage, Jennifer A. Hart, Kamran Binaee, Peter A. Hausamann, Ronald Mezile, Bharath Shankar, Christian B. Sinnott, Kaylie Capurro, Savannah Halow, Hunter Howe, Mariam Josyula, Annie Li, Abraham Mieses, Amina Mohamed, Ilya Nudnou, Ezra Parkhill, Peter Riley, Brett Schmidt, Matthew W. Shinkle, Wentao Si, Brian Szekely, Joaquin M. Torres, et al. (1 additional author not shown)

    Abstract: We introduce the Visual Experience Dataset (VEDB), a compilation of over 240 hours of egocentric video combined with gaze- and head-tracking data that offers an unprecedented view of the visual world as experienced by human observers. The dataset consists of 717 sessions, recorded by 58 observers aged 6 to 49 years. This paper outlines the data collection, processing, and labeling protoco…

    Submitted 15 February, 2024; originally announced April 2024.

    Comments: 36 pages, 1 table, 7 figures

  2. arXiv:2404.01474

    cs.CL

    Finding Replicable Human Evaluations via Stable Ranking Probability

    Authors: Parker Riley, Daniel Deutsch, George Foster, Viresh Ratnakar, Ali Dabirmoghaddam, Markus Freitag

    Abstract: Reliable human evaluation is critical to the development of successful natural language generation models, but achieving it is notoriously difficult. Stability is a crucial requirement when ranking systems by quality: consistent ranking of systems across repeated evaluations is not just desirable, but essential. Without it, there is no reliable foundation for hill-climbing or product launch decisi…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: To appear at NAACL 2024

  3. arXiv:2308.07286

    cs.CL cs.LG

    The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation

    Authors: Patrick Fernandes, Daniel Deutsch, Mara Finkelstein, Parker Riley, André F. T. Martins, Graham Neubig, Ankush Garg, Jonathan H. Clark, Markus Freitag, Orhan Firat

    Abstract: Automatic evaluation of machine translation (MT) is a critical tool driving the rapid iterative development of MT systems. While considerable progress has been made on estimating a single scalar quality score, current metrics lack the informativeness of more detailed schemes that annotate individual errors, such as Multidimensional Quality Metrics (MQM). In this paper, we help fill this gap by pro…

    Submitted 14 August, 2023; originally announced August 2023.

    Comments: 19 pages

  4. XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

    Authors: Sebastian Ruder, Jonathan H. Clark, Alexander Gutkin, Mihir Kale, Min Ma, Massimo Nicosia, Shruti Rijhwani, Parker Riley, Jean-Michel A. Sarr, Xinyi Wang, John Wieting, Nitish Gupta, Anna Katanova, Christo Kirov, Dana L. Dickinson, Brian Roark, Bidisha Samanta, Connie Tao, David I. Adelani, Vera Axelrod, Isaac Caswell, Colin Cherry, Dan Garrette, Reeve Ingle, Melvin Johnson, et al. (2 additional authors not shown)

    Abstract: Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot;…

    Submitted 24 May, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

  5. arXiv:2305.10403

    cs.CL cs.AI

    PaLM 2 Technical Report

    Authors: Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, et al. (103 additional authors not shown)

    Abstract: We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on…

    Submitted 13 September, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  6. arXiv:2212.13325

    astro-ph.IM astro-ph.SR cs.AI cs.LG

    Heliophysics Discovery Tools for the 21st Century: Data Science and Machine Learning Structures and Recommendations for 2020-2050

    Authors: R. M. McGranaghan, B. Thompson, E. Camporeale, J. Bortnik, M. Bobra, G. Lapenta, S. Wing, B. Poduval, S. Lotz, S. Murray, M. Kirk, T. Y. Chen, H. M. Bain, P. Riley, B. Tremblay, M. Cheung, V. Delouille

    Abstract: Three main points: 1. Data Science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires…

    Submitted 26 December, 2022; originally announced December 2022.

    Comments: 4 pages; Heliophysics 2050 White Paper

  7. arXiv:2210.00193

    cs.CL

    FRMT: A Benchmark for Few-Shot Region-Aware Machine Translation

    Authors: Parker Riley, Timothy Dozat, Jan A. Botha, Xavier Garcia, Dan Garrette, Jason Riesa, Orhan Firat, Noah Constant

    Abstract: We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware Machine Translation, a type of style-targeted translation. The dataset consists of professional translations from English into two regional variants each of Portuguese and Mandarin Chinese. Source documents are selected to enable detailed analysis of phenomena of interest, including lexically distinct terms and distr…

    Submitted 3 October, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

    Comments: Published in TACL Vol. 11 (2023)

  8. arXiv:2203.02540

    cs.NE cs.LG physics.comp-ph

    Evolving symbolic density functionals

    Authors: He Ma, Arunachalam Narayanaswamy, Patrick Riley, Li Li

    Abstract: Systematic development of accurate density functionals has been a decades-long challenge for scientists. Despite the emerging application of machine learning (ML) in approximating functionals, the resulting ML functionals usually contain more than tens of thousands of parameters, which creates a huge gap in formulation from conventional human-designed symbolic functionals. We propose a new fram…

    Submitted 23 August, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Journal ref: Sci. Adv. 8, eabq0279 (2022)

  9. arXiv:2010.03802

    cs.CL cs.LG

    TextSETTR: Few-Shot Text Style Extraction and Tunable Targeted Restyling

    Authors: Parker Riley, Noah Constant, Mandy Guo, Girish Kumar, David Uthus, Zarana Parekh

    Abstract: We present a novel approach to the problem of text style transfer. Unlike previous approaches requiring style-labeled training data, our method makes use of readily-available unlabeled text by relying on the implicit connection in style between adjacent sentences, and uses labeled data only at inference time. We adapt T5 (Raffel et al., 2020), a strong pretrained text-to-text model, to extract a s…

    Submitted 23 June, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

  10. arXiv:2009.08551

    physics.comp-ph cs.LG

    Kohn-Sham equations as regularizer: building prior knowledge into machine-learned physics

    Authors: Li Li, Stephan Hoyer, Ryan Pederson, Ruoxi Sun, Ekin D. Cubuk, Patrick Riley, Kieron Burke

    Abstract: Including prior knowledge is important for effective machine learning models in physics, and is usually achieved by explicitly adding loss terms or constraints on model architectures. Prior knowledge embedded in the physics computation itself rarely draws attention. We show that solving the Kohn-Sham equations when training neural networks for the exchange-correlation functional provides an implic…

    Submitted 17 November, 2020; v1 submitted 17 September, 2020; originally announced September 2020.

    Journal ref: Phys. Rev. Lett. 126, 036401 (2021)

  11. arXiv:2006.16322

    cs.LG cs.AI cs.CV stat.ML

    Scaling Symbolic Methods using Gradients for Neural Model Explanation

    Authors: Subham Sekhar Sahoo, Subhashini Venugopalan, Li Li, Rishabh Singh, Patrick Riley

    Abstract: Symbolic techniques based on Satisfiability Modulo Theory (SMT) solvers have been proposed for analyzing and verifying neural network properties, but their usage has been fairly limited owing to their poor scalability with larger networks. In this work, we propose a technique for combining gradient-based methods with symbolic techniques to scale such analyses and demonstrate its application for mo…

    Submitted 5 May, 2021; v1 submitted 29 June, 2020; originally announced June 2020.

  12. Machine learning on DNA-encoded libraries: A new paradigm for hit-finding

    Authors: Kevin McCloskey, Eric A. Sigel, Steven Kearnes, Ling Xue, Xia Tian, Dennis Moccia, Diana Gikunju, Sana Bazzaz, Betty Chan, Matthew A. Clark, John W. Cuozzo, Marie-Aude Guié, John P. Guilinger, Christelle Huguet, Christopher D. Hupp, Anthony D. Keefe, Christopher J. Mulhern, Ying Zhang, Patrick Riley

    Abstract: DNA-encoded small molecule libraries (DELs) have enabled discovery of novel inhibitors for many distinct protein targets of therapeutic value through screening of libraries with up to billions of unique small molecules. We demonstrate a new approach applying machine learning to DEL selection data by identifying active molecules from a large commercial collection and a virtual library of easily syn…

    Submitted 31 January, 2020; originally announced February 2020.

  13. arXiv:2002.00037

    cs.CL

    Unsupervised Bilingual Lexicon Induction Across Writing Systems

    Authors: Parker Riley, Daniel Gildea

    Abstract: Recent embedding-based methods in unsupervised bilingual lexicon induction have shown good results, but generally have not leveraged orthographic (spelling) information, which can be helpful for pairs of related languages. This work augments a state-of-the-art method with orthographic features, and extends prior work in this space by proposing methods that can learn and utilize orthographic corres…

    Submitted 31 January, 2020; originally announced February 2020.

  14. arXiv:1911.03823

    cs.CL

    Translationese as a Language in "Multilingual" NMT

    Authors: Parker Riley, Isaac Caswell, Markus Freitag, David Grangier

    Abstract: Machine translation has an undesirable propensity to produce "translationese" artifacts, which can lead to higher BLEU scores while being liked less by human raters. Motivated by this, we model translationese and original (i.e. natural) text as separate languages in a multilingual model, and pose the question: can we perform zero-shot translation between original source text and original target te…

    Submitted 9 July, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

    Journal ref: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020) 7737-7746

  15. arXiv:1904.08915

    cs.LG stat.ML

    Decoding Molecular Graph Embeddings with Reinforcement Learning

    Authors: Steven Kearnes, Li Li, Patrick Riley

    Abstract: We present RL-VAE, a graph-to-graph variational autoencoder that uses reinforcement learning to decode molecular graphs from latent embeddings. Methods have been described previously for graph-to-graph autoencoding, but these approaches require sophisticated decoders that increase the complexity of training and evaluation (such as requiring parallel encoders and decoders or non-trivial graph match…

    Submitted 4 June, 2019; v1 submitted 18 April, 2019; originally announced April 2019.

    Comments: Presented at the ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Data. Copyright 2019 by the author(s)

  16. arXiv:1901.07714

    cs.LG cs.NE stat.ML

    Neural-Guided Symbolic Regression with Asymptotic Constraints

    Authors: Li Li, Minjie Fan, Rishabh Singh, Patrick Riley

    Abstract: Symbolic regression is a type of discrete optimization problem that involves searching expressions that fit given data points. In many cases, other mathematical constraints about the unknown expression not only provide more information beyond just values at some inputs, but also effectively constrain the search space. We identify the asymptotic constraints of leading polynomial powers as the funct…

    Submitted 22 December, 2019; v1 submitted 22 January, 2019; originally announced January 2019.

    Journal ref: NeurIPS 2019 Workshop on Knowledge Representation & Reasoning Meets Machine Learning

  17. arXiv:1810.08678

    cs.LG cs.AI stat.ML

    Optimization of Molecules via Deep Reinforcement Learning

    Authors: Zhenpeng Zhou, Steven Kearnes, Li Li, Richard N. Zare, Patrick Riley

    Abstract: We present a framework, which we call Molecule Deep $Q$-Networks (MolDQN), for molecule optimization by combining domain knowledge of chemistry and state-of-the-art reinforcement learning techniques (double $Q$-learning and randomized value functions). We directly define modifications on molecules, thereby ensuring 100% chemical validity. Further, we operate without pre-training on any dataset to…

    Submitted 28 February, 2019; v1 submitted 19 October, 2018; originally announced October 2018.

  18. arXiv:1802.08219

    cs.LG cs.AI cs.CV cs.NE

    Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

    Authors: Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, Patrick Riley

    Abstract: We introduce tensor field neural networks, which are locally equivariant to 3D rotations, translations, and permutations of points at every layer. 3D rotation equivariance removes the need for data augmentation to identify features in arbitrary orientations. Our network uses filters built from spherical harmonics; due to the mathematical consequences of this filter choice, each layer accepts as in…

    Submitted 18 May, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

    Comments: changes for NIPS submission

  19. arXiv:1704.01212

    cs.LG

    Neural Message Passing for Quantum Chemistry

    Authors: Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, George E. Dahl

    Abstract: Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant to molecular symmetries have already been described in the literature. These models learn a message passing algorithm and aggregation procedure to compute a function of their entire input graph. At…

    Submitted 12 June, 2017; v1 submitted 4 April, 2017; originally announced April 2017.

    Comments: 14 pages

    ACM Class: I.2.6

  20. Molecular Graph Convolutions: Moving Beyond Fingerprints

    Authors: Steven Kearnes, Kevin McCloskey, Marc Berndl, Vijay Pande, Patrick Riley

    Abstract: Molecular "fingerprints" encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular "graph convolutions", a machine learning…

    Submitted 18 August, 2016; v1 submitted 2 March, 2016; originally announced March 2016.

    Comments: See "Version information" section

    Journal ref: J Comput Aided Mol Des (2016)

  21. arXiv:1502.02072

    stat.ML cs.LG cs.NE

    Massively Multitask Networks for Drug Discovery

    Authors: Bharath Ramsundar, Steven Kearnes, Patrick Riley, Dale Webster, David Konerding, Vijay Pande

    Abstract: Massively multitask neural architectures provide a learning framework for drug discovery that synthesizes information from many distinct biological sources. To train these architectures at scale, we gather large amounts of data from public sources to create a dataset of nearly 40 million measurements across more than 200 biological targets. We investigate several aspects of the multitask framework…

    Submitted 6 February, 2015; originally announced February 2015.

    Comments: Preliminary work. Under review by the International Conference on Machine Learning (ICML)