Showing 1–13 of 13 results for author: Foster, C

Searching in archive cs.
  1. arXiv:2405.14782

    cs.CL

    Lessons from the Trenches on Reproducible Evaluation of Language Models

    Authors: Stella Biderman, Hailey Schoelkopf, Lintang Sutawika, Leo Gao, Jonathan Tow, Baber Abbasi, Alham Fikri Aji, Pawan Sasanka Ammanamanchi, Sidney Black, Jordan Clive, Anthony DiPofi, Julen Etxaniz, Benjamin Fattori, Jessica Zosa Forde, Charles Foster, Jeffrey Hsu, Mimansa Jaiswal, Wilson Y. Lee, Haonan Li, Charles Lovering, Niklas Muennighoff, Ellie Pavlick, Jason Phang, Aviya Skowron, Samson Tan, et al. (5 additional authors not shown)

    Abstract: Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of reproducibility and transparency. In this paper we draw on three years of experience in evaluating large language models to provide guidance and lessons…

    Submitted 29 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  2. arXiv:2404.04491

    astro-ph.IM astro-ph.GA cs.LG

    Galaxy 3D Shape Recovery using Mixture Density Network

    Authors: Suk Yee Yong, K. E. Harborne, Caroline Foster, Robert Bassett, Gregory B. Poole, Mitchell Cavanagh

    Abstract: Since the turn of the century, astronomers have been exploiting the rich information afforded by combining stellar kinematic maps and imaging in an attempt to recover the intrinsic, three-dimensional (3D) shape of a galaxy. A common intrinsic shape recovery method relies on an expected monotonic relationship between the intrinsic misalignment of the kinematic and morphological axes and the triaxia…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted for publication in PASA. 18 pages, 12 figures, 2 tables

    Journal ref: Publ. Astron. Soc. Aust. 41 (2024) e033

  3. arXiv:2403.06557

    eess.SY cs.LG cs.RO

    Data-driven architecture to encode information in the kinematics of robots and artificial avatars

    Authors: Francesco De Lellis, Marco Coraggio, Nathan C. Foster, Riccardo Villa, Cristina Becchio, Mario di Bernardo

    Abstract: We present a data-driven control architecture for modifying the kinematics of robots and artificial avatars to encode specific information such as the presence or not of an emotion in the movements of an avatar or robot driven by a human operator. We validate our approach on an experimental dataset obtained during the reach-to-grasp phase of a pick-and-place task.

    Submitted 11 March, 2024; originally announced March 2024.

  4. arXiv:2402.02846

    physics.optics cs.CR cs.LG physics.app-ph

    Machine Learning Resistant Amorphous Silicon Physically Unclonable Functions (PUFs)

    Authors: Velat Kilic, Neil Macfarlane, Jasper Stround, Samuel Metais, Milad Alemohammad, A. Brinton Cooper, Amy C. Foster, Mark A. Foster

    Abstract: We investigate usage of nonlinear wave chaotic amorphous silicon (a-Si) cavities as physically unclonable functions (PUF). Machine learning attacks on integrated electronic PUFs have been demonstrated to be very effective at modeling PUF behavior. Such attacks on integrated a-Si photonic PUFs are investigated through application of algorithms including linear regression, k-nearest neighbor, decisi…

    Submitted 5 February, 2024; originally announced February 2024.

  5. arXiv:2208.10022

    cs.IR

    Generalized Relative Neighborhood Graph (GRNG) for Similarity Search

    Authors: Cole Foster, Berk Sevilmis, Benjamin Kimia

    Abstract: Similarity search is a fundamental building block for information retrieval on a variety of datasets. The notion of a neighbor is often based on binary considerations, such as the k nearest neighbors. However, considering that data is often organized as a manifold with low intrinsic dimension, the notion of a neighbor must recognize higher-order relationship, to capture neighbors in all directions…

    Submitted 21 August, 2022; originally announced August 2022.

  6. arXiv:2108.09781

    cs.CY

    Global Transfers: M-Pesa, Intellectual Property Rights and Digital Innovation

    Authors: Christopher Foster

    Abstract: In July 2020, in the midst of the COVID crisis, the Kenyan mobile operator Safaricom announced that the intellectual property rights (IPR) for mobile money service M-Pesa were "moving back into African control". This paper tracks how the IPR originally came to be held outside Kenya, and the implications for understanding M-Pesa as an inclusive innovation. Through reflection of this analysis of IPR…

    Submitted 22 August, 2021; originally announced August 2021.

    Comments: In proceedings of the 1st Virtual Conference on Implications of Information and Digital Technologies for Development, 2021

  7. arXiv:2108.05025

    cs.CV cs.HC cs.LG

    Learning Oculomotor Behaviors from Scanpath

    Authors: Beibin Li, Nicholas Nuechterlein, Erin Barney, Claire Foster, Minah Kim, Monique Mahony, Adham Atyabi, Li Feng, Quan Wang, Pamela Ventola, Linda Shapiro, Frederick Shic

    Abstract: Identifying oculomotor behaviors relevant for eye-tracking applications is a critical but often challenging task. Aiming to automatically learn and extract knowledge from existing eye-tracking data, we develop a novel method that creates rich representations of oculomotor scanpaths to facilitate the learning of downstream tasks. The proposed stimulus-agnostic Oculomotor Behavior Framework (OBF) mo…

    Submitted 11 August, 2021; originally announced August 2021.

    Comments: Accepted ACM ICMI 2021

  8. arXiv:2101.00027

    cs.CL

    The Pile: An 800GB Dataset of Diverse Text for Language Modeling

    Authors: Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy

    Abstract: Recent work has demonstrated that increased training dataset diversity improves general cross-domain knowledge and downstream generalization capability for large-scale language models. With this in mind, we present the Pile: an 825 GiB English text corpus targeted at training large-scale language models. The Pile is constructed from 22 diverse high-quality subsets -- both existing and new…

    Submitted 31 December, 2020; originally announced January 2021.

  9. arXiv:1911.11069

    cs.IR cs.CL cs.LG

    Query Expansion for Patent Searching using Word Embedding and Professional Crowdsourcing

    Authors: Arthi Krishna, Ye **, Christine Foster, Greg Gabel, Britt Hanley, Abdou Youssef

    Abstract: The patent examination process includes a search of previous work to verify that a patent application describes a novel invention. Patent examiners primarily use keyword-based searches to uncover prior art. A critical part of keyword searching is query expansion, which is the process of including alternate terms such as synonyms and other related words, since the same concepts are often described…

    Submitted 14 November, 2019; originally announced November 2019.

    Comments: Presented at AAAI FSS-19: Artificial Intelligence in Government and Public Sector, Arlington, Virginia, USA

  10. arXiv:1904.03616

    cs.CV cs.HC cs.LG

    A Facial Affect Analysis System for Autism Spectrum Disorder

    Authors: Beibin Li, Sachin Mehta, Deepali Aneja, Claire Foster, Pamela Ventola, Frederick Shic, Linda Shapiro

    Abstract: In this paper, we introduce an end-to-end machine learning-based system for classifying autism spectrum disorder (ASD) using facial attributes such as expressions, action units, arousal, and valence. Our system classifies ASD using representations of different facial attributes from convolutional neural networks, which are trained on images in the wild. Our experimental results show that different…

    Submitted 7 April, 2019; originally announced April 2019.

    Comments: 5 pages (including 1 page for reference), 3 figures

  11. arXiv:1711.02222

    physics.optics cs.CR

    Information-Dense Nonlinear Photonic Physical Unclonable Function

    Authors: Brian C. Grubel, Bryan T. Bosworth, Michael R. Kossey, A. Brinton Cooper, Mark A. Foster, Amy C. Foster

    Abstract: We present a comprehensive investigation into the complexity of a new private key storage apparatus: a novel silicon photonic physical unclonable function (PUF) based on ultrafast nonlinear optical interactions in a chaotic silicon microcavity that is both unclonable and impossible to emulate. This device provides remarkable improvements to total information content (raw cryptographic material), i…

    Submitted 6 November, 2017; originally announced November 2017.

  12. arXiv:1711.01439

    cs.CR physics.optics

    Secure Communications using Nonlinear Silicon Photonic Keys

    Authors: Brian C. Grubel, Bryan T. Bosworth, Michael R. Kossey, A. Brinton Cooper, Mark A. Foster, Amy C. Foster

    Abstract: We present a secure communication system constructed using pairs of nonlinear photonic physical unclonable functions (PUFs) that harness physical chaos in integrated silicon micro-cavities. Compared to a large, electronically stored one-time pad, our method provisions large amounts of information within the intrinsically complex nanostructure of the micro-cavities. By probing a micro-cavity with a…

    Submitted 5 February, 2018; v1 submitted 4 November, 2017; originally announced November 2017.

    Comments: 12 pages. Replaced with revised version

  13. arXiv:1611.06962

    cs.CV

    Sampled Image Tagging and Retrieval Methods on User Generated Content

    Authors: Karl Ni, Kyle Zaragoza, Charles Foster, Carmen Carrano, Barry Chen, Yonas Tesfaye, Alex Gude

    Abstract: Traditional image tagging and retrieval algorithms have limited value as a result of being trained with heavily curated datasets. These limitations are most evident when arbitrary search words are used that do not intersect with training set labels. Weak labels from user generated content (UGC) found in the wild (e.g., Google Photos, FlickR, etc.) have an almost unlimited number of unique words in…

    Submitted 2 December, 2016; v1 submitted 21 November, 2016; originally announced November 2016.