
Showing 1–21 of 21 results for author: Frank, C

Searching in archive cs.
  1. arXiv:2406.10447  [pdf, other]

    cs.CV

    The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences

    Authors: Bria Long, Violet Xiang, Stefan Stojanov, Robert Z. Sparks, Zi Yin, Grace E. Keene, Alvin W. M. Tan, Steven Y. Feng, Chengxu Zhuang, Virginia A. Marchman, Daniel L. K. Yamins, Michael C. Frank

    Abstract: Human children far exceed modern machine learning algorithms in their sample efficiency, achieving high performance in key domains with much less data than current models. This "data gap" is a key challenge both for building intelligent artificial systems and for understanding human development. Egocentric video capturing children's experience -- their "training data" -- is a key ingredient fo…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures, 4 tables and SI. Submitted to NeurIPS Datasets and Benchmarks

  2. arXiv:2406.10215  [pdf, other]

    cs.CL cs.LG

    DevBench: A multimodal developmental benchmark for language learning

    Authors: Alvin Wei Ming Tan, Sunny Yu, Bria Long, Wanjing Anya Ma, Tonya Murray, Rebecca D. Silverman, Jason D. Yeatman, Michael C. Frank

    Abstract: How (dis)similar are the learning trajectories of vision-language models and children? Recent modeling work has attempted to understand the gap between models' and humans' data efficiency by constructing models trained on less data, especially multimodal naturalistic data. However, such models are often evaluated on adult-level benchmarks, with limited breadth in language abilities tested, and wit…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2405.19211  [pdf, other]

    cs.LG

    Gone but Not Forgotten: Improved Benchmarks for Machine Unlearning

    Authors: Keltin Grimes, Collin Abidi, Cole Frank, Shannon Gallagher

    Abstract: Machine learning models are vulnerable to adversarial attacks, including attacks that leak information about the model's training data. There has recently been an increase in interest about how to best address privacy concerns, especially in the presence of data-removal requests. Machine unlearning algorithms aim to efficiently update trained models to comply with data deletion requests while main…

    Submitted 29 May, 2024; originally announced May 2024.

  4. arXiv:2404.02418  [pdf, other]

    cs.CL cs.AI

    Auxiliary task demands mask the capabilities of smaller language models

    Authors: Jennifer Hu, Michael C. Frank

    Abstract: Developmental psychologists have argued about when cognitive capacities such as language understanding or theory of mind emerge. These debates often hinge on the concept of "task demands" -- the auxiliary challenges associated with performing a particular evaluation -- that may mask the child's underlying ability. The same issues arise when measuring the capacities of language models (LMs): perfor…

    Submitted 2 April, 2024; originally announced April 2024.

  5. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  6. arXiv:2308.08628  [pdf, other]

    cs.CL

    Learning the meanings of function words from grounded language using a visual question answering model

    Authors: Eva Portelance, Michael C. Frank, Dan Jurafsky

    Abstract: Interpreting a seemingly-simple function word like "or", "behind", or "more" can require logical, numerical, and relational reasoning. How are such words learned by children? Prior acquisition theories have often relied on positing a foundation of innate knowledge. Yet recent neural-network based visual question answering models apparently can learn to use function words as part of answering quest…

    Submitted 22 April, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Published in Cognitive Science 2024

    ACM Class: I.2.7; I.2.6; I.2.10

  7. arXiv:2307.11078  [pdf, other]

    q-bio.NC cs.LG cs.SD eess.AS

    Brain2Music: Reconstructing Music from Human Brain Activity

    Authors: Timo I. Denk, Yu Takagi, Takuya Matsuyama, Andrea Agostinelli, Tomoya Nakai, Christian Frank, Shinji Nishimoto

    Abstract: The process of reconstructing experiences from human brain activity offers a unique lens into how the brain interprets and represents the world. In this paper, we introduce a method for reconstructing music from brain activity, captured using functional magnetic resonance imaging (fMRI). Our approach uses either music retrieval or the MusicLM music generation model conditioned on embeddings derive…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Preprint; 21 pages; supplementary material: https://google-research.github.io/seanet/brain2music

  8. arXiv:2306.12925  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS stat.ML

    AudioPaLM: A Large Language Model That Can Speak and Listen

    Authors: Paul K. Rubenstein, Chulayuth Asawaroengchai, Duc Dung Nguyen, Ankur Bapna, Zalán Borsos, Félix de Chaumont Quitry, Peter Chen, Dalia El Badawy, Wei Han, Eugene Kharitonov, Hannah Muckenhirn, Dirk Padfield, James Qin, Danny Rozenberg, Tara Sainath, Johan Schalkwyk, Matt Sharifi, Michelle Tadmor Ramanovich, Marco Tagliasacchi, Alexandru Tudor, Mihajlo Velimirović, Damien Vincent, Jiahui Yu, Yongqiang Wang, Vicky Zayats, et al. (5 additional authors not shown)

    Abstract: We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and speech-to-speech translation. AudioPaLM inherits the…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: Technical report

  9. arXiv:2303.00092  [pdf, other]

    cs.CR cs.CV eess.IV

    A study on the use of perceptual hashing to detect manipulation of embedded messages in images

    Authors: Sven-Jannik Wöhnert, Kai Hendrik Wöhnert, Eldar Almamedov, Carsten Frank, Volker Skwarek

    Abstract: Typically, metadata of images are stored in a specific data segment of the image file. However, to securely detect changes, data can also be embedded within images. This follows the goal to invisibly and robustly embed as much information as possible to, ideally, even survive compression. This work searches for embedding principles which allow to distinguish between unintended changes by lossy i…

    Submitted 28 February, 2023; originally announced March 2023.

    Comments: 12 pages, 3 figures. Submitted, accepted, and presented at IPCV 2022, a subconference of CSCE (https://american-cse.org/csce2022/conferences-IPCV). As publication of the proceedings is delayed, permission for (pre-)publication on arXiv was granted (https://american-cse.org/csce2022/publisher).

  10. arXiv:2302.14727  [pdf, other]

    cs.CL

    Automatically Classifying Emotions based on Text: A Comparative Exploration of Different Datasets

    Authors: Anna Koufakou, Jairo Garciga, Adam Paul, Joseph Morelli, Christopher Frank

    Abstract: Emotion Classification based on text is a task with many applications which has received growing interest in recent years. This paper presents a preliminary study with the goal to help researchers and practitioners gain insight into relatively new datasets as well as emotion classification in general. We focus on three datasets that were recently presented in the related literature, and we explore…

    Submitted 28 February, 2023; originally announced February 2023.

    Comments: Accepted at IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2022)

  11. arXiv:2302.03917  [pdf, other]

    cs.SD cs.LG eess.AS

    Noise2Music: Text-conditioned Music Generation with Diffusion Models

    Authors: Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Zhifeng Chen, Wei Han

    Abstract: We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts. Two types of diffusion models, a generator model, which generates an intermediate representation conditioned on text, and a cascader model, which generates high-fidelity audio conditioned on the intermediate representation and possibly the text, are trained and…

    Submitted 6 March, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

    Comments: 15 pages

  12. arXiv:2301.11325  [pdf, other]

    cs.SD cs.LG eess.AS

    MusicLM: Generating Music From Text

    Authors: Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank

    Abstract: We introduce MusicLM, a model generating high-fidelity music from text descriptions such as "a calming violin melody backed by a distorted guitar riff". MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous s…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Supplementary material at https://google-research.github.io/seanet/musiclm/examples and https://kaggle.com/datasets/googleai/musiccaps

  13. arXiv:2301.00646  [pdf, other]

    eess.AS cs.MA cs.RO cs.SD

    Addressing the Selection Bias in Voice Assistance: Training Voice Assistance Model in Python with Equal Data Selection

    Authors: Kashav Piya, Srijal Shrestha, Cameran Frank, Estephanos Jebessa, Tauheed Khan Mohd

    Abstract: In recent times, voice assistants have become a part of our day-to-day lives, allowing information retrieval by voice synthesis, voice recognition, and natural language processing. These voice assistants can be found in many modern-day devices such as Apple, Amazon, Google, and Samsung. This project is primarily focused on Virtual Assistance in Natural Language Processing. Natural Language Process…

    Submitted 20 December, 2022; originally announced January 2023.

  14. arXiv:2109.06232  [pdf, other]

    cs.CL cs.IT cs.NE

    The Emergence of the Shape Bias Results from Communicative Efficiency

    Authors: Eva Portelance, Michael C. Frank, Dan Jurafsky, Alessandro Sordoni, Romain Laroche

    Abstract: By the age of two, children tend to assume that new word categories are based on objects' shape, rather than their color or texture; this assumption is called the shape bias. They are thought to learn this bias by observing that their caregiver's language is biased towards shape based categories. This presents a chicken and egg problem: if the shape bias must be present in the language in order fo…

    Submitted 14 September, 2021; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted at CoNLL 2021

  15. arXiv:2106.09590  [pdf, other]

    cs.IR cs.DB

    Open Data and the Status Quo -- A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany

    Authors: Lisa Wenige, Claus Stadler, Michael Martin, Richard Figura, Robert Sauter, Christopher W. Frank

    Abstract: This paper presents a framework for assessing data and metadata quality within Open Data portals. Although a few benchmark frameworks already exist for this purpose, they are not yet detailed enough in both breadth and depth to make valid statements about the actual discoverability and accessibility of publicly available data collections. To address this research gap, we have designed a quality fr…

    Submitted 17 June, 2021; originally announced June 2021.

  16. arXiv:2104.05857  [pdf, other]

    cs.CL cs.AI

    From partners to populations: A hierarchical Bayesian account of coordination and convention

    Authors: Robert D. Hawkins, Michael Franke, Michael C. Frank, Adele E. Goldberg, Kenny Smith, Thomas L. Griffiths, Noah D. Goodman

    Abstract: Languages are powerful solutions to coordination problems: they provide stable, shared expectations about how the words we say correspond to the beliefs and intentions in our heads. Yet language use in a variable and non-stationary social environment requires linguistic representations to be flexible: old words acquire new ad hoc or partner-specific meanings on the fly. In this paper, we introduce…

    Submitted 2 December, 2021; v1 submitted 12 April, 2021; originally announced April 2021.

    Comments: In press at Psychological Review

  17. arXiv:2006.07968  [pdf, other]

    cs.LG cs.AI cs.CL

    Relational reasoning and generalization using non-symbolic neural networks

    Authors: Atticus Geiger, Alexandra Carstensen, Michael C. Frank, Christopher Potts

    Abstract: The notion of equality (identity) is simple and ubiquitous, making it a key case study for broader questions about the representations supporting abstract relational reasoning. Previous work suggested that neural networks were not suitable models of human relational reasoning because they could not represent mathematically identity, the most basic form of equality. We revisit this question. In our…

    Submitted 1 May, 2022; v1 submitted 14 June, 2020; originally announced June 2020.

  18. arXiv:1912.07199  [pdf, other]

    cs.CL

    Characterizing the dynamics of learning in repeated reference games

    Authors: Robert D. Hawkins, Michael C. Frank, Noah D. Goodman

    Abstract: The language we use over the course of conversation changes as we establish common ground and learn what our partner finds meaningful. Here we draw upon recent advances in natural language processing to provide a finer-grained characterization of the dynamics of this learning process. We release an open corpus (>15,000 utterances) of extended dyadic interactions in a classic repeated reference gam…

    Submitted 13 April, 2020; v1 submitted 16 December, 2019; originally announced December 2019.

    Comments: Accepted at Cognitive Science

  19. arXiv:1910.11664  [pdf, other]

    eess.AS cs.LG cs.SD

    SPICE: Self-supervised Pitch Estimation

    Authors: Beat Gfeller, Christian Frank, Dominik Roblek, Matt Sharifi, Marco Tagliasacchi, Mihajlo Velimirović

    Abstract: We propose a model to estimate the fundamental frequency in monophonic audio, often referred to as pitch estimation. We acknowledge the fact that obtaining ground truth annotations at the required temporal and frequency resolution is a particularly daunting task. Therefore, we propose to adopt a self-supervised learning technique, which is able to estimate pitch without any form of supervision. Th…

    Submitted 4 September, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: Accepted to IEEE Transactions on Audio, Speech and Language Processing

    Journal ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1118-1128, 2020

  20. arXiv:1711.09401  [pdf, other]

    cs.AI

    Pedagogical learning

    Authors: Long Ouyang, Michael C. Frank

    Abstract: A common assumption in machine learning is that training data are i.i.d. samples from some distribution. Processes that generate i.i.d. samples are, in a sense, uninformative---they produce data without regard to how good this data is for learning. By contrast, cognitive science research has shown that when people generate training data for others (i.e., teaching), they deliberately select example…

    Submitted 30 November, 2017; v1 submitted 26 November, 2017; originally announced November 2017.

  21. arXiv:1709.09443  [pdf, other]

    cs.CL

    Prosodic Features from Large Corpora of Child-Directed Speech as Predictors of the Age of Acquisition of Words

    Authors: Lea Frermann, Michael C. Frank

    Abstract: The impressive ability of children to acquire language is a widely studied phenomenon, and the factors influencing the pace and patterns of word learning remains a subject of active research. Although many models predicting the age of acquisition of words have been proposed, little emphasis has been directed to the raw input children achieve. In this work we present a comparatively large-scale mul…

    Submitted 27 September, 2017; originally announced September 2017.