Showing 1–6 of 6 results for author: Khorrami, K

  1. arXiv:2406.05259  [pdf, other]

    eess.AS cs.AI cs.CL

    A model of early word acquisition based on realistic-scale audiovisual naming events

    Authors: Khazar Khorrami, Okko Räsänen

    Abstract: Infants gradually learn to parse continuous speech into words and connect names with objects, yet the mechanisms behind development of early word perception skills remain unknown. We studied the extent to which early words can be acquired through statistical learning from regularities in audiovisual sensory input. We simulated word learning in infants up to 12 months of age in a realistic setting,…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 22 pages, 4 figures, journal article, submitted for review

  2. arXiv:2306.09820  [pdf, other]

    eess.AS cs.SD

    Crowdsourcing and Evaluating Text-Based Audio Retrieval Relevances

    Authors: Huang Xie, Khazar Khorrami, Okko Räsänen, Tuomas Virtanen

    Abstract: This paper explores grading text-based audio retrieval relevances with crowdsourcing assessments. Given a free-form text (e.g., a caption) as a query, crowdworkers are asked to grade audio clips using numeric scores (between 0 and 100) to indicate their judgements of how much the sound content of an audio clip matches the text, where 0 indicates no content match at all and 100 indicates perfect co…

    Submitted 15 August, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted at DCASE 2023 Workshop

  3. Simultaneous or Sequential Training? How Speech Representations Cooperate in a Multi-Task Self-Supervised Learning System

    Authors: Khazar Khorrami, María Andrea Cruz Blandón, Tuomas Virtanen, Okko Räsänen

    Abstract: Speech representation learning with self-supervised algorithms has resulted in notable performance boosts in many downstream tasks. Recent work combined self-supervised learning (SSL) and visually grounded speech (VGS) processing mechanisms for representation learning. The joint training with SSL and VGS mechanisms provides the opportunity to utilize both unlabeled speech and speech-related visual…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: 5 pages, accepted by EUSIPCO 2023

  4. arXiv:2109.14200  [pdf]

    eess.AS cs.AI cs.CL cs.CV cs.LG cs.SD

    Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation

    Authors: Khazar Khorrami, Okko Räsänen

    Abstract: Decades of research has studied how language learning infants learn to discriminate speech sounds, segment words, and associate words with their meanings. While gradual development of such capabilities is unquestionable, the exact nature of these skills and the underlying mental representations yet remains unclear. In parallel, computational studies have shown that basic comprehension of speech ca…

    Submitted 6 March, 2024; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: Final manuscript published in Language Development Research under CC BY-NC-SA 4.0. Pre-print redistributed through arXiv with permission. Replaces corrupted PsyArXiv pre-print repository at https://psyarxiv.com/37zna

    ACM Class: I.2.0; I.2.6; I.2.7; I.2.10

    Journal ref: Language Development Research, 1(1), 123-191 (2021)

  5. Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models

    Authors: Khazar Khorrami, Okko Räsänen

    Abstract: Systems that can find correspondences between multiple modalities, such as between speech and images, have great potential to solve different recognition and data analysis tasks in an unsupervised manner. This work studies multimodal learning in the context of visually grounded speech (VGS) models, and focuses on their recently demonstrated capability to extract spatiotemporal alignments between s…

    Submitted 5 July, 2021; originally announced August 2021.

    Comments: To be published in Proc. Interspeech-2021, Brno, Czech Republic

  6. arXiv:1906.09832  [pdf]

    cs.CL cs.LG cs.SD

    A computational model of early language acquisition from audiovisual experiences of young infants

    Authors: Okko Räsänen, Khazar Khorrami

    Abstract: Earlier research has suggested that human infants might use statistical dependencies between speech and non-linguistic multimodal input to bootstrap their language learning before they know how to segment words from running speech. However, feasibility of this hypothesis in terms of real-world infant experiences has remained unclear. This paper presents a step towards a more realistic test of the…

    Submitted 24 June, 2019; originally announced June 2019.