Showing 1–15 of 15 results for author: Apostoloff, N

  1. arXiv:2308.09514  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning

    Authors: Miguel Sarabia, Elena Menyaylenko, Alessandro Toso, Skyler Seto, Zakaria Aldeneh, Shadi Pirhosseinloo, Luca Zappella, Barry-John Theobald, Nicholas Apostoloff, Jonathan Sheaffer

    Abstract: We present Spatial LibriSpeech, a spatial audio dataset with over 650 hours of 19-channel audio, first-order ambisonics, and optional distractor noise. Spatial LibriSpeech is designed for machine learning model training, and it includes labels for source position, speaking direction, room acoustics and geometry. Spatial LibriSpeech is generated by augmenting LibriSpeech samples with 200k+ simulate…

    Submitted 18 August, 2023; originally announced August 2023.

    Journal ref: Proceedings of INTERSPEECH (2023), pp. 3724-3728

  2. arXiv:2203.10117  [pdf, other]

    cs.SD cs.CV cs.GR eess.AS

    On the role of Lip Articulation in Visual Speech Perception

    Authors: Zakaria Aldeneh, Masha Fedzechkina, Skyler Seto, Katherine Metcalf, Miguel Sarabia, Nicholas Apostoloff, Barry-John Theobald

    Abstract: Generating realistic lip motion from audio to simulate speech production is critical for driving natural character animation. Previous research has shown that traditional metrics used to optimize and assess models for generating lip motion from speech are not a good indicator of subjective opinion of animation quality. Devising metrics that align with subjective opinion first requires understandin…

    Submitted 10 November, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: Submitted to ICASSP 2023

  3. arXiv:2202.09472  [pdf, other]

    cs.LG

    FedEmbed: Personalized Private Federated Learning

    Authors: Andrew Silva, Katherine Metcalf, Nicholas Apostoloff, Barry-John Theobald

    Abstract: Federated learning enables the deployment of machine learning to problems for which centralized data collection is impractical. Adding differential privacy guarantees bounds on privacy while data are contributed to a global model. Adding personalization to federated learning introduces new challenges as we must account for preferences of individual users, where a data sample could have conflicting…

    Submitted 18 February, 2022; originally announced February 2022.

    Comments: 15 pages

    MSC Class: 68T99

    ACM Class: I.2.0

  4. arXiv:2202.03586  [pdf, other]

    cs.CV cs.AI cs.LG

    Fair SA: Sensitivity Analysis for Fairness in Face Recognition

    Authors: Aparna R. Joshi, Xavier Suau, Nivedha Sivakumar, Luca Zappella, Nicholas Apostoloff

    Abstract: As the use of deep learning in high impact domains becomes ubiquitous, it is increasingly important to assess the resilience of models. One such high impact domain is that of face recognition, with real world applications involving images affected by various degradations, such as motion blur or high exposure. Moreover, images captured across different attributes, such as gender and race, can also…

    Submitted 9 February, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

    Comments: 8 pages, 5 figures, to be published in NeurIPS 2021 Workshop, Algorithmic Fairness through the Lens of Causality and Robustness

  5. arXiv:2202.01719  [pdf, other]

    cs.LG cs.CV

    FORML: Learning to Reweight Data for Fairness

    Authors: Bobby Yan, Skyler Seto, Nicholas Apostoloff

    Abstract: Machine learning models are trained to minimize the mean loss for a single metric, and thus typically do not consider fairness and robustness. Neglecting such metrics in training can make these models prone to fairness violations when training data are imbalanced or test distributions differ. This work introduces Fairness Optimized Reweighting via Meta-Learning (FORML), a training algorithm that b…

    Submitted 19 July, 2022; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: 9 pages, 2 figures, Presented at ICML 2022 DataPerf Workshop

  6. arXiv:2111.12427  [pdf, other]

    cs.LG cs.CV

    Challenges of Adversarial Image Augmentations

    Authors: Arno Blaas, Xavier Suau, Jason Ramapuram, Nicholas Apostoloff, Luca Zappella

    Abstract: Image augmentations applied during training are crucial for the generalization performance of image classifiers. Therefore, a large body of research has focused on finding the optimal augmentation policy for a given task. Yet, RandAugment [2], a simple random augmentation policy, has recently been shown to outperform existing sophisticated policies. Only Adversarial AutoAugment (AdvAA) [11], an ap…

    Submitted 3 December, 2021; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: To appear at the ICBINB 2021 NeurIPS Workshop

  7. arXiv:2110.02802  [pdf, other]

    cs.CL

    Self-conditioning pre-trained language models

    Authors: Xavier Suau, Luca Zappella, Nicholas Apostoloff

    Abstract: In this paper we aim to investigate the mechanisms that guide text generation with pre-trained Transformer-based Language Models (TLMs). Grounded on the Product of Experts formulation by Hinton (1999), we describe a generative mechanism that exploits expert units which naturally exist in TLMs. Such units are responsible for detecting concepts in the input and conditioning text generation on such c…

    Submitted 14 June, 2023; v1 submitted 30 September, 2021; originally announced October 2021.

    Comments: 8 pages and supplementary material, accepted at ICML 2022

  8. arXiv:2102.11012  [pdf, other]

    cs.CL cs.AI cs.LG

    Multimodal Punctuation Prediction with Contextual Dropout

    Authors: Andrew Silva, Barry-John Theobald, Nicholas Apostoloff

    Abstract: Automatic speech recognition (ASR) is widely used in consumer electronics. ASR greatly improves the utility and accessibility of technology, but usually the output is only word sequences without punctuation. This can result in ambiguity in inferring user-intent. We first present a transformer-based approach for punctuation prediction that achieves 8% improvement on the IWSLT 2012 TED Task, beating…

    Submitted 12 February, 2021; originally announced February 2021.

    Comments: Accepted for publication at ICASSP 2021

    ACM Class: I.2.7

  9. arXiv:2012.05225  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    MorphGAN: One-Shot Face Synthesis GAN for Detecting Recognition Bias

    Authors: Nataniel Ruiz, Barry-John Theobald, Anurag Ranjan, Ahmed Hussein Abdelaziz, Nicholas Apostoloff

    Abstract: To detect bias in face recognition networks, it can be useful to probe a network under test using samples in which only specific attributes vary in some controlled way. However, capturing a sufficiently large dataset with specific control over the attributes of interest is difficult. In this work, we describe a simulator that applies specific head pose and facial expression adjustments to images o…

    Submitted 10 December, 2020; v1 submitted 9 December, 2020; originally announced December 2020.

  10. arXiv:2005.13616  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Modality Dropout for Improved Performance-driven Talking Faces

    Authors: Ahmed Hussen Abdelaziz, Barry-John Theobald, Paul Dixon, Reinhard Knothe, Nicholas Apostoloff, Sachin Kajareker

    Abstract: We describe our novel deep learning approach for driving animated faces using both acoustic and visual information. In particular, speech-related facial movements are generated using audiovisual information, and non-speech facial movements are generated using only visual information. To ensure that our model exploits both modalities during training, batches are generated that contain audio-only, v…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: Pre-print

  11. arXiv:2005.07647  [pdf, other]

    cs.AI cs.CL cs.LG

    Finding Experts in Transformer Models

    Authors: Xavier Suau, Luca Zappella, Nicholas Apostoloff

    Abstract: In this work we study the presence of expert units in pre-trained Transformer Models (TM), and how they impact a model's performance. We define expert units to be neurons that are able to classify a concept with a given average precision, where a concept is represented by a binary set of sentences containing the concept (or not). Leveraging the OneSec dataset (Scarlini et al., 2019), we compile a…

    Submitted 15 May, 2020; originally announced May 2020.

  12. arXiv:1905.06860  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Speaker-Independent Speech-Driven Visual Speech Synthesis using Domain-Adapted Acoustic Models

    Authors: Ahmed Hussen Abdelaziz, Barry-John Theobald, Justin Binder, Gabriele Fanelli, Paul Dixon, Nicholas Apostoloff, Thibaut Weise, Sachin Kajareker

    Abstract: Speech-driven visual speech synthesis involves mapping features extracted from acoustic speech to the corresponding lip animation controls for a face model. This mapping can take many forms, but a powerful approach is to use deep neural networks (DNNs). However, a limitation is the lack of synchronized audio, video, and depth data required to reliably train the DNNs, especially for speaker-indepen…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: 9 pages, 2 figures, 2 tables

    ACM Class: I.2.m; I.3.8

  13. arXiv:1904.01664  [pdf, other]

    cs.HC cs.AI cs.CL

    Mirroring to Build Trust in Digital Assistants

    Authors: Katherine Metcalf, Barry-John Theobald, Garrett Weinberg, Robert Lee, Ing-Marie Jonsson, Russ Webb, Nicholas Apostoloff

    Abstract: We describe experiments towards building a conversational digital assistant that considers the preferred conversational style of the user. In particular, these experiments are designed to measure whether users prefer and trust an assistant whose conversational style matches their own. To this end we conducted a user study where subjects interacted with a digital assistant that responded in a way t…

    Submitted 2 April, 2019; originally announced April 2019.

    Comments: Preprint

  14. arXiv:1812.04145  [pdf, other]

    cs.LG cs.MA stat.ML

    Learning Sharing Behaviors with Arbitrary Numbers of Agents

    Authors: Katherine Metcalf, Barry-John Theobald, Nicholas Apostoloff

    Abstract: We propose a method for modeling and learning turn-taking behaviors for accessing a shared resource. We model the individual behavior for each agent in an interaction and then use a multi-agent fusion model to generate a summary over the expected actions of the group to render the model independent of the number of agents. The individual behavior models are weighted finite state transducers (WFSTs…

    Submitted 10 December, 2018; originally announced December 2018.

    Comments: 14 pages, 9 figures, 3 tables, International Conference on Autonomous Agents and Multiagent Systems (AAMAS), machine learning, Reinforcement learning

  15. arXiv:1807.10585  [pdf, ps, other]

    cs.CV

    Filter Distillation for Network Compression

    Authors: Xavier Suau, Luca Zappella, Nicholas Apostoloff

    Abstract: In this paper we introduce Principal Filter Analysis (PFA), an easy to use and effective method for neural network compression. PFA exploits the correlation between filter responses within network layers to recommend a smaller network that maintains as much as possible the accuracy of the full model. We propose two algorithms: the first allows users to target compression to specific network propert…

    Submitted 11 December, 2019; v1 submitted 20 July, 2018; originally announced July 2018.

    Comments: 10 pages, 3 figures, Deep neural network compression, spectral analysis, machine learning

    Journal ref: WACV 2020