Showing 1–28 of 28 results for author: Anumanchipalli, G

  1. arXiv:2406.15754  [pdf, other]

    cs.CV cs.CL cs.LG cs.SD eess.AS

    Multimodal Segmentation for Vocal Tract Modeling

    Authors: Rishi Jain, Bohan Yu, Peter Wu, Tejas Prabhune, Gopala Anumanchipalli

    Abstract: Accurate modeling of the vocal tract is necessary to construct articulatory representations for interpretable speech processing and linguistics. However, vocal tract modeling is challenging because many internal articulators are occluded from external motion capture technologies. Real-time magnetic resonance imaging (RT-MRI) allows measuring precise movements of internal articulators during speech…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  2. arXiv:2406.12998  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Articulatory Encodec: Vocal Tract Kinematics as a Codec for Speech

    Authors: Cheol Jun Cho, Peter Wu, Tejas S. Prabhune, Dhruv Agarwal, Gopala K. Anumanchipalli

    Abstract: Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech -- articulatory encodec. The articulatory encod…

    Submitted 18 June, 2024; originally announced June 2024.

  3. arXiv:2405.00664  [pdf, other]

    cs.CL cs.AI cs.LG

    Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3

    Authors: Junsang Yoon, Akshat Gupta, Gopala Anumanchipalli

    Abstract: This study presents a targeted model editing analysis focused on the latest large language model, Llama-3. We explore the efficacy of popular model editing techniques - ROME, MEMIT, and EMMET, which are designed for precise layer interventions. We identify the most effective layers for targeted edits through an evaluation that encompasses up to 4096 edits across three distinct strategies: sequenti…

    Submitted 1 May, 2024; originally announced May 2024.

  4. arXiv:2403.14236  [pdf, other]

    cs.LG cs.AI cs.CL

    A Unified Framework for Model Editing

    Authors: Akshat Gupta, Dev Sajnani, Gopala Anumanchipalli

    Abstract: We introduce a unifying framework that brings two leading "locate-and-edit" model editing techniques -- ROME and MEMIT -- under a single conceptual umbrella, optimizing for the same goal, which we call the preservation-memorization objective. ROME uses an equality constraint to perform one edit at a time, whereas MEMIT employs a more flexible least-square constraint that allows for batched edits.…

    Submitted 22 April, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: EMMET can do batched edits of batch size 10k with performance very similar to MEMIT

  5. arXiv:2403.07175  [pdf, other]

    cs.CL cs.AI

    Rebuilding ROME : Resolving Model Collapse during Sequential Model Editing

    Authors: Akshat Gupta, Sidharth Baskaran, Gopala Anumanchipalli

    Abstract: Recent work using Rank-One Model Editing (ROME), a popular model editing method, has shown that there are certain facts that the algorithm is unable to edit without breaking the model. Such edits have previously been called disabling edits. These disabling edits cause immediate model collapse and limit the use of ROME for sequential editing. In this paper, we show that disabling edits are an arti…

    Submitted 16 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Added explanation of failure of original implementation of ROME in the paper

  6. arXiv:2402.14805  [pdf, other]

    cs.CL cs.AI

    Identifying Multiple Personalities in Large Language Models with External Evaluation

    Authors: Xiaoyang Song, Yuta Adachi, Jessie Feng, Mouwei Lin, Linhao Yu, Frank Li, Akshat Gupta, Gopala Anumanchipalli, Simerjot Kaur

    Abstract: As Large Language Models (LLMs) are integrated with human daily applications rapidly, many societal and ethical concerns are raised regarding the behavior of LLMs. One of the ways to comprehend LLMs' behavior is to analyze their personalities. Many recent studies quantify LLMs' personalities using self-assessment tests that are created for humans. Yet many critiques question the applicability and…

    Submitted 22 February, 2024; originally announced February 2024.

  7. arXiv:2401.10015  [pdf, other]

    cs.CL eess.AS

    Towards Hierarchical Spoken Language Dysfluency Modeling

    Authors: Jiachen Lian, Gopala Anumanchipalli

    Abstract: Speech disfluency modeling is the bottleneck for both speech therapy and language learning. However, there is no effective AI solution to systematically tackle this problem. We solidify the concept of disfluent speech and disfluent speech modeling. We then present Hierarchical Unconstrained Disfluency Modeling (H-UDM) approach, the hierarchical extension of UDM that addresses both disfluency trans…

    Submitted 21 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 2024 EACL. Hierarchical extension of our previous workshop paper arXiv:2312.12810

  8. arXiv:2401.07453  [pdf, other]

    cs.CL cs.AI cs.IR

    Model Editing at Scale leads to Gradual and Catastrophic Forgetting

    Authors: Akshat Gupta, Anurag Rao, Gopala Anumanchipalli

    Abstract: Editing knowledge in large language models is an attractive capability to have which allows us to correct incorrectly learnt facts during pre-training, as well as update the model with an ever-growing list of new facts. While existing model editing techniques have shown promise, they are usually evaluated using metrics for reliability, specificity and generalization over one or few edits. We argue…

    Submitted 10 June, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

    Comments: ACL 2024 Findings

  9. arXiv:2312.12810  [pdf, other]

    eess.AS cs.SD

    Unconstrained Dysfluency Modeling for Dysfluent Speech Transcription and Detection

    Authors: Jiachen Lian, Carly Feng, Naasir Farooqi, Steve Li, Anshul Kashyap, Cheol Jun Cho, Peter Wu, Robbie Netzorg, Tingle Li, Gopala Krishna Anumanchipalli

    Abstract: Dysfluent speech modeling requires time-accurate and silence-aware transcription at both the word-level and phonetic-level. However, current research in dysfluency modeling primarily focuses on either transcription or detection, and the performance of each aspect remains limited. In this work, we present an unconstrained dysfluency modeling (UDM) approach that addresses both transcription and dete…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: 2023 ASRU

  10. arXiv:2312.08494  [pdf, other]

    cs.SD cs.LG eess.AS

    PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models

    Authors: Robin Netzorg, Ajil Jalal, Luna McNulty, Gopala Krishna Anumanchipalli

    Abstract: Perceptual modification of voice is an elusive goal. While non-experts can modify an image or sentence perceptually with available tools, it is not clear how to similarly modify speech along perceptual axes. Voice conversion does make it possible to convert one voice to another, but these modifications are handled by black box models, and the specifics of what perceptual qualities to modify and ho…

    Submitted 13 December, 2023; originally announced December 2023.

  11. arXiv:2310.16287  [pdf, other]

    cs.SD cs.GR eess.AS

    Towards Streaming Speech-to-Avatar Synthesis

    Authors: Tejas S. Prabhune, Peter Wu, Bohan Yu, Gopala K. Anumanchipalli

    Abstract: Streaming speech-to-avatar synthesis creates real-time animations for a virtual character from audio data. Accurate avatar representations of speech are important for the visualization of sound in linguistics, phonetics, and phonology, visual feedback to assist second language acquisition, and virtual embodiment for paralyzed patients. Previous works have highlighted the capability of deep articul…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Submitted to ICASSP 2024

  12. arXiv:2310.10803  [pdf, other]

    cs.CL eess.AS

    SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT

    Authors: Cheol Jun Cho, Abdelrahman Mohamed, Shang-Wen Li, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Data-driven unit discovery in self-supervised learning (SSL) of speech has embarked on a new era of spoken language processing. Yet, the discovered units often remain in phonetic space and the units beyond phonemes are largely underexplored. Here, we demonstrate that a syllabic organization emerges in learning sentence-level representation of speech. In particular, we adopt "self-distillation" obj…

    Submitted 16 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  13. arXiv:2310.10788  [pdf, other]

    eess.AS cs.CL

    Self-Supervised Models of Speech Infer Universal Articulatory Kinematics

    Authors: Cheol Jun Cho, Abdelrahman Mohamed, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Self-Supervised Learning (SSL) based models of speech have shown remarkable performance on a range of downstream tasks. These state-of-the-art models have remained blackboxes, but many recent studies have begun "probing" models like HuBERT, to correlate their internal representations to different aspects of speech. In this paper, we show "inference of articulatory kinematics" as fundamental proper…

    Submitted 16 January, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  14. arXiv:2310.02497  [pdf, other]

    cs.SD cs.LG eess.AS

    Towards an Interpretable Representation of Speaker Identity via Perceptual Voice Qualities

    Authors: Robin Netzorg, Bohan Yu, Andrea Guzman, Peter Wu, Luna McNulty, Gopala Anumanchipalli

    Abstract: Unlike other data modalities such as text and vision, speech does not lend itself to easy interpretation. While lay people can understand how to describe an image or sentence via perception, non-expert descriptions of speech often end at high-level demographic information, such as gender or age. In this paper, we propose a possible interpretable representation of speaker identity based on perceptu…

    Submitted 3 October, 2023; originally announced October 2023.

  15. arXiv:2309.09088  [pdf, other]

    cs.SD eess.AS

    Enhancing GAN-Based Vocoders with Contrastive Learning Under Data-limited Condition

    Authors: Haoming Guo, Seth Z. Zhao, Jiachen Lian, Gopala Anumanchipalli, Gerald Friedland

    Abstract: Vocoder models have recently achieved substantial progress in generating authentic audio comparable to human quality while significantly reducing memory requirement and inference time. However, these data-hungry generative models require large-scale audio data for learning good representations. In this paper, we apply contrastive learning methods in training the vocoder to improve the perceptual q…

    Submitted 18 December, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

  16. arXiv:2309.08163  [pdf, other]

    cs.CL cs.AI

    Self-Assessment Tests are Unreliable Measures of LLM Personality

    Authors: Akshat Gupta, Xiaoyang Song, Gopala Anumanchipalli

    Abstract: As large language models (LLM) evolve in their capabilities, various recent studies have tried to quantify their behavior using psychological tools created to study human behavior. One such example is the measurement of "personality" of LLMs using self-assessment personality tests developed to measure human personality. Yet almost none of these works verify the applicability of these tests on LLMs…

    Submitted 2 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

  17. arXiv:2309.07861  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    CiwaGAN: Articulatory information exchange

    Authors: Gašper Beguš, Thomas Lu, Alan Zhou, Peter Wu, Gopala K. Anumanchipalli

    Abstract: Humans encode information into sounds by controlling articulators and decode information from sounds using the auditory apparatus. This paper introduces CiwaGAN, a model of human spoken language acquisition that combines unsupervised articulatory modeling with an unsupervised model of information exchange through the auditory modality. While prior research includes unsupervised articulatory modeli…

    Submitted 14 September, 2023; originally announced September 2023.

  18. arXiv:2308.06443  [pdf, other]

    cs.LG eess.AS

    Neural Latent Aligner: Cross-trial Alignment for Learning Representations of Complex, Naturalistic Neural Data

    Authors: Cheol Jun Cho, Edward F. Chang, Gopala K. Anumanchipalli

    Abstract: Understanding the neural implementation of complex human behaviors is one of the major goals in neuroscience. To this end, it is crucial to find a true representation of the neural data, which is challenging due to the high complexity of behaviors and the low signal-to-noise ratio (SNR) of the signals. Here, we propose a novel unsupervised learning framework, Neural Latent Aligner (NLA), to find well-co…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: Accepted at ICML 2023

    Journal ref: Proceedings of the 40th International Conference on Machine Learning (2023), PMLR 202:5661-5676

  19. arXiv:2302.06774  [pdf, other]

    eess.AS cs.SD

    Speaker-Independent Acoustic-to-Articulatory Speech Inversion

    Authors: Peter Wu, Li-Wei Chen, Cheol Jun Cho, Shinji Watanabe, Louis Goldstein, Alan W Black, Gopala K. Anumanchipalli

    Abstract: To build speech processing methods that can handle speech as naturally as humans, researchers have explored multiple ways of building an invertible mapping from speech to an interpretable space. The articulatory space is a promising inversion target, since this space captures the mechanics of speech production. To this end, we build an acoustic-to-articulatory inversion (AAI) model that leverages…

    Submitted 24 July, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  20. arXiv:2210.16498  [pdf, other]

    eess.AS cs.SD

    Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization

    Authors: Jiachen Lian, Alan W Black, Yijing Lu, Louis Goldstein, Shinji Watanabe, Gopala K. Anumanchipalli

    Abstract: Articulatory representation learning is the fundamental research in modeling neural speech production system. Our previous work has established a deep paradigm to decompose the articulatory kinematics data into gestures, which explicitly model the phonological and linguistic structure encoded with human speech production mechanism, and corresponding gestural scores. We continue with this line of w…

    Submitted 20 February, 2023; v1 submitted 29 October, 2022; originally announced October 2022.

    Comments: Accepted to 2023 ICASSP. Camera Ready

  21. arXiv:2210.15272  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    A Fast and Accurate Pitch Estimation Algorithm Based on the Pseudo Wigner-Ville Distribution

    Authors: Yisi Liu, Peter Wu, Alan W Black, Gopala K. Anumanchipalli

    Abstract: Estimation of fundamental frequency (F0) in voiced segments of speech signals, also known as pitch tracking, plays a crucial role in pitch synchronous speech analysis, speech synthesis, and speech manipulation. In this paper, we capitalize on the high time and frequency resolution of the pseudo Wigner-Ville distribution (PWVD) and propose a new PWVD-based pitch estimation method. We devise an effi…

    Submitted 27 October, 2022; originally announced October 2022.

  22. arXiv:2210.15173  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Articulation GAN: Unsupervised modeling of articulatory learning

    Authors: Gašper Beguš, Alan Zhou, Peter Wu, Gopala K Anumanchipalli

    Abstract: Generative deep neural networks are widely used for speech synthesis, but most existing models directly generate waveforms or spectral outputs. Humans, however, produce speech by controlling articulators, which results in the production of speech sounds through physical properties of sound propagation. We introduce the Articulatory Generator to the Generative Adversarial Network paradigm, a new un…

    Submitted 12 March, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing

  23. Evidence of Vocal Tract Articulation in Self-Supervised Learning of Speech

    Authors: Cheol Jun Cho, Peter Wu, Abdelrahman Mohamed, Gopala K. Anumanchipalli

    Abstract: Recent self-supervised learning (SSL) models have proven to learn rich representations of speech, which can readily be utilized by diverse downstream tasks. To understand such utilities, various analyses have been done for speech SSL models to reveal which and how information is encoded in the learned representations. Although the scope of previous analyses is extensive in acoustic, phonetic, and…

    Submitted 20 July, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

  24. arXiv:2209.06337  [pdf, other]

    eess.AS cs.SD q-bio.QM

    Deep Speech Synthesis from Articulatory Representations

    Authors: Peter Wu, Shinji Watanabe, Louis Goldstein, Alan W Black, Gopala K. Anumanchipalli

    Abstract: In the articulatory synthesis task, speech is synthesized from input features containing information about the physical behavior of the human vocal tract. This task provides a promising direction for speech synthesis research, as the articulatory space is compact, smooth, and interpretable. Current works have highlighted the potential for deep learning models to perform articulatory synthesis. How…

    Submitted 13 September, 2022; originally announced September 2022.

  25. arXiv:2206.02512  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder

    Authors: Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

    Abstract: In this paper, we propose a novel unsupervised text-to-speech (UTTS) framework which does not require text-audio pairs for the TTS acoustic modeling (AM). UTTS is a multi-speaker speech synthesizer that supports zero-shot voice cloning, it is developed from a perspective of disentangled speech representation learning. The framework offers a flexible choice of a speaker's duration model, timbre fea…

    Submitted 11 October, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: Under Review

  26. arXiv:2205.05227  [pdf, ps, other]

    eess.AS cs.AI cs.CL cs.SD

    Towards Improved Zero-shot Voice Conversion with Conditional DSVAE

    Authors: Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

    Abstract: Disentangling content and speaking style information is essential for zero-shot non-parallel voice conversion (VC). Our previous study investigated a novel framework with disentangled sequential variational autoencoder (DSVAE) as the backbone for information decomposition. We have demonstrated that simultaneous disentangling content embedding and speaker embedding from one utterance is feasible fo…

    Submitted 20 June, 2022; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: Accepted to 2022 Interspeech. Demo link is here https://jlian2.github.io/Improved-Voice-Conversion-with-Conditional-DSVAE/

  27. arXiv:2204.00465  [pdf, other]

    eess.AS cs.AI eess.SP

    Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition

    Authors: Jiachen Lian, Alan W Black, Louis Goldstein, Gopala Krishna Anumanchipalli

    Abstract: Most of the research on data-driven speech representation learning has focused on raw audios in an end-to-end manner, paying little attention to their internal phonological or gestural structure. This work, investigating the speech representations derived from articulatory kinematics signals, uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data…

    Submitted 20 June, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: Accepted to 2022 Interspeech. Code is publicly available at https://github.com/Berkeley-Speech-Group/ema_gesture

  28. arXiv:1909.01401  [pdf, other]

    cs.LG cs.CL q-bio.NC stat.ML

    Brain2Char: A Deep Architecture for Decoding Text from Brain Recordings

    Authors: Pengfei Sun, Gopala K. Anumanchipalli, Edward F. Chang

    Abstract: Decoding language representations directly from the brain can enable new Brain-Computer Interfaces (BCI) for high bandwidth human-human and human-machine communication. Clinically, such technologies can restore communication in people with neurological conditions affecting their ability to speak. In this study, we propose a novel deep network architecture Brain2Char, for directly decoding text (sp…

    Submitted 3 September, 2019; originally announced September 2019.