Audiobox: Unified Audio Generation with Natural Language Prompts

Authors: Apoorv Vyas, Bowen Shi, Matthew Le, Andros Tjandra, Yi-Chiao Wu, Baishan Guo, Jiemin Zhang, Xinyue Zhang, Robert Adkins, William Ngan, Jeff Wang, Ivan Cruz, Bapi Akula, Akinniyi Akinyemi, Brian Ellis, Rashel Moritz, Yael Yungster, Alice Rakotoarison, Liang Tan, Chris Summers, Carleigh Wood, Joshua Lane, Mary Williamson, Wei-Ning Hsu

Abstract: Audio is an essential part of our life, but creating it often requires expertise and is time-consuming. Research communities have made great progress over the past year advancing the performance of large scale audio generative models for a single modality (speech, sound, or music) through adopting more powerful generative models and scaling data. However, these models lack controllability in several aspects: speech generation models cannot synthesize novel styles based on text description and are limited on domain coverage such as outdoor environments; sound generation models only provide coarse-grained control based on descriptions like "a person speaking" and would only generate mumbling human voices. This paper presents Audiobox, a unified model based on flow-matching that is capable of generating various audio modalities. We design description-based and example-based prompting to enhance controllability and unify speech and sound generation paradigms. We allow transcript, vocal, and other audio styles to be controlled independently when generating speech. To improve model generalization with limited labels, we adapt a self-supervised infilling objective to pre-train on large quantities of unlabeled audio. Audiobox sets new benchmarks on speech and sound generation (0.745 similarity on Librispeech for zero-shot TTS; 0.77 FAD on AudioCaps for text-to-sound) and unlocks new methods for generating audio with novel vocal and acoustic styles. We further integrate Bespoke Solvers, which speeds up generation by over 25 times compared to the default ODE solver for flow-matching, without loss of performance on several tasks. Our demo is available at https://audiobox.metademolab.com/

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2306.04765 [pdf, other]

The HCI Aspects of Public Deployment of Research Chatbots: A User Study, Design Recommendations, and Open Challenges

Authors: Morteza Behrooz, William Ngan, Joshua Lane, Giuliano Morse, Benjamin Babcock, Kurt Shuster, Mojtaba Komeili, Moya Chen, Melanie Kambadur, Y-Lan Boureau, Jason Weston

Abstract: Publicly deploying research chatbots is a nuanced topic involving necessary risk-benefit analyses. While there have recently been frequent discussions on whether it is responsible to deploy such models, there has been far less focus on the interaction paradigms and design approaches that the resulting interfaces should adopt, in order to achieve their goals more effectively. We aim to pose, ground, and attempt to answer HCI questions involved in this scope, by reporting on a mixed-methods user study conducted on a recent research chatbot. We find that abstract anthropomorphic representation for the agent has a significant effect on user's perception, that offering AI explainability may have an impact on feedback rates, and that two (diegetic and extradiegetic) levels of the chat experience should be intentionally designed. We offer design recommendations and areas of further focus for the research community.

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2306.04707 [pdf, other]

Improving Open Language Models by Learning from Organic Interactions

Authors: Jing Xu, Da Ju, Joshua Lane, Mojtaba Komeili, Eric Michael Smith, Megan Ung, Morteza Behrooz, William Ngan, Rashel Moritz, Sainbayar Sukhbaatar, Y-Lan Boureau, Jason Weston, Kurt Shuster

Abstract: We present BlenderBot 3x, an update on the conversational model BlenderBot 3, which is now trained using organic conversation and feedback data from participating users of the system in order to improve both its skills and safety. We are publicly releasing the participating de-identified interaction data for use by the research community, in order to spur further progress. Training models with organic data is challenging because interactions with people "in the wild" include both high quality conversations and feedback, as well as adversarial and toxic behavior. We study techniques that enable learning from helpful teachers while avoiding learning from people who are trying to trick the model into unhelpful or toxic responses. BlenderBot 3x is both preferred in conversation to BlenderBot 3, and is shown to produce safer responses in challenging situations. While our current models are still far from perfect, we believe further improvement can be achieved by continued use of the techniques explored in this work.

Submitted 7 June, 2023; originally announced June 2023.

arXiv:2208.03188 [pdf, other]

BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

Authors: Kurt Shuster, Jing Xu, Mojtaba Komeili, Da Ju, Eric Michael Smith, Stephen Roller, Megan Ung, Moya Chen, Kushal Arora, Joshua Lane, Morteza Behrooz, William Ngan, Spencer Poff, Naman Goyal, Arthur Szlam, Y-Lan Boureau, Melanie Kambadur, Jason Weston

Abstract: We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (architecture, model and training scheme), and details of its deployment, including safety mechanisms. Human evaluations show its superiority to existing open-domain dialogue agents, including its predecessors (Roller et al., 2021; Komeili et al., 2022). Finally, we detail our plan for continual learning using the data collected from deployment, which will also be publicly released. The goal of this research program is thus to enable the community to study ever-improving responsible agents that learn through interaction.

Submitted 10 August, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

arXiv:2103.12842 [pdf]

Is radicalization reinforced by social media censorship?

Authors: Justin E. Lane, Kevin McCaffree, F. LeRon Shults

Abstract: Radicalized beliefs, such as those tied to QAnon, Russiagate, and other political conspiracy theories, can lead some individuals and groups to engage in violent behavior, as evidenced in recent months. Understanding the mechanisms by which such beliefs are accepted, spread, and intensified is critical for any attempt to mitigate radicalization and avoid increased political polarization. This article presents an agent-based model of a social media network that enables investigation of the effects of censorship on the amount of dissenting information to which agents become exposed and the certainty of their radicalized views. The model explores two forms of censorship: 1) decentralized censorship, in which individuals can choose to break an online social network tie (unfriend or unfollow) with another individual who transmits conflicting beliefs, and 2) centralized censorship, in which a single authority can ban an individual from the social media network for spreading a certain type of belief. This model suggests that both forms of censorship increase certainty in radicalized views by decreasing the amount of dissent to which an agent is exposed, but centralized "banning" of individuals has the strongest effect on radicalization.

Submitted 23 March, 2021; originally announced March 2021.

arXiv:2102.11009 [pdf]

The Moral Foundations of Left-Wing Authoritarianism: On the Character, Cohesion, and Clout of Tribal Equalitarian Discourse

Authors: Justin E. Lane, Kevin McCaffree, F. LeRon Shults

Abstract: Left-wing authoritarianism remains far less understood than right-wing authoritarianism. We contribute to the literature on the former, which typically relies on surveys, using a new social media analytics approach. We use a list of 60 terms to provide an exploratory sketch of the outlines of a political ideology (tribal equalitarianism) with origins in 19th and 20th century social philosophy. We then use analyses of the English Corpus of Google Books (over 8 million books) and scraped unique tweets from Twitter (n = 202,852) to conduct a series of investigations to discern the extent to which this ideology is cohesive amongst the public, reveals signatures of authoritarianism and has been growing in popularity. Though exploratory, our results provide some evidence of left-wing authoritarianism in two forms: (1) a uniquely conservative moral signature amongst ostensible liberals using measures from Moral Foundations Theory and (2) a substantial prevalence of anger, relative to anxiety or sadness. In general, results indicate that this worldview is growing in popularity, is increasingly cohesive, and shows signatures of authoritarianism.

Submitted 22 February, 2021; originally announced February 2021.

arXiv:2009.09425 [pdf]

Modelling Threat Causation for Religiosity and Nationalism in Europe

Authors: Josh Bullock, Justin E. Lane, Igor Mikloušić, LeRon Shults

Abstract: Europe's contemporary political landscape has been shaped by massive shifts in recent decades caused by geopolitical upheavals such as Brexit and now, COVID-19. The way in which policy makers respond to the current pandemic could have large effects on how the world looks after the pandemic subsides. We aim to investigate complex questions post COVID-19 around the relationships and intersections concerning nationalism, religiosity, and anti-immigrant sentiment from a socio-cognitive perspective by applying a mixed-method approach (survey and modelling); in a context where unprecedented contagion threats have caused huge instability. There are still significant gaps in the scholarly literature on populism and nationalism. In particular, there is a lack of attention to the role of evolved human psychology in responding to persistent threats, which can fall into four broad categories in the literature: predation (threats to one's life via being eaten or killed in some other way), contagion (threats to one's life via physical infection), natural (threats to one's life via natural disasters), and social (threats to one's life by destroying social standing). These threats have been discussed in light of their effects on religion and other forms of behaviour, but they have not been employed to study nationalist and populist behaviours. In what follows, two studies are presented that begin to fill this gap in the literature. The first is a survey used to inform our theoretical framework and explore the different possible relationships in an online sample. The second is a study of a computer simulation. Both studies (completed in 2020) found very clear effects among the relevant variables, enabling us to identify trends that require further explanation and research as we move toward models that can adequately inform policy discussions.

Submitted 26 September, 2020; v1 submitted 20 September, 2020; originally announced September 2020.

Comments: https://github.com/cogijl/kingstonThreatStudy

arXiv:2001.04298 [pdf]

Using Deep Learning to Explore Local Physical Similarity for Global-scale Bridging in Thermal-hydraulic Simulation

Authors: Han Bao, Nam Dinh, Linyu Lin, Robert Youngblood, Jeffrey Lane, Hongbin Zhang

Abstract: Current system thermal-hydraulic codes have limited credibility in simulating real plant conditions, especially when the geometry and boundary conditions are extrapolated beyond the range of test facilities. This paper proposes a data-driven approach, Feature Similarity Measurement (FSM), to establish a technical basis to overcome these difficulties by exploring local patterns using machine learning. The underlying local patterns in multiscale data are represented by a set of physical features that embody the information from a physical system of interest, empirical correlations, and the effect of mesh size. After performing a limited number of high-fidelity numerical simulations and a sufficient amount of fast-running coarse-mesh simulations, an error database is built, and deep learning is applied to construct and explore the relationship between the local physical features and simulation errors. Case studies based on mixed convection have been designed for demonstrating the capability of data-driven models in bridging global scale gaps.

Submitted 6 January, 2020; originally announced January 2020.

Comments: 24 pages, 10 tables, 12 figures. This manuscript has been submitted to Annals of Nuclear Energy

arXiv:1910.05807 [pdf, other]

Shared E-scooters: Business, Pleasure, or Transit?

Authors: William Espinoza, Matthew Howard, Julia Lane, Pascal Van Hentenryck

Abstract: Shared e-scooters have become a familiar sight in many cities around the world. Yet the role they play in the mobility space is still poorly understood. This paper presents a study of the use of Bird e-scooters in the city of Atlanta. Starting with raw data which contains the location of available Birds over time, the study identifies trips and leverages the Google Places API to associate each trip origin and destination with a Point of Interest (POI). The resulting trip data is then used to understand the role of e-scooters in mobility by clustering trips using 10 collections of POIs, including business, food and recreation, parking, transit, health, and residential. The trips between these POI clusters reveal some surprising, albeit sensible, findings about the role of e-scooters in mobility, as well as the time of the day where they are most popular.

Submitted 13 October, 2019; originally announced October 2019.

arXiv:1902.02492 [pdf, other]

Dual-Reference Design for Holographic Coherent Diffraction Imaging

Authors: David A. Barmherzig, Ju Sun, Emmanuel J. Candès, T. J. Lane, Po-Nan Li

Abstract: A new reference design is introduced for holographic coherent diffraction imaging. This consists in two references - "block" and "pinhole" shaped regions - placed adjacent to the imaging specimen. An efficient recovery algorithm is provided for the resulting holographic phase retrieval problem, which is based on solving a structured, overdetermined linear system. Analysis of the expected recovery error on noisy data, which is contaminated by Poisson shot noise, shows that this simple modification synergizes the individual references and hence leads to uniformly superior performance over single-reference schemes. Numerical experiments on simulated data confirm the theoretical prediction, and the proposed dual-reference scheme achieves a smaller recovery error than leading single-reference schemes.

Submitted 25 June, 2019; v1 submitted 7 February, 2019; originally announced February 2019.

arXiv:1901.06453 [pdf, other]

doi 10.1088/1361-6420/ab23d1

Holographic Phase Retrieval and Reference Design

Authors: David A. Barmherzig, Ju Sun, T. J. Lane, Po-Nan Li, Emmanuel J. Candès

Abstract: A general mathematical framework and recovery algorithm is presented for the holographic phase retrieval problem. In this problem, which arises in holographic coherent diffraction imaging, a "reference" portion of the signal to be recovered via phase retrieval is a priori known from experimental design. A generic formula is also derived for the expected recovery error when the measurement data is corrupted by Poisson shot noise. This facilitates an optimization perspective towards reference design and analysis. We employ this optimization perspective towards quantifying the performance of various reference choices.

Submitted 21 April, 2019; v1 submitted 18 January, 2019; originally announced January 2019.

Comments: 27 pages, 10 figures

arXiv:1712.02881 [pdf, other]

doi 10.1109/HUMANOIDS.2017.8246925

A Pilot Study on Using an Intelligent Life-like Robot as a Companion for Elderly Individuals with Dementia and Depression

Authors: Hojjat Abdollahi, Ali Mollahosseini, Josh T. Lane, Mohammad H. Mahoor

Abstract: This paper presents the design, development, methodology, and the results of a pilot study on using an intelligent, emotive and perceptive social robot (aka Companionbot) for improving the quality of life of elderly people with dementia and/or depression. Ryan Companionbot, prototyped in this project, is a rear-projected life-like conversational robot. Ryan is equipped with features that can (1) interpret and respond to users' emotions through facial expressions and spoken language, (2) proactively engage in conversations with users, and (3) remind them about their daily life schedules (e.g. taking their medicine on time). Ryan engages users in cognitive games and reminiscence activities. We conducted a pilot study with six elderly individuals with moderate dementia and/or depression living in a senior living facility in Denver. Each individual had 24/7 access to a Ryan in his/her room for a period of 4-6 weeks. Our observations of these individuals, interviews with them and their caregivers, and analyses of their interactions during this period revealed that they established rapport with the robot and greatly valued and enjoyed having a Companionbot in their room.

Submitted 7 December, 2017; originally announced December 2017.

Comments: Published in 2017 IEEE-RAS International Conference on Humanoid Robots

Showing 1–12 of 12 results for author: Lane, J