-
A Unified Statistical And Computational Framework For Ex-Post Harmonisation Of Aggregate Statistics
Authors:
Cynthia A. Huang
Abstract:
Ex-post harmonisation is one of many data preprocessing processes used to combine the increasingly vast and diverse sources of data available for research and analysis. Documenting provenance and ensuring the quality of multi-source datasets is vital for ensuring trustworthy scientific research and encouraging reuse of existing harmonisation efforts. However, capturing and communicating statistically relevant properties of harmonised datasets is difficult without a universal standard for describing harmonisation operations. Our paper combines mathematical and computer science perspectives to address this need. The Crossmaps Framework defines a new approach for transforming existing variables collected under a specific measurement or classification standard to an imputed counterfactual variable indexed by some target standard. It uses computational graphs to separate intended transformation logic from actual data transformations, and avoid the risk of syntactically valid data manipulation scripts resulting in statistically questionable data. In this paper, we introduce the Crossmaps Framework through the example of ex-post harmonisation of aggregated statistics in the social sciences. We define a new provenance task abstraction, the crossmap transform, and formalise two associated objects, the shared mass array and the crossmap. We further define graph, matrix and list encodings of crossmaps and discuss resulting implications for understanding statistical properties of ex-post harmonisation and designing error minimising workflows.
Submitted 20 June, 2024;
originally announced June 2024.
-
Visualising category recoding and numeric redistributions
Authors:
Cynthia A. Huang
Abstract:
This paper proposes graphical representations of data and rationale provenance in workflows that convert both category labels and associated numeric data between distinct but semantically related taxonomies. We motivate the graphical representations with a new task abstraction, the cross-taxonomy transformation, and associated graph-based information structure, the crossmap. The task abstraction supports the separation of category recoding and numeric redistribution decisions from the specifics of data manipulation in ex-post data harmonisation. The crossmap structure is illustrated using an example conversion of numeric statistics from a country-specific taxonomy to an international classification standard. We discuss the opportunities and challenges of using visualisation to audit and communicate cross-taxonomy transformations and present candidate graphical representations.
Submitted 12 August, 2023;
originally announced August 2023.
-
Redefining Relationships in Music
Authors:
Christian Detweiler,
Beth Coleman,
Fernando Diaz,
Lieke Dom,
Chris Donahue,
Jesse Engel,
Cheng-Zhi Anna Huang,
Larry James,
Ethan Manilow,
Amanda McCroskery,
Kyle Pedersen,
Pamela Peter-Agbia,
Negar Rostamzadeh,
Robert Thomas,
Marco Zamarato,
Ben Zevenbergen
Abstract:
AI tools increasingly shape how we discover, make and experience music. While these tools can have the potential to empower creativity, they may fundamentally redefine relationships between stakeholders, to the benefit of some and the detriment of others. In this position paper, we argue that these tools will fundamentally reshape our music culture, with profound effects (for better and for worse) on creators, consumers and the commercial enterprises that often connect them. By paying careful attention to emerging Music AI technologies and developments in other creative domains and understanding the implications, people working in this space could decrease the possible negative impacts on the practice, consumption and meaning of music. Given that many of these technologies are already available, there is some urgency in conducting analyses of these technologies now. It is important that people developing and working with these tools address these issues now to help guide their evolution to be equitable and empower creativity. We identify some potential risks and opportunities associated with existing and forthcoming AI tools for music, though more work is needed to identify concrete actions which leverage the opportunities while mitigating risks.
Submitted 16 December, 2022; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Improving Source Separation by Explicitly Modeling Dependencies Between Sources
Authors:
Ethan Manilow,
Curtis Hawthorne,
Cheng-Zhi Anna Huang,
Bryan Pardo,
Jesse Engel
Abstract:
We propose a new method for training a supervised source separation system that aims to learn the interdependent relationships between all combinations of sources in a mixture. Rather than independently estimating each source from a mix, we reframe the source separation problem as an Orderless Neural Autoregressive Density Estimator (NADE), and estimate each source from both the mix and a random subset of the other sources. We adapt a standard source separation architecture, Demucs, with additional inputs for each individual source, in addition to the input mixture. We randomly mask these input sources during training so that the network learns the conditional dependencies between the sources. By pairing this training method with a block Gibbs sampling procedure at inference time, we demonstrate that the network can iteratively improve its separation performance by conditioning a source estimate on its earlier source estimates. Experiments on two source separation datasets show that training a Demucs model with an Orderless NADE approach and using Gibbs sampling (up to 512 steps) at inference time strongly outperforms a Demucs baseline that uses a standard regression loss and direct (one step) estimation of sources.
Submitted 28 March, 2022;
originally announced March 2022.
-
MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling
Authors:
Yusong Wu,
Ethan Manilow,
Yi Deng,
Rigel Swavely,
Kyle Kastner,
Tim Cooijmans,
Aaron Courville,
Cheng-Zhi Anna Huang,
Jesse Engel
Abstract:
Musical expression requires control of both what notes are played, and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP, a hierarchical model of musical instruments that enables both realistic neural audio synthesis and detailed user control. Starting from interpretable Differentiable Digital Signal Processing (DDSP) synthesis parameters, we infer musical notes and high-level properties of their expressive performance (such as timbre, vibrato, dynamics, and articulation). This creates a 3-level hierarchy (notes, performance, synthesis) that affords individuals the option to intervene at each level, or utilize trained priors (performance given notes, synthesis given performance) for creative assistance. Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence. By utilizing an interpretable hierarchy, with multiple levels of granularity, MIDI-DDSP opens the door to assistive tools to empower individuals across a diverse range of musical experience.
Submitted 17 March, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
-
AI Song Contest: Human-AI Co-Creation in Songwriting
Authors:
Cheng-Zhi Anna Huang,
Hendrik Vincent Koops,
Ed Newton-Rex,
Monica Dinculescu,
Carrie J. Cai
Abstract:
Machine learning is challenging the way we make music. Although research in deep generative models has dramatically improved the capability and fluency of music models, recent work has shown that it can be challenging for humans to partner with this new class of algorithms. In this paper, we present findings on what 13 musician/developer teams, a total of 61 users, needed when co-creating a song with AI, the challenges they faced, and how they leveraged and repurposed existing characteristics of AI to overcome some of these challenges. Many teams adopted modular approaches, such as independently running multiple smaller models that align with the musical building blocks of a song, before re-combining their results. As ML models are not easily steerable, teams also generated massive numbers of samples and curated them post-hoc, or used a range of strategies to direct the generation, or algorithmically ranked the samples. Ultimately, teams not only had to manage the "flare and focus" aspects of the creative process, but also juggle them with a parallel process of exploring and curating multiple ML models and outputs. These findings reflect a need to design machine learning-powered music interfaces that are more decomposable, steerable, interpretable, and adaptive, which in return will enable artists to more effectively explore how AI can extend their personal expression.
Submitted 11 October, 2020;
originally announced October 2020.
-
The Bach Doodle: Approachable music composition with machine learning at scale
Authors:
Cheng-Zhi Anna Huang,
Curtis Hawthorne,
Adam Roberts,
Monica Dinculescu,
James Wexler,
Leon Hong,
Jacob Howcroft
Abstract:
To make music composition more approachable, we designed the first AI-powered Google Doodle, the Bach Doodle, where users can create their own melody and have it harmonized by a machine learning model Coconet (Huang et al., 2017) in the style of Bach. For users to input melodies, we designed a simplified sheet-music based interface. To support an interactive experience at scale, we re-implemented Coconet in TensorFlow.js (Smilkov et al., 2019) to run in the browser and reduced its runtime from 40s to 2s by adopting dilated depth-wise separable convolutions and fusing operations. We also reduced the model download size to approximately 400KB through post-training weight quantization. We calibrated a speed test based on partial model evaluation time to determine if the harmonization request should be performed locally or sent to remote TPU servers. In three days, people spent 350 years worth of time playing with the Bach Doodle, and Coconet received more than 55 million queries. Users could choose to rate their compositions and contribute them to a public dataset, which we are releasing with this paper. We hope that the community finds this dataset useful for applications ranging from ethnomusicological studies, to music education, to improving machine learning models.
Submitted 14 July, 2019;
originally announced July 2019.
-
Counterpoint by Convolution
Authors:
Cheng-Zhi Anna Huang,
Tim Cooijmans,
Adam Roberts,
Aaron Courville,
Douglas Eck
Abstract:
Machine learning models of music typically break up the task of composition into a chronological process, composing a piece of music in a single pass from beginning to end. On the contrary, human composers write music in a nonlinear fashion, scribbling motifs here and there, often revisiting choices previously made. In order to better approximate this process, we train a convolutional neural network to complete partial musical scores, and explore the use of blocked Gibbs sampling as an analogue to rewriting. Neither the model nor the generative procedure are tied to a particular causal direction of composition. Our model is an instance of orderless NADE (Uria et al., 2014), which allows more direct ancestral sampling. However, we find that Gibbs sampling greatly improves sample quality, which we demonstrate to be due to some conditional distributions being poorly modeled. Moreover, we show that even the cheap approximate blocked Gibbs procedure from Yao et al. (2014) yields better samples than ancestral sampling, based on both log-likelihood and human evaluation.
Submitted 17 March, 2019;
originally announced March 2019.
-
Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset
Authors:
Curtis Hawthorne,
Andriy Stasyuk,
Adam Roberts,
Ian Simon,
Cheng-Zhi Anna Huang,
Sander Dieleman,
Erich Elsen,
Jesse Engel,
Douglas Eck
Abstract:
Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales. Fortunately, most music is also highly structured and can be represented as discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms to ~100 s), a process we call Wave2Midi2Wave. This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music.
Submitted 17 January, 2019; v1 submitted 29 October, 2018;
originally announced October 2018.
-
Music Transformer
Authors:
Cheng-Zhi Anna Huang,
Ashish Vaswani,
Jakob Uszkoreit,
Noam Shazeer,
Ian Simon,
Curtis Hawthorne,
Andrew M. Dai,
Matthew D. Hoffman,
Monica Dinculescu,
Douglas Eck
Abstract:
Music relies heavily on repetition to build structure and meaning. Self-reference occurs on multiple timescales, from motifs to phrases to reusing of entire sections of music, such as in pieces with ABA structure. The Transformer (Vaswani et al., 2017), a sequence model based on self-attention, has achieved compelling results in many generation tasks that require maintaining long-range coherence. This suggests that self-attention might also be well-suited to modeling music. In musical composition and performance, however, relative timing is critically important. Existing approaches for representing relative positional information in the Transformer modulate attention based on pairwise distance (Shaw et al., 2018). This is impractical for long sequences such as musical compositions since their memory complexity for intermediate relative information is quadratic in the sequence length. We propose an algorithm that reduces their intermediate memory requirement to linear in the sequence length. This enables us to demonstrate that a Transformer with our modified relative attention mechanism can generate minute-long compositions (thousands of steps, four times the length modeled in Oore et al., 2018) with compelling structure, generate continuations that coherently elaborate on a given motif, and in a seq2seq setup generate accompaniments conditioned on melodies. We evaluate the Transformer with our relative attention mechanism on two datasets, JSB Chorales and Piano-e-Competition, and obtain state-of-the-art results on the latter.
Submitted 12 December, 2018; v1 submitted 12 September, 2018;
originally announced September 2018.
-
NMR studies of the topological insulator Bi2Te3
Authors:
A. O. Antonenko,
E. V. Charnaya,
D. Yu. Nefedov,
D. Yu. Podorozhkin,
A. V. Uskov,
A. S. Bugaev,
M. K. Lee,
L. J. Chang,
S. V. Naumov,
Yu. A. Perevozchikova,
V. V. Chistyakov,
J. C. A. Huang,
V. V. Marchenkov
Abstract:
Te NMR studies were carried out for the bismuth telluride topological insulator in a wide range from room temperature down to 12.5 K. The measurements were made on a Bruker Avance 400 pulse spectrometer. The NMR spectra were collected for the mortar and pestle powder sample and for single crystalline stacks with orientations c parallel and perpendicular to field. The activation energy responsible for thermal activation. The spectra for the stack with c parallel to field showed some particular behavior below 91 K.
Submitted 18 January, 2017;
originally announced January 2017.
-
Robust topological insulator surface state in MBE grown (Bi_{1-x}Sb_x)_2Se_3
Authors:
Y. Hung Liu,
C. Wei Chong,
W. Chuan Chen,
J. C. A. Huang,
C. -Maw Cheng,
K. -Ding Tsuei,
Z. Li,
H. Qiu,
V. V. Marchenkov
Abstract:
(Bi1-xSbx)2Se3 thin films have been prepared using molecular beam epitaxy (MBE). We demonstrate the angle-resolved photoemission spectroscopy (ARPES) and transport evidence for the existence of strong and robust topological surface states in this ternary system. Large tunability in transport properties by varying the Sb doping level has also been observed, where insulating phase could be achieved at x=0.5. Our results reveal the potential of this system for the study of tunable topological insulator and metal-insulator transition based device physics.
Submitted 25 November, 2016;
originally announced November 2016.
-
Substrate-induced structures of bismuth adsorption on graphene: a first principle study
Authors:
S. Y. Lin,
S. L. Chang,
H. H. Chen,
S. H. Su,
J. C. A. Huang,
M. -F. Lin
Abstract:
The geometric and electronic properties of Bi-adsorbed monolayer graphene, enriched by the strong effect of substrate, are investigated by first-principles calculations. The six-layered substrate, corrugated buffer layer, and slightly deformed monolayer graphene are all simulated. Adatom arrangements are thoroughly studied by analyzing the ground-state energies, bismuth adsorption energies, and Bi-Bi interaction energies of different adatom heights, inter-adatom distance, adsorption sites, and hexagonal positions. A hexagonal array of Bi atoms is dominated by the interactions between the buffer layer and the monolayer graphene. An increase in temperature can overcome a $\sim 50$ meV energy barrier and induce triangular and rectangular nanoclusters. The most stable and metastable structures agree with the scanning tunneling microscopy measurements. The density of states exhibits a finite value at the Fermi level, a dip at $\sim -0.2$ eV, and a peak at $\sim -0.6$ eV, as observed in the experimental measurements of the tunneling conductance.
Submitted 22 February, 2016; v1 submitted 28 July, 2015;
originally announced July 2015.
-
Oxygen Vacancy Induced Ferromagnetism in V$_2$O$_{5-x}$
Authors:
Zhi Ren Xiao,
Guang Yu Guo,
Po Han Lee,
Hua Shu Hsu,
Jung Chun Andrew Huang
Abstract:
{\it Ab initio} calculations within density functional theory with generalized gradient approximation have been performed to study the effects of oxygen vacancies on the electronic structure and magnetism in undoped V$_2$O$_{5-x}$ ($0 < x < 0.5$). It is found that the introduction of oxygen vacancies would induce ferromagnetism in V$_2$O$_{5-x}$ with the magnetization being proportional to the O vacancy concentration $x$. The calculated electronic structure reveals that the valence electrons released by the introduction of oxygen vacancies would occupy mainly the neighboring V $d_{xy}$-dominant band which then becomes spin-polarized due to intra-atomic exchange interaction, thereby giving rise to the half-metallic ferromagnetism.
Submitted 10 February, 2008;
originally announced February 2008.