Search | arXiv e-print repository

doi 10.1109/TASLP.2024.3407529

MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing

Authors: Yu-Fen Huang, Nikki Moran, Simon Coleman, Jon Kelly, Shun-Hwa Wei, Po-Yin Chen, Yun-Hsin Huang, Tsung-** Chen, Yu-Chia Kuo, Yu-Chi Wei, Chih-Hsuan Li, Da-Yu Huang, Hsuan-Kai Kao, Ting-Wei Lin, Li Su

Abstract: In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamic, articulation, and harmony for 742 professional music performances by 23 professional musicians, comprising more than 30 hours and 570 K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the usage of the MOSA dataset, we present several innovative cross-modal music information retrieval (MIR) and musical content generation tasks, including the detection of beats, downbeats, phrase, and expressive contents from audio, video and motion data, and the generation of musicians' body motion from given music audio. The dataset and codes are available alongside this publication (https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset).

Submitted 10 June, 2024; originally announced June 2024.

Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024. 14 pages, 7 figures. Dataset is available on: https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset/tree/main and https://zenodo.org/records/11393449

arXiv:2110.05162 [pdf]

doi 10.1109/CANOPIEHPC54579.2021.00005

Deploying Containerized QuantEx Quantum Simulation Software on HPC Systems

Authors: David Brayford, John Brennan, Momme Allalen, Kenneth Hanley, Luigi Iapichino, Lee ORiordan, Niall Moran

Abstract: The simulation of quantum circuits using the tensor network method is very computationally demanding and requires significant High Performance Computing (HPC) resources to find an efficient contraction order and to perform the contraction of the large tensor networks. In addition, the researchers want a workflow that is easy to customize, reproduce and migrate to different HPC systems. In this paper, we discuss the issues associated with the deployment of the QuantEX quantum computing simulation software within containers on different HPC systems. Also, we compare the performance of the containerized software with the software running on bare metal.

Submitted 11 October, 2021; originally announced October 2021.

Journal ref: 2021 3rd International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC)

arXiv:1912.07557 [pdf, other]

Self-Play Learning Without a Reward Metric

Authors: Dan Schmidt, Nick Moran, Jonathan S. Rosenfeld, Jonathan Rosenthal, Jonathan Yedidia

Abstract: The AlphaZero algorithm for the learning of strategy games via self-play, which has produced superhuman ability in the games of Go, chess, and shogi, uses a quantitative reward function for game outcomes, requiring the users of the algorithm to explicitly balance different components of the reward against each other, such as the game winner and margin of victory. We present a modification to the AlphaZero algorithm that requires only a total ordering over game outcomes, obviating the need to perform any quantitative balancing of reward components. We demonstrate that this system learns optimal play in a comparable amount of time to AlphaZero on a sample game.

Submitted 16 December, 2019; originally announced December 2019.

Comments: 6 pages, 4 figures

arXiv:1910.11908 [pdf, other]

Noisier2Noise: Learning to Denoise from Unpaired Noisy Data

Authors: Nick Moran, Dan Schmidt, Yu Zhong, Patrick Coady

Abstract: We present a method for training a neural network to perform image denoising without access to clean training examples or access to paired noisy training examples. Our method requires only a single noisy realization of each training example and a statistical model of the noise distribution, and is applicable to a wide variety of noise models, including spatially structured noise. Our model produces results which are competitive with other learned methods which require richer training data, and outperforms traditional non-learned denoising methods. We present derivations of our method for arbitrary additive noise, an improvement specific to Gaussian additive noise, and an extension to multiplicative Bernoulli noise.

Submitted 25 October, 2019; originally announced October 2019.

arXiv:1904.05712 [pdf, other]

Reconstructing Network Inputs with Additive Perturbation Signatures

Authors: Nick Moran, Chiraag Juvekar

Abstract: In this work, we present preliminary results demonstrating the ability to recover a significant amount of information about secret model inputs given only very limited access to model outputs and the ability to evaluate the model on additive perturbations to the input.

Submitted 11 April, 2019; originally announced April 2019.

arXiv:1804.04187 [pdf, other]

Coevolutionary Neural Population Models

Authors: Nick Moran, Jordan Pollack

Abstract: We present a method for using neural networks to model evolutionary population dynamics, and draw parallels to recent deep learning advancements in which adversarially-trained neural networks engage in coevolutionary interactions. We conduct experiments which demonstrate that models from evolutionary game theory are capable of describing the behavior of these neural population systems.

Submitted 11 April, 2018; originally announced April 2018.

arXiv:1803.00940 [pdf, other]

Protecting JPEG Images Against Adversarial Attacks

Authors: Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, James Storer

Abstract: As deep neural networks (DNNs) have been integrated into critical systems, several methods to attack these systems have been developed. These adversarial attacks make imperceptible modifications to an image that fool DNN classifiers. We present an adaptive JPEG encoder which defends against many of these attacks. Experimentally, we show that our method produces images with high visual quality while greatly reducing the potency of state-of-the-art attacks. Our algorithm requires only a modest increase in encoding time, produces a compressed image which can be decompressed by an off-the-shelf JPEG decoder, and classified by an unmodified classifier.

Submitted 2 March, 2018; originally announced March 2018.

Comments: Accepted to IEEE Data Compression Conference

arXiv:1801.08926 [pdf, other]

Deflecting Adversarial Attacks with Pixel Deflection

Authors: Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, James Storer

Abstract: CNNs are poised to become integral parts of many critical systems. Despite their robustness to natural variations, image pixel values can be manipulated, via small, carefully crafted, imperceptible perturbations, to cause a model to misclassify images. We present an algorithm to process an image so that classification accuracy is significantly preserved in the presence of such adversarial manipulations. Image classifiers tend to be robust to natural noise, and adversarial attacks tend to be agnostic to object location. These observations motivate our strategy, which leverages model robustness to defend against adversarial perturbations by forcing the image to match natural image statistics. Our algorithm locally corrupts the image by redistributing pixel values via a process we term pixel deflection. A subsequent wavelet-based denoising operation softens this corruption, as well as some of the adversarial changes. We demonstrate experimentally that the combination of these techniques enables the effective recovery of the true class, against a variety of robust attacks. Our results compare favorably with current state-of-the-art defenses, without requiring retraining or modifying the CNN.

Submitted 30 March, 2018; v1 submitted 26 January, 2018; originally announced January 2018.

Comments: Accepted to IEEE CVPR 2018 as Spotlight

arXiv:1612.08712 [pdf, other]

Semantic Perceptual Image Compression using Deep Convolution Networks

Authors: Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, James Storer

Abstract: It has long been considered a significant problem to improve the visual quality of lossy image and video compression. Recent advances in computing power together with the availability of large training data sets have increased interest in the application of deep learning CNNs to address image recognition and image processing tasks. Here, we present a powerful CNN tailored to the specific task of semantic image understanding to achieve higher visual quality in lossy compression. A modest increase in complexity is incorporated into the encoder, which allows a standard, off-the-shelf JPEG decoder to be used. While JPEG encoding may be optimized for generic images, the process is ultimately unaware of the specific content of the image to be compressed. Our technique makes JPEG content-aware by designing and training a model to identify multiple semantic regions in a given image. Unlike object detection techniques, our model does not require labeling of object positions and is able to identify objects in a single pass. We present a new CNN architecture directed specifically to image compression, which generates a map that highlights semantically-salient regions so that they can be encoded at higher quality as compared to background regions. By adding a complete set of features for every class, and then taking a threshold over the sum of all feature activations, we generate a map that highlights semantically-salient regions so that they can be encoded at a better quality compared to background regions. Experiments are presented on the Kodak PhotoCD dataset and the MIT Saliency Benchmark dataset, in which our algorithm achieves higher visual quality for the same compressed size.

Submitted 29 March, 2017; v1 submitted 27 December, 2016; originally announced December 2016.

Comments: Accepted to Data Compression Conference, 11 pages, 5 figures

Showing 1–9 of 9 results for author: Moran, N