Spin-Wave Voices: Sonification of Nanoscale Spin Waves as an Engagement and Research Tool

Authors: Santa Pile, Oleg Lesota, Silvan David Peter, Christina Humer, Martin Gasser

Abstract: Magnonics is an emerging research field that addresses the use of spin waves (magnons), purely magnetic waves, for information transport and processing. Spin waves are a potential replacement for electric current in modern computational devices that would make them more compact and energy efficient. The field is yet little known, even among physicists. Additionally, with the development of new measuring techniques and computational physics, the obtained magnetic data becomes more complex, in some cases including 3D vector fields and time-resolution. This work presents an approach to the audio-visual representation of the spin waves and discusses its use as a tool for science communication exhibits and possible data analysis tool. The work also details an instance of such an exhibit presented at the annual international digital art exhibition Ars Electronica Festival in 2022.

Submitted 21 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: Accepted to The 29th International Conference on Auditory Display (ICAD 2024) conference proceedings

arXiv:2401.02979 [pdf, other]

doi 10.1145/3632754.3632759

Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance

Authors: Silvan David Peter, Shreyan Chowdhury, Carlos Eduardo Cancino-Chacón, Gerhard Widmer

Abstract: Semantic embeddings play a crucial role in natural language-based information retrieval. Embedding models represent words and contexts as vectors whose spatial configuration is derived from the distribution of words in large text corpora. While such representations are generally very powerful, they might fail to account for fine-grained domain-specific nuances. In this article, we investigate this uncertainty for the domain of characterizations of expressive piano performance. Using a music research dataset of free text performance characterizations and a follow-up study sorting the annotations into clusters, we derive a ground truth for a domain-specific semantic similarity structure. We test five embedding models and their similarity structure for correspondence with the ground truth. We further assess the effects of contextualizing prompts, hubness reduction, cross-modal similarity, and k-means clustering. The quality of embedding models shows great variability with respect to this task; more general models perform better than domain-adapted ones and the best model configurations reach human-level agreement.

Submitted 31 December, 2023; originally announced January 2024.

Journal ref: Proceedings of the Forum for Information Retrieval Evaluation, FIRE, 2023, Panjim, India

arXiv:2401.00471 [pdf, other]

doi 10.1145/3625135.3625141

Sounding Out Reconstruction Error-Based Evaluation of Generative Models of Expressive Performance

Authors: Silvan David Peter, Carlos Eduardo Cancino-Chacón, Emmanouil Karystinaios, Gerhard Widmer

Abstract: Generative models of expressive piano performance are usually assessed by comparing their predictions to a reference human performance. A generative algorithm is taken to be better than competing ones if it produces performances that are closer to a human reference performance. However, expert human performers can (and do) interpret music in different ways, making for different possible references, and quantitative closeness is not necessarily aligned with perceptual similarity, raising concerns about the validity of this evaluation approach. In this work, we present a number of experiments that shed light on this problem. Using precisely measured high-quality performances of classical piano music, we carry out a listening test indicating that listeners can sometimes perceive subtle performance differences that go unnoticed under quantitative evaluation. We further present tests that indicate that such evaluation frameworks show a lot of variability in reliability and validity across different reference performances and pieces. We discuss these results and their implications for quantitative evaluation, and hope to foster a critical appreciation of the uncertainties involved in quantitative assessments of such performances within the wider music information retrieval (MIR) community.

Submitted 31 December, 2023; originally announced January 2024.

Journal ref: 10th International Conference on Digital Libraries for Musicology, November 10, 2023, Milan, Italy

arXiv:2401.00466 [pdf, other]

doi 10.5281/zenodo.10265367

Online Symbolic Music Alignment with Offline Reinforcement Learning

Authors: Silvan David Peter

Abstract: Symbolic Music Alignment is the process of matching performed MIDI notes to corresponding score notes. In this paper, we introduce a reinforcement learning (RL)-based online symbolic music alignment technique. The RL agent - an attention-based neural network - iteratively estimates the current score position from local score and performance contexts. For this symbolic alignment task, environment states can be sampled exhaustively and the reward is dense, rendering a formulation as a simplified offline RL problem straightforward. We evaluate the trained agent in three ways. First, in its capacity to identify correct score positions for sampled test contexts; second, as the core technique of a complete algorithm for symbolic online note-wise alignment; and finally, as a real-time symbolic score follower. We further investigate the pitch-based score and performance representations used as the agent's inputs. To this end, we develop a second model, a two-step Dynamic Time Warping (DTW)-based offline alignment algorithm leveraging the same input representation. The proposed model outperforms a state-of-the-art reference model of offline symbolic music alignment.

Submitted 31 December, 2023; originally announced January 2024.

Journal ref: Proceedings of the 24th International Society for Music Information Retrieval Conference, ISMIR 2023, Milan, Italy, November 5-9, 2023

arXiv:2208.14958 [pdf, other]

A Realism Metric for Generated LiDAR Point Clouds

Authors: Larissa T. Triess, Christoph B. Rist, David Peter, J. Marius Zöllner

Abstract: A considerable amount of research is concerned with the generation of realistic sensor data. LiDAR point clouds are generated by complex simulations or learned generative models. The generated data is usually exploited to enable or improve downstream perception algorithms. Two major questions arise from these procedures: First, how to evaluate the realism of the generated data? Second, does more realistic data also lead to better perception performance? This paper addresses both questions and presents a novel metric to quantify the realism of LiDAR point clouds. Relevant features are learned from real-world and synthetic point clouds by training on a proxy classification task. In a series of experiments, we demonstrate the application of our metric to determine the realism of generated LiDAR data and compare the realism estimation of our metric to the performance of a segmentation model. We confirm that our metric provides an indication for the downstream segmentation performance.

Submitted 31 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2109.11775

arXiv:2206.01104 [pdf, other]

The match file format: Encoding Alignments between Scores and Performances

Authors: Francesco Foscarin, Emmanouil Karystinaios, Silvan David Peter, Carlos Cancino-Chacón, Maarten Grachten, Gerhard Widmer

Abstract: This paper presents the specifications of match: a file format that extends a MIDI human performance with note-, beat-, and downbeat-level alignments to a corresponding musical score. This enables advanced analyses of the performance that are relevant for various tasks, such as expressive performance modeling, score following, music transcription, and performer classification. The match file includes a set of score-related descriptors that makes it usable also as a bare-bones score representation. For applications that require the use of structural score elements (e.g., voices, parts, beams, slurs), the match file can be easily combined with the symbolic score. To support the practical application of our work, we release a corrected and upgraded version of the Vienna4x22 dataset of scores and performances aligned with match files.

Submitted 2 June, 2022; originally announced June 2022.

Journal ref: Proceedings of the Music Encoding Conference (MEC), 2022, Halifax, Canada

arXiv:2206.01071 [pdf, other]

Partitura: A Python Package for Symbolic Music Processing

Authors: Carlos Cancino-Chacón, Silvan David Peter, Emmanouil Karystinaios, Francesco Foscarin, Maarten Grachten, Gerhard Widmer

Abstract: Partitura is a lightweight Python package for handling symbolic musical information. It provides easy access to features commonly used in music information retrieval tasks, like note arrays (lists of timed pitched events) and 2D piano roll matrices, as well as other score elements such as time and key signatures, performance directives, and repeat structures. Partitura can load musical scores (in MEI, MusicXML, Kern, and MIDI formats), MIDI performances, and score-to-performance alignments. The package includes some tools for music analysis, such as automatic pitch spelling, key signature identification, and voice separation. Partitura is an open-source project and is available at https://github.com/CPJKU/partitura/.

Submitted 2 June, 2022; originally announced June 2022.

Journal ref: Proceedings of the Music Encoding Conference (MEC), 2022, Halifax, Canada

arXiv:2202.08526 [pdf, other]

Point Cloud Generation with Continuous Conditioning

Authors: Larissa T. Triess, Andre Bühler, David Peter, Fabian B. Flohr, J. Marius Zöllner

Abstract: Generative models can be used to synthesize 3D objects of high quality and diversity. However, there is typically no control over the properties of the generated object. This paper proposes a novel generative adversarial network (GAN) setup that generates 3D point cloud shapes conditioned on a continuous parameter. In an exemplary application, we use this to guide the generative process to create a 3D object with a custom-fit shape. We formulate this generation process in a multi-task setting by using the concept of auxiliary classifier GANs. Further, we propose to sample the generator label input for training from a kernel density estimation (KDE) of the dataset. Our ablations show that this leads to significant performance increase in regions with few samples. Extensive quantitative and qualitative experiments show that we gain explicit control over the object dimensions while maintaining good generation quality and diversity.

Submitted 17 February, 2022; originally announced February 2022.

Comments: Accepted at International Conference on Artificial Intelligence and Statistics (AISTATS) 2022

Journal ref: 2022 International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 151:4462-4481

arXiv:2111.15615 [pdf, other]

Semi-Local Convolutions for LiDAR Scan Processing

Authors: Larissa T. Triess, David Peter, J. Marius Zöllner

Abstract: A number of applications, such as mobile robots or automated vehicles, use LiDAR sensors to obtain detailed information about their three-dimensional surroundings. Many methods use image-like projections to efficiently process these LiDAR measurements and use deep convolutional neural networks to predict semantic classes for each point in the scan. The spatial stationary assumption enables the usage of convolutions. However, LiDAR scans exhibit large differences in appearance over the vertical axis. Therefore, we propose semi local convolution (SLC), a convolution layer with reduced amount of weight-sharing along the vertical dimension. We are first to investigate the usage of such a layer independent of any other model changes. Our experiments did not show any improvement over traditional convolution layers in terms of segmentation IoU or accuracy.

Submitted 30 November, 2021; originally announced November 2021.

Comments: arXiv admin note: text overlap with arXiv:2004.11803

Journal ref: ICBINB Workshop at NeurIPS 2021

arXiv:2109.11775 [pdf, other]

doi 10.1007/978-3-030-92659-5_44

Quantifying point cloud realism through adversarially learned latent representations

Authors: Larissa T. Triess, David Peter, Stefan A. Baur, J. Marius Zöllner

Abstract: Judging the quality of samples synthesized by generative models can be tedious and time consuming, especially for complex data structures, such as point clouds. This paper presents a novel approach to quantify the realism of local regions in LiDAR point clouds. Relevant features are learned from real-world and synthetic point clouds by training on a proxy classification task. Inspired by fair networks, we use an adversarial technique to discourage the encoding of dataset-specific information. The resulting metric can assign a quality score to samples without requiring any task specific annotations. In a series of experiments, we confirm the soundness of our metric by applying it in controllable task setups and on unseen data. Additional experiments show reliable interpolation capabilities of the metric between data with varying degree of realism. As one important application, we demonstrate how the local realism score can be used for anomaly detection in point clouds.

Submitted 24 September, 2021; originally announced September 2021.

Comments: 2021 German Conference on Pattern Recognition (GCPR). Project Page: http://ltriess.github.io/lidar-metric

Journal ref: 2021 German Conference on Pattern Recognition (GCPR)

arXiv:2104.06666 [pdf, other]

End-to-end Keyword Spotting using Neural Architecture Search and Quantization

Authors: David Peter, Wolfgang Roth, Franz Pernkopf

Abstract: This paper introduces neural architecture search (NAS) for the automatic discovery of end-to-end keyword spotting (KWS) models in limited resource environments. We employ a differentiable NAS approach to optimize the structure of convolutional neural networks (CNNs) operating on raw audio waveforms. After a suitable KWS model is found with NAS, we conduct quantization of weights and activations to reduce the memory footprint. We conduct extensive experiments on the Google speech commands dataset. In particular, we compare our end-to-end approach to mel-frequency cepstral coefficient (MFCC) based systems. For quantization, we compare fixed bit-width quantization and trained bit-width quantization. Using NAS only, we were able to obtain a highly efficient model with an accuracy of 95.55% using 75.7k parameters and 13.6M operations. Using trained bit-width quantization, the same model achieves a test accuracy of 93.76% while using on average only 2.91 bits per activation and 2.51 bits per weight.

Submitted 14 April, 2021; originally announced April 2021.

Comments: arXiv admin note: text overlap with arXiv:2012.10138

arXiv:2012.10138 [pdf, other]

Resource-efficient DNNs for Keyword Spotting using Neural Architecture Search and Quantization

Authors: David Peter, Wolfgang Roth, Franz Pernkopf

Abstract: This paper introduces neural architecture search (NAS) for the automatic discovery of small models for keyword spotting (KWS) in limited resource environments. We employ a differentiable NAS approach to optimize the structure of convolutional neural networks (CNNs) to maximize the classification accuracy while minimizing the number of operations per inference. Using NAS only, we were able to obtain a highly efficient model with 95.4% accuracy on the Google speech commands dataset with 494.8 kB of memory usage and 19.6 million operations. Additionally, weight quantization is used to reduce the memory consumption even further. We show that weight quantization to low bit-widths (e.g. 1 bit) can be used without substantial loss in accuracy. By increasing the number of input features from 10 MFCC to 20 MFCC we were able to increase the accuracy to 96.3% at 340.1 kB of memory usage and 27.1 million operations.

Submitted 18 December, 2020; originally announced December 2020.

arXiv:2004.11803 [pdf, other]

doi 10.1109/IV47402.2020.9304631

Scan-based Semantic Segmentation of LiDAR Point Clouds: An Experimental Study

Authors: Larissa T. Triess, David Peter, Christoph B. Rist, J. Marius Zöllner

Abstract: Autonomous vehicles need to have a semantic understanding of the three-dimensional world around them in order to reason about their environment. State of the art methods use deep neural networks to predict semantic classes for each point in a LiDAR scan. A powerful and efficient way to process LiDAR measurements is to use two-dimensional, image-like projections. In this work, we perform a comprehensive experimental study of image-based semantic segmentation architectures for LiDAR point clouds. We demonstrate various techniques to boost the performance and to improve runtime as well as memory constraints. First, we examine the effect of network size and suggest that much faster inference times can be achieved at a very low cost to accuracy. Next, we introduce an improved point cloud projection technique that does not suffer from systematic occlusions. We use a cyclic padding mechanism that provides context at the horizontal field-of-view boundaries. In a third part, we perform experiments with a soft Dice loss function that directly optimizes for the intersection-over-union metric. Finally, we propose a new kind of convolution layer with a reduced amount of weight-sharing along one of the two spatial dimensions, addressing the large difference in appearance along the vertical axis of a LiDAR scan. We propose a final set of the above methods with which the model achieves an increase of 3.2% in mIoU segmentation performance over the baseline while requiring only 42% of the original inference time.

Submitted 24 September, 2021; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: Project Page: http://ltriess.github.io/scan-semseg

Journal ref: IEEE Intelligent Vehicles Symposium (IV), 2020, pp. 1116-1121

arXiv:1907.00787 [pdf, other]

doi 10.1109/IVS.2019.8813771

CNN-based synthesis of realistic high-resolution LiDAR data

Authors: Larissa T. Triess, David Peter, Christoph B. Rist, Markus Enzweiler, J. Marius Zöllner

Abstract: This paper presents a novel CNN-based approach for synthesizing high-resolution LiDAR point cloud data. Our approach generates semantically and perceptually realistic results with guidance from specialized loss-functions. First, we utilize a modified per-point loss that addresses missing LiDAR point measurements. Second, we align the quality of our generated output with real-world sensor data by applying a perceptual loss. In large-scale experiments on real-world datasets, we evaluate both the geometric accuracy and semantic segmentation performance using our generated data vs. ground truth. In a mean opinion score testing we further assess the perceptual quality of our generated point clouds. Our results demonstrate a significant quantitative and qualitative improvement in both geometry and semantics over traditional non CNN-based up-sampling methods.

Submitted 24 September, 2021; v1 submitted 28 June, 2019; originally announced July 2019.

Comments: Project Page: http://ltriess.github.io/pc-upsampling

Journal ref: IEEE Intelligent Vehicles Symposium (IV), 2019, pp. 1512-1519

arXiv:1901.10183 [pdf, other]

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning

Authors: Tal Ben-Nun, Maciej Besta, Simon Huber, Alexandros Nikolaos Ziogas, Daniel Peter, Torsten Hoefler

Abstract: We introduce Deep500: the first customizable benchmarking infrastructure that enables fair comparison of the plethora of deep learning frameworks, algorithms, libraries, and techniques. The key idea behind Deep500 is its modular design, where deep learning is factorized into four distinct levels: operators, network processing, training, and distributed training. Our evaluation illustrates that Deep500 is customizable (enables combining and benchmarking different deep learning codes) and fair (uses carefully selected metrics). Moreover, Deep500 is fast (incurs negligible overheads), verifiable (offers infrastructure to analyze correctness), and reproducible. Finally, as the first distributed and reproducible benchmarking system for deep learning, Deep500 provides software infrastructure to utilize the most powerful supercomputers for extreme-scale workloads.

Submitted 13 June, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

Comments: Accepted to IPDPS 2019

arXiv:1804.09915 [pdf, other]

Boosting LiDAR-based Semantic Labeling by Cross-Modal Training Data Generation

Authors: Florian Piewak, Peter Pinggera, Manuel Schäfer, David Peter, Beate Schwarz, Nick Schneider, David Pfeiffer, Markus Enzweiler, Marius Zöllner

Abstract: Mobile robots and autonomous vehicles rely on multi-modal sensor setups to perceive and understand their surroundings. Aside from cameras, LiDAR sensors represent a central component of state-of-the-art perception systems. In addition to accurate spatial perception, a comprehensive semantic understanding of the environment is essential for efficient and safe operation. In this paper we present a novel deep neural network architecture called LiLaNet for point-wise, multi-class semantic labeling of semi-dense LiDAR data. The network utilizes virtual image projections of the 3D point clouds for efficient inference. Further, we propose an automated process for large-scale cross-modal training data generation called Autolabeling, in order to boost semantic labeling performance while keeping the manual annotation effort low. The effectiveness of the proposed network architecture as well as the automated data generation process is demonstrated on a manually annotated ground truth dataset. LiLaNet is shown to significantly outperform current state-of-the-art CNN architectures for LiDAR data. Applying our automatically generated large-scale training data yields a boost of up to 14 percentage points compared to networks trained on manually annotated data only.

Submitted 26 April, 2018; originally announced April 2018.

Showing 1–16 of 16 results for author: Peter, D