
Artistic Curve Steganography Carried by Musical Audio

Abstract: In this work, we create artistic closed loop curves that trace out images and 3D shapes, which we then hide in musical audio as a form of steganography. We use traveling salesperson art to create artistic plane loops to trace out image contours, and we use Hamiltonian cycles on triangle meshes to create artistic space loops that fill out 3D surfaces. Our embedding scheme is designed to faithfully preserve the geometry of these loops after lossy compression, while keeping their presence undetectable to the audio listener. To accomplish this, we hide each dimension of the curve in a different frequency, and we perturb a sliding window sum of the magnitude of that frequency to best match the target curve at that dimension, while hiding scale information in that frequency's phase. In the process, we exploit geometric properties of the curves to help hide and recover them more effectively. Our scheme is simple: encoding is done efficiently with a nonnegative least squares framework, and decoding is trivial. We validate our technique quantitatively on large datasets of images and audio, and we show results of a crowdsourced listening test confirming that the hidden information is indeed unobtrusive.

Submitted 28 January, 2023; originally announced January 2023.

Comments: 18 pages, 14 figures, in Proceedings of EvoMUSART 2023

ACM Class: I.3.8; E.4; I.5.4
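The abstract notes that decoding is trivial: each coordinate of the hidden curve is read back as a sliding window sum of one frequency's magnitude. The Python sketch below illustrates only that decoding step, under stated assumptions: the per-frame magnitudes of each chosen frequency bin are already available (for example, from an STFT), the window length `win` is a hypothetical parameter, and neither the scale information carried in the phase nor the nonnegative least squares encoder is modeled. The names `sliding_window_sum` and `decode_curve` are illustrative, not the authors' code.

```python
def sliding_window_sum(mags, win):
    """Return the sum of every length-`win` window of `mags` (len(mags) - win + 1 values)."""
    if win < 1 or win > len(mags):
        raise ValueError("window length must be between 1 and len(mags)")
    # Prefix sums turn each window sum into a single subtraction.
    prefix = [0.0]
    for m in mags:
        prefix.append(prefix[-1] + m)
    return [prefix[i + win] - prefix[i] for i in range(len(mags) - win + 1)]


def decode_curve(bin_mags, win):
    """Hypothetical decoder: bin_mags[d] holds the per-frame magnitudes of the
    frequency bin assigned to curve dimension d. Returns a list of points,
    one coordinate per dimension."""
    coords = [sliding_window_sum(m, win) for m in bin_mags]
    return list(zip(*coords))
```

For example, `decode_curve([[1, 2, 3, 4], [4, 3, 2, 1]], 2)` returns `[(3.0, 7.0), (5.0, 5.0), (7.0, 3.0)]`, a three-point curve in the plane. Because of the prefix sums, decoding is linear in the number of frames.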

arXiv:2212.01648

The DOPE Distance is SIC: A Stable, Informative, and Computable Metric on Time Series And Ordered Merge Trees

Authors: Christopher J. Tralie, Zachary Schlamowitz, Jose Arbelo, Antonio I. Delgado, Charley Kirk, Nicholas A. Scoville

Abstract: Metrics for merge trees that are simultaneously stable, informative, and efficiently computable have so far eluded researchers. We show in this work that it is possible to devise such a metric when restricting merge trees to ordered domains such as the interval and the circle. We present the "dynamic ordered persistence editing" (DOPE) distance, which we prove is stable and informative while satisfying metric properties. We then devise a simple O(N^2) dynamic programming algorithm to compute it on the interval and an O(N^3) algorithm to compute it on the circle. Surprisingly, we accomplish this by ignoring all of the hierarchical information of the merge tree and simply focusing on a sequence of ordered critical points, which can be interpreted as a time series. Thus our algorithm is more similar to string edit distance and dynamic time warping than it is to more conventional merge tree comparison algorithms. In the context of time series with the interval as a domain, we show empirically on the UCR time series classification dataset that DOPE performs better than bottleneck/Wasserstein distances between persistence diagrams.

Submitted 3 December, 2022; originally announced December 2022.

Comments: 31 pages, 12 Figures

ACM Class: H.3.3; E.1; F.2.1
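The key observation in the DOPE abstract is that, on the interval, a time series can be reduced to its ordered sequence of critical points. The following Python sketch shows one way to extract that alternating sequence of local minima and maxima (endpoints included) from a sampled series. It is an illustrative preprocessing step that assumes plateaus can be collapsed to their first sample, and it does not compute the DOPE distance itself; `critical_points` is a hypothetical helper name.

```python
def critical_points(x):
    """Return (index, value) pairs of the alternating local minima/maxima of x,
    including both endpoints, since the domain is an interval."""
    # Collapse runs of equal values so plateaus do not create spurious extrema.
    pts = []
    for i, v in enumerate(x):
        if not pts or v != pts[-1][1]:
            pts.append((i, v))
    if len(pts) <= 2:
        return pts
    crit = [pts[0]]
    for prev, cur, nxt in zip(pts, pts[1:], pts[2:]):
        # A sample is critical when the slope changes sign around it.
        if (cur[1] - prev[1]) * (nxt[1] - cur[1]) < 0:
            crit.append(cur)
    crit.append(pts[-1])
    return crit
```

For instance, `critical_points([0, 2, 1, 3, 3, 0])` returns `[(0, 0), (1, 2), (2, 1), (3, 3), (5, 0)]`, while a monotone series keeps only its two endpoints. The values of such a sequence alternate between minima and maxima, which is what lets an edit-distance-style dynamic program operate on it like a string.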

arXiv:2109.02472

doi 10.1109/MSP.2021.3105941

Audio-based Musical Version Identification: Elements and Challenges

Authors: Furkan Yesiler, Guillaume Doras, Rachel M. Bittner, Christopher J. Tralie, Joan Serrà

Abstract: In this article, we aim to provide a review of the key ideas and approaches proposed in 20 years of scientific literature around musical version identification (VI) research and connect them to current practice. For more than a decade, VI systems suffered from the accuracy-scalability trade-off, with attempts to increase accuracy that typically resulted in cumbersome, non-scalable systems. Recent years, however, have witnessed the rise of deep learning-based approaches that take a step toward bridging the accuracy-scalability gap, yielding systems that can realistically be deployed in industrial applications. Although this trend positively influences the number of researchers and institutions working on VI, it may also result in obscuring the literature before the deep learning era. To appreciate two decades of novel ideas in VI research and to facilitate building better systems, we now review some of the successful concepts and applications proposed in the literature and study their evolution throughout the years.

Submitted 6 September, 2021; originally announced September 2021.

Comments: Accepted to be published in IEEE Signal Processing Magazine

arXiv:2008.02734

Exact, Parallelizable Dynamic Time Warping Alignment with Linear Memory

Authors: Christopher Tralie, Elizabeth Dempsey

Abstract: Audio alignment is a fundamental preprocessing step in many MIR pipelines. For two audio clips with M and N frames, respectively, the most popular approach, dynamic time warping (DTW), has O(MN) requirements in both memory and computation, which is prohibitive for frame-level alignments at reasonable rates. To address this, a variety of memory-efficient algorithms exist to approximate the optimal alignment under the DTW cost. To our knowledge, however, no exact algorithms exist that are guaranteed to break the quadratic memory barrier. In this work, we present a divide-and-conquer algorithm that computes the exact globally optimal DTW alignment using O(M+N) memory. Its runtime is still O(MN), trading off memory for a 2x increase in computation. However, the algorithm can be parallelized up to a factor of min(M, N) with the same memory constraints, so it can still run more efficiently than the textbook version with an adequate GPU. We use our algorithm to compute exact alignments on a collection of orchestral music, which we use as ground truth to benchmark the alignment accuracy of several popular approximate alignment schemes at scales that were not previously possible.

Submitted 4 August, 2020; originally announced August 2020.

Comments: 12 Pages, 6 Figures, 1 Table, ISMIR 2020

ACM Class: H.5.5; H.3.3; F.2.1
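For reference, the textbook DTW recurrence needs only the previous row of the accumulated cost matrix, so the optimal cost alone is easy to obtain in linear memory. The hard part addressed by the paper is recovering the full optimal alignment path within O(M+N) memory, which this Python sketch does not attempt. It assumes scalar sequences and an absolute-difference local cost; the `dist` parameter is a hypothetical hook for other costs.

```python
import math


def dtw_cost(x, y, dist=lambda a, b: abs(a - b)):
    """Exact DTW cost between sequences x and y using O(min(M, N)) memory.
    Returns only the cost, not the alignment path."""
    if len(x) < len(y):
        x, y = y, x  # keep rows as short as possible (DTW is symmetric for symmetric dist)
    n = len(y)
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0  # D[0][0] = 0; D[0][j] = inf for j > 0
    for a in x:
        cur = [math.inf] * (n + 1)  # D[i][0] = inf for i > 0
        for j in range(1, n + 1):
            # Standard step pattern: diagonal match, vertical, horizontal.
            cur[j] = dist(a, y[j - 1]) + min(prev[j - 1], prev[j], cur[j - 1])
        prev = cur
    return prev[n]
```

For example, `dtw_cost([1, 2, 3], [2, 2, 2])` returns `2.0` (the 1 and the 3 each cost 1 against a 2), and `dtw_cost([0, 1, 2], [0, 1, 1, 2])` returns `0.0` because the repeated 1 is absorbed by the warping.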

arXiv:1902.01023

Enhanced Hierarchical Music Structure Annotations via Feature Level Similarity Fusion

Authors: Christopher J. Tralie, Brian McFee

Abstract: We describe a novel pipeline to automatically discover hierarchies of repeated sections in musical audio. The proposed method uses similarity network fusion (SNF) to combine different frame-level features into clean affinity matrices, which are then used as input to spectral clustering. While prior spectral clustering approaches to music structure analysis have pre-processed affinity matrices with heuristics specifically designed for this task, we show that the SNF approach directly yields segmentations which agree better with human annotators, as measured by the "L-measure" metric for hierarchical annotations. Furthermore, the SNF approach immediately supports arbitrarily many input features, allowing us to simultaneously discover structure encoded in timbral, harmonic, and rhythmic representations without any changes to the base algorithm.

Submitted 3 February, 2019; originally announced February 2019.

Comments: 5 pages, 3 figures, 1 table

ACM Class: H.5.5

Journal ref: IEEE International Conference on Acoustics, Speech, and Signal Processing, 2019
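As an illustration of the fusion step, here is a minimal NumPy sketch of similarity network fusion as it is commonly described: each affinity matrix yields a normalized full kernel P and a sparse k-nearest-neighbor kernel S, and each P is repeatedly diffused through its own S using the average of the other views' kernels. The neighborhood size `k`, the iteration count `iters`, the per-iteration symmetrization, and the Gaussian affinities in the usage lines are assumptions made for this sketch, not settings from the paper, and the subsequent spectral clustering is omitted.

```python
import numpy as np


def _full_kernel(W):
    # P(i, j) = W(i, j) / (2 * sum_{k != i} W(i, k)) off the diagonal, P(i, i) = 1/2.
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    P = W / (2.0 * W.sum(axis=1, keepdims=True))
    np.fill_diagonal(P, 0.5)
    return P


def _knn_kernel(W, k):
    # Keep each row's k largest off-diagonal affinities, then make rows sum to 1.
    W = W.copy()
    np.fill_diagonal(W, 0.0)
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        nbrs = np.argsort(W[i])[-k:]
        S[i, nbrs] = W[i, nbrs]
    return S / S.sum(axis=1, keepdims=True)


def snf(Ws, k=3, iters=20):
    """Fuse a list of (n x n) positive affinity matrices into one fused matrix."""
    if len(Ws) < 2:
        raise ValueError("SNF needs at least two affinity matrices")
    Ps = [_full_kernel(W) for W in Ws]
    Ss = [_knn_kernel(W, k) for W in Ws]
    for _ in range(iters):
        new = []
        for v, S in enumerate(Ss):
            others = np.mean([P for u, P in enumerate(Ps) if u != v], axis=0)
            new.append(S @ others @ S.T)  # diffuse the other views through view v's kNN graph
        Ps = [(P + P.T) / 2.0 for P in new]  # keep kernels symmetric
    return np.mean(Ps, axis=0)


# Usage with two hypothetical feature views (3-D and 5-D) of the same 8 frames.
rng = np.random.default_rng(0)
views = [rng.normal(size=(8, 3)), rng.normal(size=(8, 5))]


def gaussian_affinity(Z):
    D2 = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    return np.exp(-D2 / D2.mean())


fused = snf([gaussian_affinity(Z) for Z in views], k=3, iters=10)
```

Because every view is reduced to an n x n affinity matrix before fusion, adding a third feature type only means appending one more matrix to the list, which matches the abstract's point about supporting arbitrarily many input features.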

arXiv:1810.10324

Multi-scale Geometric Summaries for Similarity-based Sensor Fusion

Authors: Christopher J. Tralie, Paul Bendich, John Harer

Abstract: In this work, we address fusion of heterogeneous sensor data using wavelet-based summaries of fused self-similarity information from each sensor. The technique we develop is quite general, does not require domain-specific knowledge or physical models, and requires no training. Nonetheless, it can perform surprisingly well at the general task of differentiating classes of time-ordered behavior sequences which are sensed by more than one modality. As a demonstration of our capabilities in the audio-to-video context, we focus on the differentiation of speech sequences. Data from two or more modalities are first represented using self-similarity matrices (SSMs) corresponding to time-ordered point clouds in feature spaces of each of these data sources; we note that these feature spaces can be of entirely different scale and dimensionality. A fused similarity template is then derived from the modality-specific SSMs using a technique called similarity network fusion (SNF). We investigate pipelines using SNF as both an upstream (feature-level) and a downstream (ranking-level) fusion technique. Multiscale geometric features of this template are then extracted using a recently developed technique called the scattering transform, and these features are then used to differentiate speech sequences. This method outperforms unsupervised techniques which operate directly on the raw data, and it also outperforms stovepiped methods which operate on SSMs separately derived from the distinct modalities. The benefits of this method become even more apparent as the simulated peak signal-to-noise ratio decreases.

Submitted 4 January, 2019; v1 submitted 13 October, 2018; originally announced October 2018.

Comments: 9 pages, 13 Figures

MSC Class: 65T60 ACM Class: H.5.5; H.5.1; I.5
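The property that makes SSMs usable across modalities is that an n-frame sequence always produces an n x n matrix, no matter how many features each frame has. The NumPy sketch below computes a Euclidean SSM and, as an assumed choice, divides by its maximum to remove global scale. The two "modalities" in the usage lines are synthetic stand-ins, not data from the paper.

```python
import numpy as np


def ssm(X, normalize=True):
    """Euclidean self-similarity matrix of a time-ordered point cloud X (n frames x d features)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]     # n x n x d pairwise differences
    D = np.sqrt(np.sum(diff ** 2, axis=-1))  # n x n distances, exact zeros on the diagonal
    if normalize and D.max() > 0:
        D = D / D.max()                      # assumed choice: remove global scale
    return D


# Two hypothetical "modalities" observing the same 40 time steps.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 40)
motion = np.column_stack([np.cos(t), np.sin(t), t])
video_feats = motion @ rng.normal(size=(3, 100))  # a linear lift into 100-D per frame
audio_feats = 7.0 * motion                        # 3-D per frame, different scale

D_video, D_audio = ssm(video_feats), ssm(audio_feats)
# Both are 40 x 40, so they can be compared entry by entry, e.g. with a Frobenius norm.
gap = np.linalg.norm(D_video - D_audio)
```

The Frobenius norm here is only the simplest possible comparison; the abstract describes a richer pipeline (SNF followed by scattering-transform features) built on top of matrices like these.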

arXiv:1809.07131

Twisty Takens: A Geometric Characterization of Good Observations on Dense Trajectories

Authors: Boyan Xu, Christopher J. Tralie, Alice Antia, Michael Lin, Jose A. Perea

Abstract: In nonlinear time series analysis and dynamical systems theory, Takens' embedding theorem states that the sliding window embedding of a generic observation along trajectories in a state space recovers the region traversed by the dynamics. This can be used, for instance, to show that sliding window embeddings of periodic signals recover topological loops, and that sliding window embeddings of quasiperiodic signals recover high-dimensional tori. However, in spite of these motivating examples, Takens' theorem does not in general prescribe how to choose such an observation function given particular dynamics in a state space. In this work, we state conditions on observation functions defined on compact Riemannian manifolds that lead to successful reconstructions for particular dynamics. We apply our theory and construct families of time series whose sliding window embeddings trace tori, Klein bottles, spheres, and projective planes. This greatly enriches the set of examples of time series known to concentrate on various shapes via sliding window embeddings, and will hopefully help other researchers in identifying them in naturally occurring phenomena. We also present numerical experiments showing how to recover low-dimensional representations of the underlying dynamics on state space, by using the persistent cohomology of sliding window embeddings and Eilenberg-MacLane (i.e., circular and real projective) coordinates.

Submitted 5 May, 2019; v1 submitted 19 September, 2018; originally announced September 2018.

Comments: 25 pages, 12 figures

MSC Class: 37M10; 37M05; 37N99 ACM Class: I.3.5; G.1.m
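A minimal NumPy sketch of the sliding window (delay) embedding that Takens' theorem concerns. The window dimension `dim` and delay `tau` are hypothetical parameter names. The usage lines only illustrate the classical fact quoted in the abstract, that a periodic signal yields a loop; none of the paper's constructions for tori, Klein bottles, spheres, or projective planes is reproduced here.

```python
import numpy as np


def sliding_window(x, dim, tau=1):
    """Delay embedding: row i is [x[i], x[i + tau], ..., x[i + (dim - 1) * tau]]."""
    x = np.asarray(x, dtype=float)
    span = (dim - 1) * tau
    n = len(x) - span
    if n <= 0:
        raise ValueError("series is too short for this dim and tau")
    return np.stack([x[i : i + span + 1 : tau] for i in range(n)])


# With tau equal to a quarter period and dim=2, each window is
# (cos t, cos(t + pi/2)) = (cos t, -sin t), a point on the unit circle.
t = 2 * np.pi * np.arange(200) / 100   # two periods, 100 samples per period
Y = sliding_window(np.cos(t), dim=2, tau=25)
radii = np.linalg.norm(Y, axis=1)      # all equal to 1 up to round-off
```

Choosing `tau` as a quarter period makes the loop a perfect circle; other choices still give a closed loop for this signal, just a more eccentric one.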

arXiv:1806.06347

Cover Song Synthesis by Analogy

Authors: Christopher J. Tralie

Abstract: In this work, we pose and address the following "cover song analogies" problem: given a song A by artist 1 and a cover song A' of this song by artist 2, and given a different song B by artist 1, synthesize a song B' which is a cover of B in the style of artist 2. Normally, such a polyphonic style transfer problem would be quite challenging, but we show how the cover songs example constrains the problem, making it easier to solve. First, we extract the longest common beat-synchronous subsequence between A and A', and we time stretch the corresponding beat intervals in A' so that they align with A. We then derive a version of joint 2D convolutional NMF, which we apply to the constant-Q spectrograms of the synchronized segments to learn a translation dictionary of sound templates from A to A'. Finally, we apply the learned templates as filters to the song B, and we mash up the translated filtered components into the synthesized song B' using audio mosaicing. We showcase our algorithm on several examples, including a synthesized cover version of Michael Jackson's "Bad" by Alien Ant Farm, learned from the latter's "Smooth Criminal" cover.

Submitted 29 June, 2018; v1 submitted 17 June, 2018; originally announced June 2018.

Comments: 11 pages, 5 figures

ACM Class: H.5.5; H.5.1

arXiv:1805.06021

Topological Eulerian Synthesis of Slow Motion Periodic Videos

Authors: Christopher Tralie, Matthew Berger

Abstract: We consider the problem of taking a video that is composed of multiple periods of repetitive motion, and reordering the frames of the video into a single period, producing a detailed, single-cycle video of motion. This problem is challenging, as such videos often contain noise, drift due to camera motion and from cycle to cycle, and irrelevant background motion/occlusions, and these factors can confound the relevant periodic motion we seek in the video. To address these issues in a simple and efficient manner, we introduce a tracking-free Eulerian approach for synthesizing a single cycle of motion. Our approach is geometric: we treat each frame as a point in high-dimensional Euclidean space, and analyze the sliding window embedding formed by this sequence of points, which yields samples along a topological loop regardless of the type of periodic motion. We combine tools from topological data analysis and spectral geometric analysis to estimate the phase of each window, and we exploit the sliding window structure to robustly reorder frames. We show quantitative results that highlight the robustness of our technique to camera shake, noise, and occlusions, and qualitative results of single-cycle motion synthesis across a variety of scenarios.

Submitted 15 May, 2018; originally announced May 2018.

Comments: 9 pages, 5 Figures. IEEE International Conference on Image Processing, 2018

ACM Class: H.5.1; I.3.3; I.4.m; G.2.2

arXiv:1711.08569

Geometric Cross-Modal Comparison of Heterogeneous Sensor Data

Authors: Christopher J. Tralie, Abraham Smith, Nathan Borggren, Jay Hineman, Paul Bendich, Peter Zulch, John Harer

Abstract: In this work, we address the problem of cross-modal comparison of aerial data streams. A variety of simulated automobile trajectories are sensed using two different modalities: full-motion video, and radio-frequency (RF) signals received by detectors at various locations. The information represented by the two modalities is compared using self-similarity matrices (SSMs) corresponding to time-ordered point clouds in feature spaces of each of these data sources; we note that these feature spaces can be of entirely different scale and dimensionality. Several metrics for comparing SSMs are explored, including a cutting-edge time-warping technique that can simultaneously handle local time warping and partial matches, while also controlling for the change in geometry between feature spaces of the two modalities. We note that this technique is quite general, and does not depend on the choice of modalities. In this particular setting, we demonstrate that the cross-modal distance between SSMs corresponding to the same trajectory type is smaller than the cross-modal distance between SSMs corresponding to distinct trajectory types, and we formalize this observation via precision-recall metrics in experiments. Finally, we comment on promising implications of these ideas for future integration into multiple-hypothesis tracking systems.

Submitted 22 November, 2017; originally announced November 2017.

Comments: 10 pages, 13 figures, Proceedings of IEEE Aeroconf 2017

ACM Class: I.5.4; I.4.9; J.2

arXiv:1711.07513

Self-Similarity Based Time Warping

Authors: Christopher J. Tralie

Abstract: In this work, we explore the problem of aligning two time-ordered point clouds which are spatially transformed and re-parameterized versions of each other. This has a diverse array of applications such as cross-modal time series synchronization (e.g., MOCAP to video) and alignment of discretized curves in images. Most other works that address this problem attempt to jointly uncover a spatial alignment and correspondences between the two point clouds, or to derive local invariants to spatial transformations such as curvature before computing correspondences. By contrast, we sidestep spatial alignment completely by using self-similarity matrices (SSMs) as a proxy to the time-ordered point clouds, since self-similarity matrices are blind to isometries and respect global geometry. Our algorithm, dubbed "Isometry Blind Dynamic Time Warping" (IBDTW), is simple and general, and we show that its associated dissimilarity measure lower bounds the L1 Gromov-Hausdorff distance between the two point sets when restricted to warping paths. We also present a local, partial alignment extension of IBDTW based on the Smith-Waterman algorithm. This eliminates the need for tedious manual cropping of time series, which is ordinarily necessary for global alignment algorithms to function properly.

Submitted 20 November, 2017; originally announced November 2017.

Comments: 10 pages, 11 figures

ACM Class: I.4.9; I.5.4; H.5.1; H.5.5
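To see why SSMs let IBDTW sidestep spatial alignment, the NumPy sketch below applies a random orthogonal transform plus a translation to a time-ordered point cloud and checks that the SSM is unchanged. The curve, the translation vector, and the helper names are hypothetical, and the alignment algorithm itself is not implemented here.

```python
import numpy as np


def ssm(X):
    """Pairwise Euclidean distances between the rows of a time-ordered point cloud X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def random_orthogonal(d, rng):
    """Random d x d orthogonal matrix (rotation or reflection) via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    return Q * np.sign(np.diag(R))  # flip column signs; the result is still orthogonal


rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 50)
X = np.column_stack([np.cos(t), np.sin(2 * t), 0.3 * t])           # a curve in R^3
Y = X @ random_orthogonal(3, rng).T + np.array([5.0, -2.0, 1.0])  # isometric copy

same = np.allclose(ssm(X), ssm(Y))  # True: isometries preserve all pairwise distances
```

Since the two SSMs agree entry by entry, any time warping computed between SSMs never needs to estimate the rotation or translation relating X and Y.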

arXiv:1707.04680

Early MFCC And HPCP Fusion for Robust Cover Song Identification

Authors: Christopher J. Tralie

Abstract: While most schemes for automatic cover song identification have focused on note-based features such as HPCP and chord profiles, a few recent papers surprisingly showed that local self-similarities of MFCC-based features also have classification power for this task. Since MFCC and HPCP capture complementary information, we design an unsupervised algorithm that combines normalized, beat-synchronous blocks of these features using cross-similarity fusion before attempting to locally align a pair of songs. As an added bonus, our scheme naturally incorporates structural information in each song to fill in alignment gaps where both feature sets fail. We show a striking jump in performance over MFCC and HPCP alone, achieving a state-of-the-art mean reciprocal rank of 0.87 on the Covers80 dataset. We also introduce a new medium-sized hand-designed benchmark dataset called "Covers 1000," which consists of 395 cliques of cover songs for a total of 1000 songs, and we show that our algorithm achieves an MRR of 0.9 on this dataset for the first correctly identified song in a clique. We provide the precomputed HPCP and MFCC features, as well as beat intervals, for all songs in the Covers 1000 dataset for use in further research.

Submitted 14 July, 2017; originally announced July 2017.

Comments: 11 pages, 7 figures, Proceedings of The International Society for Music Information Retrieval (ISMIR) 2017

ACM Class: H.5.5
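The results above are reported as mean reciprocal rank (MRR). A minimal Python sketch of MRR as it is commonly defined for retrieval: each query contributes the reciprocal of the rank at which its first correct match appears (0 if none appears), and these are averaged over queries. The input layout and the function name are assumptions for illustration; the paper's exact evaluation protocol (which songs serve as queries, how cliques are handled) is not reproduced here.

```python
def mean_reciprocal_rank(rankings, relevant):
    """rankings[q]: candidate ids for query q, best match first.
    relevant[q]: set of ids that are true covers of query q.
    Returns the mean over queries of 1 / (rank of the first correct match)."""
    if not rankings:
        raise ValueError("need at least one query")
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, cand in enumerate(ranked, start=1):
            if cand in rel:
                total += 1.0 / rank
                break  # only the first correctly identified song counts
    return total / len(rankings)


# Query 1 finds its cover at rank 2, query 2 at rank 3.
score = mean_reciprocal_rank([["b", "a"], ["c", "d", "e"]], [{"a"}, {"e"}])
```

Here `score` is (1/2 + 1/3) / 2 = 5/12, roughly 0.417; an MRR of 0.87 therefore means the first correct cover is usually ranked at or very near the top.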

arXiv:1704.08382

(Quasi)Periodicity Quantification in Video Data, Using Topology

Authors: Christopher J. Tralie, Jose A. Perea

Abstract: This work introduces a novel framework for quantifying the presence and strength of recurrent dynamics in video data. Specifically, we provide continuous measures of periodicity (perfect repetition) and quasiperiodicity (superposition of periodic modes with non-commensurate periods), in a way which does not require segmentation, training, object tracking or 1-dimensional surrogate signals. Our methodology operates directly on video data. The approach combines ideas from nonlinear time series analysis (delay embeddings) and computational topology (persistent homology), by translating the problem of finding recurrent dynamics in video data into the problem of determining the circularity or toroidality of an associated geometric space. Through extensive testing, we show the robustness of our scores with respect to several noise models/levels, we show that our periodicity score is superior to other methods when compared to human-generated periodicity rankings, and furthermore, we show that our quasiperiodicity score clearly indicates the presence of biphonation in videos of vibrating vocal folds, which has never before been accomplished quantitatively end to end.

Submitted 21 January, 2018; v1 submitted 26 April, 2017; originally announced April 2017.

Comments: 27 pages, 1 column, 23 figures, SIAM Journal on Imaging Sciences, 2018

ACM Class: I.2.10

arXiv:1602.06245

Scaffoldings and Spines: Organizing High-Dimensional Data Using Cover Trees, Local Principal Component Analysis, and Persistent Homology

Authors: Paul Bendich, Ellen Gasparovic, Christopher J. Tralie, John Harer

Abstract: We propose a flexible and multi-scale method for organizing, visualizing, and understanding datasets sampled from or near stratified spaces. The first part of the algorithm produces a cover tree using adaptive thresholds based on a combination of multi-scale local principal component analysis and topological data analysis. The resulting cover tree nodes consist of points within or near the same stratum of the stratified space. They are then connected to form a scaffolding graph, which is then simplified and collapsed down into a spine graph. From this latter graph the stratified structure becomes apparent. We demonstrate our technique on several synthetic point cloud examples and we use it to understand song structure in musical audio data.

Submitted 27 February, 2016; v1 submitted 19 February, 2016; originally announced February 2016.

Comments: 14 pages

arXiv:1507.05143

Cover Song Identification with Timbral Shape Sequences

Authors: Christopher J. Tralie, Paul Bendich

Abstract: We introduce a novel low-level feature for identifying cover songs which quantifies the relative changes in the smoothed frequency spectrum of a song. Our key insight is that a sliding window representation of a chunk of audio can be viewed as a time-ordered point cloud in high dimensions. For corresponding chunks of audio between different versions of the same song, these point clouds are approximately rotated, translated, and scaled copies of each other. If we treat MFCC embeddings as point clouds and cast the problem as a relative shape sequence, we are able to correctly identify 42/80 cover songs in the "Covers 80" dataset. By contrast, all other work to date on cover songs exclusively relies on matching note sequences from Chroma-derived features.

Submitted 17 July, 2015; originally announced July 2015.

Showing 1–15 of 15 results for author: Tralie, C