-
A Study on Synthesizing Expressive Violin Performances: Approaches and Comparisons
Authors:
Tzu-Yun Hung,
Jui-Te Wu,
Yu-Chia Kuo,
Yo-Wei Hsiao,
Ting-Wei Lin,
Li Su
Abstract:
Expressive music synthesis (EMS) for violin performance is a challenging task due to the disagreement among music performers in the interpretation of expressive musical terms (EMTs), scarcity of labeled recordings, and limited generalization ability of the synthesis model. These challenges create trade-offs between model effectiveness, diversity of generated results, and controllability of the synthesis system, making it essential to conduct a comparative study on EMS model design. This paper explores two violin EMS approaches. The end-to-end approach is a modification of a state-of-the-art text-to-speech generator. The parameter-controlled approach is based on a simple parameter sampling process that can render note lengths and other parameters compatible with MIDI-DDSP. We study these two approaches (in total, three model variants) through objective and subjective experiments and discuss several key issues of EMS based on the results.
Submitted 26 June, 2024;
originally announced June 2024.
-
MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
Authors:
Yu-Fen Huang,
Nikki Moran,
Simon Coleman,
Jon Kelly,
Shun-Hwa Wei,
Po-Yin Chen,
Yun-Hsin Huang,
Tsung-Ping Chen,
Yu-Chia Kuo,
Yu-Chi Wei,
Chih-Hsuan Li,
Da-Yu Huang,
Hsuan-Kai Kao,
Ting-Wei Lin,
Li Su
Abstract:
In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamic, articulation, and harmony for 742 professional music performances by 23 professional musicians, comprising more than 30 hours and 570 K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the usage of the MOSA dataset, we present several innovative cross-modal music information retrieval (MIR) and musical content generation tasks, including the detection of beats, downbeats, phrase, and expressive contents from audio, video and motion data, and the generation of musicians' body motion from given music audio. The dataset and codes are available alongside this publication (https://github.com/yufenhuang/MOSA-Music-mOtion-and-Semantic-Annotation-dataset).
Submitted 10 June, 2024;
originally announced June 2024.
-
Multi-Objective Optimization-based Transmit Beamforming for Multi-Target and Multi-User MIMO-ISAC Systems
Authors:
Chunwei Meng,
Zhiqing Wei,
Dingyou Ma,
Wanli Ni,
Liyan Su,
Zhiyong Feng
Abstract:
Integrated sensing and communication (ISAC) is an enabling technology for the sixth-generation mobile communications, which equips the wireless communication networks with sensing capabilities. In this paper, we investigate transmit beamforming design for multiple-input and multiple-output (MIMO)-ISAC systems in scenarios with multiple radar targets and communication users. A general form of multi-target sensing mutual information (MI) is derived, along with its upper bound, which can be interpreted as the sum of individual single-target sensing MI. Additionally, this upper bound can be achieved by suppressing the cross-correlation among reflected signals from different targets, which aligns with the principles of adaptive MIMO radar. Then, we propose a multi-objective optimization framework based on the signal-to-interference-plus-noise ratio of each user and the tight upper bound of sensing MI, introducing the Pareto boundary to characterize the achievable communication-sensing performance boundary of the proposed ISAC system. To achieve the Pareto boundary, the max-min system utility function method is employed, while considering the fairness between communication users and radar targets. Subsequently, the bisection search method is employed to find a specific Pareto optimal solution by solving a series of convex feasible problems. Finally, simulation results validate that the proposed method achieves a better tradeoff between multi-user communication and multi-target sensing performance. Additionally, utilizing the tight upper bound of sensing MI as a performance metric can enhance the multi-target resolution capability and angle estimation accuracy.
Submitted 14 May, 2024;
originally announced May 2024.
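
The abstract above mentions a bisection search that solves a series of convex feasibility problems. The generic pattern can be sketched as follows. This is a minimal illustration and not the authors' implementation: `is_feasible` is a hypothetical oracle standing in for a convex feasibility solver, and the toy oracle, search bounds, and tolerance are assumptions chosen only to make the sketch runnable.

```python
from typing import Callable


def bisect_max_feasible(
    is_feasible: Callable[[float], bool],
    lo: float,
    hi: float,
    tol: float = 1e-6,
) -> float:
    """Return (within tol) the largest t in [lo, hi] for which is_feasible(t) holds.

    Assumes feasibility is monotone: feasible at lo, and once a value of t is
    infeasible, every larger t is infeasible as well.
    """
    if not is_feasible(lo):
        raise ValueError("lower bound must be feasible")
    if is_feasible(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_feasible(mid):
            lo = mid  # mid is achievable, so search above it
        else:
            hi = mid  # mid is not achievable, so search below it
    return lo


def toy_oracle(t: float) -> bool:
    """Hypothetical stand-in for a convex feasibility problem.

    Asks whether a split p in [0, 1] exists with p >= t (first utility)
    and 1 - p >= t (second utility); such a p exists exactly when t <= 0.5.
    """
    return t <= 0.5


best_t = bisect_max_feasible(toy_oracle, lo=0.0, hi=1.0)
```

In a real max-min design each oracle call would be replaced by a convex solver call, and the returned threshold would be the common utility level that both sides can reach.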
-
Learning Correction Errors via Frequency-Self Attention for Blind Image Super-Resolution
Authors:
Haochen Sun,
Yan Yuan,
Lijuan Su,
Haotian Shao
Abstract:
Previous approaches for blind image super-resolution (SR) have relied on degradation estimation to restore high-resolution (HR) images from their low-resolution (LR) counterparts. However, accurate degradation estimation poses significant challenges. The SR model's incompatibility with degradation estimation methods, particularly the Correction Filter, may significantly impair performance as a result of correction errors. In this paper, we introduce a novel blind SR approach that focuses on Learning Correction Errors (LCE). Our method employs a lightweight Corrector to obtain a corrected low-resolution (CLR) image. Subsequently, within an SR network, we jointly optimize SR performance by utilizing both the original LR image and the frequency learning of the CLR image. Additionally, we propose a new Frequency-Self Attention block (FSAB) that enhances the global information utilization ability of Transformer. This block integrates both self-attention and frequency spatial attention mechanisms. Extensive ablation and comparison experiments conducted across various settings demonstrate the superiority of our method in terms of visual quality and accuracy. Our approach effectively addresses the challenges associated with degradation estimation and correction errors, paving the way for more accurate blind image SR.
Submitted 12 March, 2024;
originally announced March 2024.
-
BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer
Authors:
Chih-Cheng Chang,
Li Su
Abstract:
Many deep learning models have achieved dominant performance on the offline beat tracking task. However, online beat tracking, in which only the past and present input features are available, still remains challenging. In this paper, we propose BEAt tracking Streaming Transformer (BEAST), an online joint beat and downbeat tracking system based on the streaming Transformer. To deal with online scenarios, BEAST applies contextual block processing in the Transformer encoder. Moreover, we adopt relative positional encoding in the attention layer of the streaming Transformer encoder to capture relative timing position which is critically important information in music. Carrying out beat and downbeat experiments on benchmark datasets for a low latency scenario with maximum latency under 50 ms, BEAST achieves an F1-measure of 80.04% in beat and 46.78% in downbeat, which is a substantial improvement of about 5 percentage points over the state-of-the-art online beat tracking model.
Submitted 23 April, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Adapting pretrained speech model for Mandarin lyrics transcription and alignment
Authors:
Jun-You Wang,
Chon-In Leong,
Yu-Chen Lin,
Li Su,
Jyh-Shing Roger Jang
Abstract:
The tasks of automatic lyrics transcription and lyrics alignment have witnessed significant performance improvements in the past few years. However, most previous works focus only on English, for which large-scale datasets are available. In this paper, we address lyrics transcription and alignment of polyphonic Mandarin pop music in a low-resource setting. To deal with the data scarcity issue, we adapt the pretrained Whisper model and fine-tune it on a monophonic Mandarin singing dataset. With the use of data augmentation and a source separation model, results show that the proposed method achieves a character error rate of less than 18% on a Mandarin polyphonic dataset for lyrics transcription, and a mean absolute error of 0.071 seconds for lyrics alignment. Our results demonstrate the potential of adapting a pretrained speech model for lyrics transcription and alignment in low-resource scenarios.
Submitted 21 November, 2023;
originally announced November 2023.
-
Enhancing Motor Imagery Decoding in Brain Computer Interfaces using Riemann Tangent Space Mapping and Cross Frequency Coupling
Authors:
Xiong Xiong,
Li Su,
**guo Huang,
Guixia Kang
Abstract:
Objective: Motor Imagery (MI) serves as a crucial experimental paradigm within the realm of Brain Computer Interfaces (BCIs), aiming to decode motor intentions from electroencephalogram (EEG) signals. Method: Drawing inspiration from Riemannian geometry and Cross-Frequency Coupling (CFC), this paper introduces a novel approach termed Riemann Tangent Space Mapping using Dichotomous Filter Bank with Convolutional Neural Network (DFBRTS) to enhance the representation quality and decoding capability pertaining to MI features. DFBRTS first filters EEG signals through a Dichotomous Filter Bank, structured in the fashion of a complete binary tree. Subsequently, it employs Riemann Tangent Space Mapping to extract salient EEG signal features within each sub-band. Finally, a lightweight convolutional neural network is employed for further feature extraction and classification, operating under the joint supervision of cross-entropy and center loss. To validate the efficacy, extensive experiments were conducted using DFBRTS on two well-established benchmark datasets: the BCI competition IV 2a (BCIC-IV-2a) dataset and the OpenBMI dataset. The performance of DFBRTS was benchmarked against several state-of-the-art MI decoding methods, alongside other Riemannian geometry-based MI decoding approaches. Results: DFBRTS significantly outperforms other MI decoding algorithms on both datasets, achieving a remarkable classification accuracy of 78.16% for four-class and 71.58% for two-class hold-out classification, as compared to the existing benchmarks.
Submitted 29 October, 2023;
originally announced October 2023.
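
The abstract above describes a filter bank "structured in the fashion of a complete binary tree". One way to read that is sketched below: every band is split into two halves at each level. The abstract does not state the split rule, the frequency range, or the tree depth, so the midpoint split, the 8-40 Hz range, and the depth of 2 are assumptions, and `dichotomous_bands` is a hypothetical helper name. Only the band edges are computed here; the actual band-pass filtering is out of scope.

```python
from typing import List, Tuple

Band = Tuple[float, float]


def dichotomous_bands(low_hz: float, high_hz: float, depth: int) -> List[List[Band]]:
    """Return sub-band edges level by level for a complete binary tree.

    Level 0 holds the full band; each deeper level halves every band of the
    level above at its midpoint (an assumed split rule).
    """
    levels: List[List[Band]] = [[(low_hz, high_hz)]]
    for _ in range(depth):
        next_level: List[Band] = []
        for lo, hi in levels[-1]:
            mid = 0.5 * (lo + hi)
            next_level.extend([(lo, mid), (mid, hi)])
        levels.append(next_level)
    return levels


# Assumed example range and depth, chosen only for illustration.
levels = dichotomous_bands(8.0, 40.0, depth=2)
all_bands = [band for level in levels for band in level]
```

A tree of depth d yields 2^(d+1) - 1 sub-bands in total, so each added level roughly doubles the number of per-band feature extractions.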
-
Coherent Compensation based ISAC Signal Processing for Long-range Sensing
Authors:
Lin Wang,
Zhiqing Wei,
Liyan Su,
Zhiyong Feng,
Huici Wu,
Dongsheng Xue
Abstract:
Integrated sensing and communication (ISAC) will greatly enhance the efficiency of physical resource utilization. ISAC signal design based on the orthogonal frequency division multiplexing (OFDM) signal is the mainstream approach. However, when detecting a long-range target, the delay of the echo signal exceeds the cyclic prefix (CP) duration, which results in inter-symbol interference (ISI) and inter-carrier interference (ICI), limiting the sensing range. To address this problem, we propose to increase the useful signal power through coherent compensation and improve the signal to interference plus noise power ratio (SINR) of each OFDM block. Compared with the traditional 2D-FFT algorithm, the improvement in the SINR of the range-Doppler map (RDM) is verified by simulation, which will expand the sensing range.
Submitted 13 July, 2023;
originally announced July 2023.
-
A Novel Black Box Process Quality Optimization Approach based on Hit Rate
Authors:
Yang Yang,
Jian Wu,
Xiangman Song,
Derun Wu,
Lijie Su,
Lixin Tang
Abstract:
Hit rate is a key performance metric in predicting process product quality in integrated industrial processes. It represents the percentage of products accepted by downstream processes within a controlled range of quality. However, optimizing hit rate is a non-convex and challenging problem. To address this issue, we propose a data-driven quasi-convex approach that combines factorial hidden Markov models, multitask elastic net, and quasi-convex optimization. Our approach converts the original non-convex problem into a set of convex feasible problems, achieving an optimal hit rate. We verify the convex optimization property and quasi-convex frontier through Monte Carlo simulations and real-world experiments in steel production. Results demonstrate that our approach outperforms classical models, improving hit rates by at least 41.11% and 31.01% on two real datasets. Furthermore, the quasi-convex frontier provides a reference explanation and visualization for the deterioration of solutions obtained by conventional models.
Submitted 2 June, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
MicroSegNet: A Deep Learning Approach for Prostate Segmentation on Micro-Ultrasound Images
Authors:
Hongxu Jiang,
Muhammad Imran,
Preethika Muralidharan,
Anjali Patel,
Jake Pensa,
Muxuan Liang,
Tarik Benidir,
Joseph R. Grajo,
Jason P. Joseph,
Russell Terry,
John Michael DiBianco,
Li-Ming Su,
Yuyin Zhou,
Wayne G. Brisbane,
Wei Shao
Abstract:
Micro-ultrasound (micro-US) is a novel 29-MHz ultrasound technique that provides 3-4 times higher resolution than traditional ultrasound, potentially enabling low-cost, accurate diagnosis of prostate cancer. Accurate prostate segmentation is crucial for prostate volume measurement, cancer diagnosis, prostate biopsy, and treatment planning. However, prostate segmentation on micro-US is challenging due to artifacts and indistinct borders between the prostate, bladder, and urethra in the midline. This paper presents MicroSegNet, a multi-scale annotation-guided transformer UNet model designed specifically to tackle these challenges. During the training process, MicroSegNet focuses more on regions that are hard to segment (hard regions), characterized by discrepancies between expert and non-expert annotations. We achieve this by proposing an annotation-guided binary cross entropy (AG-BCE) loss that assigns a larger weight to prediction errors in hard regions and a lower weight to prediction errors in easy regions. The AG-BCE loss was seamlessly integrated into the training process through the utilization of multi-scale deep supervision, enabling MicroSegNet to capture global contextual dependencies and local information at various scales. We trained our model using micro-US images from 55 patients, followed by evaluation on 20 patients. Our MicroSegNet model achieved a Dice coefficient of 0.939 and a Hausdorff distance of 2.02 mm, outperforming several state-of-the-art segmentation methods, as well as three human annotators with different experience levels. Our code is publicly available at https://github.com/mirthAI/MicroSegNet and our dataset is publicly available at https://zenodo.org/records/10475293.
Submitted 25 January, 2024; v1 submitted 31 May, 2023;
originally announced May 2023.
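
The annotation-guided binary cross entropy (AG-BCE) idea in the abstract above, a larger weight on prediction errors in hard regions and a lower weight in easy regions, can be illustrated with a small sketch. This is not the authors' code (their repository is linked in the abstract). It assumes that a pixel is "hard" when the expert and non-expert labels disagree, that the loss is averaged over pixels, and that the weights are 4.0 and 1.0; all of these values and the function name `ag_bce` are illustrative assumptions.

```python
import math
from typing import Sequence


def ag_bce(
    probs: Sequence[float],
    expert: Sequence[int],
    non_expert: Sequence[int],
    hard_weight: float = 4.0,  # assumed value, not taken from the paper
    easy_weight: float = 1.0,  # assumed value, not taken from the paper
    eps: float = 1e-7,
) -> float:
    """Weighted binary cross entropy against the expert labels.

    Pixels where expert and non-expert labels disagree are treated as hard
    and weighted by hard_weight; all other pixels use easy_weight.
    Returns the mean weighted loss over pixels.
    """
    total = 0.0
    for p, y, y_ne in zip(probs, expert, non_expert):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        weight = hard_weight if y != y_ne else easy_weight
        total += -weight * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Two pixels: the annotators agree on the first and disagree on the second.
probs, expert, non_expert = [0.9, 0.6], [1, 1], [1, 0]
weighted = ag_bce(probs, expert, non_expert)
plain = ag_bce(probs, expert, non_expert, hard_weight=1.0)
```

With equal weights the function reduces to ordinary binary cross entropy, which makes the effect of the hard-region weight easy to isolate.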
-
Image Registration of In Vivo Micro-Ultrasound and Ex Vivo Pseudo-Whole Mount Histopathology Images of the Prostate: A Proof-of-Concept Study
Authors:
Muhammad Imran,
Brianna Nguyen,
Jake Pensa,
Sara M. Falzarano,
Anthony E. Sisk,
Muxuan Liang,
John Michael DiBianco,
Li-Ming Su,
Yuyin Zhou,
Wayne G. Brisbane,
Wei Shao
Abstract:
Early diagnosis of prostate cancer significantly improves a patient's 5-year survival rate. Biopsy of small prostate cancers is improved with image-guided biopsy. MRI-ultrasound fusion-guided biopsy is sensitive to smaller tumors but is underutilized due to the high cost of MRI and fusion equipment. Micro-ultrasound (micro-US), a novel high-resolution ultrasound technology, provides a cost-effective alternative to MRI while delivering comparable diagnostic accuracy. However, the interpretation of micro-US is challenging due to subtle gray scale changes indicating cancer vs normal tissue. This challenge can be addressed by training urologists with a large dataset of micro-US images containing the ground truth cancer outlines. Such a dataset can be mapped from surgical specimens (histopathology) onto micro-US images via image registration. In this paper, we present a semi-automated pipeline for registering in vivo micro-US images with ex vivo whole-mount histopathology images. Our pipeline begins with the reconstruction of pseudo-whole-mount histopathology images and a 3-dimensional (3D) micro-US volume. Each pseudo-whole-mount histopathology image is then registered with the corresponding axial micro-US slice using a two-stage approach that estimates an affine transformation followed by a deformable transformation. We evaluated our registration pipeline using micro-US and histopathology images from 18 patients who underwent radical prostatectomy. The results showed a Dice coefficient of 0.94 and a landmark error of 2.7 mm, indicating the accuracy of our registration pipeline. This proof-of-concept study demonstrates the feasibility of accurately aligning micro-US and histopathology images. To promote transparency and collaboration in research, we will make our code and dataset publicly available.
Submitted 16 June, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
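
The Dice coefficient reported in the abstract above measures the overlap between two binary masks as 2|A ∩ B| / (|A| + |B|). A minimal sketch on flattened masks follows; the function name, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
from typing import Sequence


def dice_coefficient(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for flattened binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_sum = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if size_sum == 0:
        return 1.0  # both masks empty: treated as perfect overlap (assumed convention)
    return 2.0 * intersection / size_sum


# Toy masks: 3 foreground pixels vs. 2, with 2 in common.
score = dice_coefficient([1, 1, 1, 0], [1, 1, 0, 0])
```

A value of 1 means the two outlines coincide exactly and 0 means they do not overlap at all.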
-
Steady-state analysis of networked epidemic models
Authors:
Sei Zhen Khong,
Lanlan Su
Abstract:
Compartmental epidemic models with dynamics that evolve over a graph network have gained considerable importance in recent years but analysis of these models is in general difficult due to their complexity. In this paper, we develop two positive feedback frameworks that are applicable to the study of steady-state values in a wide range of compartmental epidemic models, including both group and networked processes. In the case of a group (resp. networked) model, we show that the convergence limit of the susceptible proportion of the population (resp. the susceptible proportion in at least one of the subgroups) is upper bounded by the reciprocal of the basic reproduction number (BRN) of the model. The BRN, when it is greater than unity, thus demonstrates the level of penetration into a subpopulation by the disease. Both non-strict and strict bounds on the convergence limits are derived and shown to correspond to substantially distinct scenarios in the epidemic processes, one in the presence of the endemic state and another without. Formulae for calculating the limits are provided in the latter case. We apply the developed framework to examining various group and networked epidemic models commonly seen in the literature to verify the validity of our conclusions.
Submitted 30 May, 2023;
originally announced May 2023.
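
The bound stated in the abstract above, that the limiting susceptible proportion is at most the reciprocal of the basic reproduction number, can be checked numerically on the simplest group model. The sketch below uses a classical single-group SIR model with forward-Euler integration. The choice of SIR, the rates `beta` and `gamma`, the initial conditions, and the step size are all assumptions made for illustration; the paper covers a much wider class of models.

```python
def simulate_sir(
    beta: float,
    gamma: float,
    s0: float,
    i0: float,
    dt: float = 0.01,
    steps: int = 100_000,
) -> float:
    """Integrate ds/dt = -beta*s*i, di/dt = beta*s*i - gamma*i with forward Euler.

    Returns the susceptible proportion after steps * dt time units, used here
    as a numerical proxy for the convergence limit.
    """
    s, i = s0, i0
    for _ in range(steps):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - recoveries
    return s


beta, gamma = 0.5, 0.2   # assumed infection and recovery rates
brn = beta / gamma       # basic reproduction number of this SIR model (2.5)
s_limit = simulate_sir(beta, gamma, s0=0.99, i0=0.01)
bound = 1.0 / brn        # reciprocal of the BRN (0.4)
```

In this toy run the final susceptible proportion falls below 1/BRN, which is consistent with the upper bound described in the abstract.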
-
A Phoneme-Informed Neural Network Model for Note-Level Singing Transcription
Authors:
Sangeon Yong,
Li Su,
Juhan Nam
Abstract:
Note-level automatic music transcription is one of the most representative music information retrieval (MIR) tasks and has been studied for various instruments to understand music. However, due to the lack of high-quality labeled data, transcription of many instruments is still a challenging task. In particular, in the case of singing, it is difficult to find accurate notes due to its expressiveness in pitch, timbre, and dynamics. In this paper, we propose a method of finding note onsets of singing voice more accurately by leveraging the linguistic characteristics of singing, which are not seen in other instruments. The proposed model uses mel-scaled spectrogram and phonetic posteriorgram (PPG), a frame-wise likelihood of phoneme, as an input of the onset detection network while PPG is generated by the pre-trained network with singing and speech data. To verify how linguistic features affect onset detection, we compare the evaluation results through the dataset with different languages and divide onset types for detailed analysis. Our approach substantially improves the performance of singing transcription and therefore emphasizes the importance of linguistic features in singing analysis.
Submitted 12 April, 2023;
originally announced April 2023.
-
On the exponential convergence of input-output signals of nonlinear feedback systems
Authors:
Lanlan Su,
Di Zhao,
Sei Zhen Khong
Abstract:
This note studies the exponential convergence of input-output signals of discrete-time nonlinear systems composed of a feedback interconnection of a linear time-invariant system and a nonlinear uncertainty. Both the open-loop subsystems are allowed to be unbounded. Integral-quadratic-constraint-based conditions are proposed for these uncertain feedback systems, including the Lurye type, to exhibit the property that the endogenous input-output signals enjoy an exponential convergence rate for all initial conditions of the linear time-invariant subsystem. The conditions are established via a combination of tools, including integral quadratic constraints, directed gap, and exponential weightings.
Submitted 12 June, 2024; v1 submitted 4 June, 2022;
originally announced June 2022.
-
On the Necessity and Sufficiency of Discrete-Time O'Shea-Zames-Falb Multipliers
Authors:
Lanlan Su,
Peter Seiler,
Joaquin Carrasco,
Sei Zhen Khong
Abstract:
This paper considers the robust stability of a discrete-time Lurye system consisting of the feedback interconnection between a linear system and a bounded and monotone nonlinearity. It has been conjectured that the existence of a suitable linear time-invariant (LTI) O'Shea-Zames-Falb multiplier is not only sufficient but also necessary. Roughly speaking, a successful proof of the conjecture would require: (a) a conic parameterization of a set of multipliers that describes exactly the set of nonlinearities, (b) a lossless S-procedure to show that the non-existence of a multiplier implies that the Lurye system is not uniformly robustly stable over the set of nonlinearities, and (c) the existence of a multiplier in the set of multipliers used in (a) implies the existence of an LTI multiplier. We investigate these three steps, showing the current bottlenecks for proving this conjecture. In addition, we provide an extension of the class of multipliers which may be used to disprove the conjecture.
Submitted 14 December, 2021;
originally announced December 2021.
-
Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience
Authors:
Wei-Tsung Lu,
Meng-Hsuan Wu,
Yuh-Ming Chiu,
Li Su
Abstract:
The subjective evaluation of music generation techniques has been mostly done with questionnaire-based listening tests while ignoring the perspectives from music composition, arrangement, and soundtrack editing. In this paper, we propose an editing test to evaluate users' editing experience of music generation models in a systematic way. To do this, we design a new music style transfer model combining the non-chronological inference architecture, autoregressive models and the Transformer, which serves as an improvement from the baseline model on the same style transfer task. Then, we compare the performance of the two models with a conventional listening test and the proposed editing test, in which the quality of generated samples is assessed by the amount of effort (e.g., the number of required keyboard and mouse actions) spent by users to polish a music clip. Results on two target styles indicate that the improvement over the baseline model can be reflected by the editing test quantitatively. Also, the editing test provides profound insights which are not accessible from usual listening tests. The major contribution of this paper is the systematic presentation of the editing test and the corresponding insights, while the proposed music style transfer model based on state-of-the-art neural networks represents another contribution.
Submitted 25 October, 2021;
originally announced October 2021.
-
ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data
Authors:
Kin Wai Cheuk,
Dorien Herremans,
Li Su
Abstract:
Most of the current supervised automatic music transcription (AMT) models lack the ability to generalize. This means that they have trouble transcribing real-world music recordings from diverse musical genres that are not presented in the labelled training data. In this paper, we propose a semi-supervised framework, ReconVAT, which solves this issue by leveraging the huge amount of available unlabelled music recordings. The proposed ReconVAT uses reconstruction loss and virtual adversarial training. When combined with existing U-net models for AMT, ReconVAT achieves competitive results on common benchmark datasets such as MAPS and MusicNet. For example, in the few-shot setting for the string part version of MusicNet, ReconVAT achieves F1-scores of 61.0% and 41.6% for the note-wise and note-with-offset-wise metrics respectively, which translates into an improvement of 22.2% and 62.5% compared to the supervised baseline model. Our proposed framework also demonstrates the potential of continual learning on new data, which could be useful in real-world applications whereby new data is constantly available.
Submitted 29 July, 2021; v1 submitted 10 July, 2021;
originally announced July 2021.
-
Omnizart: A General Toolbox for Automatic Music Transcription
Authors:
Yu-Te Wu,
Yin-Jyun Luo,
Tsung-Ping Chen,
I-Chieh Wei,
Jui-Yang Hsu,
Yi-Chin Chuang,
Li Su
Abstract:
We present and release Omnizart, a new Python library that provides a streamlined solution to automatic music transcription (AMT). Omnizart encompasses modules that construct the life-cycle of deep learning-based AMT, and is designed for ease of use with a compact command-line interface. To the best of our knowledge, Omnizart is the first transcription toolkit which offers models covering a wide class of instruments ranging from solo, instrument ensembles, percussion instruments to vocal, as well as models for chord recognition and beat/downbeat tracking, two music information retrieval (MIR) tasks highly related to AMT.
Submitted 1 June, 2021;
originally announced June 2021.
-
Who is in Control? Practical Physical Layer Attack and Defense for mmWave based Sensing in Autonomous Vehicles
Authors:
Zhi Sun,
Sarankumar Balakrishnan,
Lu Su,
Arupjyoti Bhuyan,
Pu Wang,
Chunming Qiao
Abstract:
With the wide bandwidths in millimeter wave (mmWave) frequency band that results in unprecedented accuracy, mmWave sensing has become vital for many applications, especially in autonomous vehicles (AVs). In addition, mmWave sensing has superior reliability compared to other sensing counterparts such as camera and LiDAR, which is essential for safety-critical driving. Therefore, it is critical to understand the security vulnerabilities and improve the security and reliability of mmWave sensing in AVs. To this end, we perform the end-to-end security analysis of a mmWave-based sensing system in AVs, by designing and implementing practical physical layer attack and defense strategies in a state-of-the-art mmWave testbed and an AV testbed in real-world settings. Various strategies are developed to take control of the victim AV by spoofing its mmWave sensing module, including adding fake obstacles at arbitrary locations and faking the locations of existing obstacles. Five real-world attack scenarios are constructed to spoof the victim AV and force it to make dangerous driving decisions leading to a fatal crash. Field experiments are conducted to study the impact of the various attack scenarios using a Lincoln MKZ-based AV testbed, which validate that the attacker can indeed assume control of the victim AV to compromise its security and safety. To defend the attacks, we design and implement a challenge-response authentication scheme and a RF fingerprinting scheme to reliably detect aforementioned spoofing attacks.
Submitted 22 November, 2020;
originally announced November 2020.
-
Toward Expressive Singing Voice Correction: On Perceptual Validity of Evaluation Metrics for Vocal Melody Extraction
Authors:
Yin-Jyun Luo,
Yuen-Jen Lin,
Li Su
Abstract:
Singing voice correction (SVC) is an appealing application for amateur singers. Commercial products automate SVC by snapping pitch contours to equal-tempered scales, which could lead to deadpan modifications. Together with the neglect of rhythmic errors, extensive manual corrections are still necessary. In this paper, we present a streamlined system to automate expressive SVC for both pitch and rhythmic errors. Particularly, we extend a previous work by integrating advanced techniques for singing voice separation (SVS) and vocal melody extraction. SVC is achieved by temporally aligning the source-target pair, followed by replacing pitch and rhythm of the source with those of the target. We evaluate the framework by a comparative study for melody extraction which involves both subjective and objective evaluations, whereby we investigate perceptual validity of the standard metrics through the lens of SVC. The results suggest that the high pitch accuracy obtained by the metrics does not signify good perceptual scores.
Submitted 23 October, 2020;
originally announced October 2020.
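The align-then-replace step described in the abstract above can be illustrated with a minimal sketch. It assumes the source (amateur) and target (reference) pitch contours are given as lists of per-frame pitch values, and it uses plain dynamic time warping as a stand-in aligner; the abstract does not say which alignment method the system uses, and the function names `dtw_path` and `replace_pitch` are hypothetical.

```python
def dtw_path(source, target):
    """Return (total_cost, path) aligning two 1-D sequences with classic DTW."""
    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning source[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(source[i - 1] - target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the frame-to-frame alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def replace_pitch(source, target):
    """Give each source frame the mean pitch of the target frames aligned to it."""
    _, path = dtw_path(source, target)
    aligned = {}
    for i, j in path:
        aligned.setdefault(i, []).append(target[j])
    return [sum(aligned[i]) / len(aligned[i]) for i in range(len(source))]
```

This sketch keeps the source timing and only swaps in the target pitch; the system in the abstract also corrects rhythm, which is not shown here.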
-
Robust Monotonic Convergent Iterative Learning Control Design: an LMI-based Method
Authors:
Lanlan Su
Abstract:
This work investigates robust monotonic convergent iterative learning control (ILC) for uncertain linear systems in both time and frequency domains, and the ILC algorithm optimizing the convergence speed in terms of $l_{2}$ norm of error signals is derived. Firstly, it is shown that the robust monotonic convergence of the ILC system can be established equivalently by the positive definiteness of a matrix polynomial over some set. Then, a necessary and sufficient condition in the form of sum of squares (SOS) for the positive definiteness is proposed, which is amenable to the feasibility of linear matrix inequalities (LMIs). Based on such a condition, the optimal ILC algorithm that maximizes the convergence speed is obtained by solving a set of convex optimization problems. Moreover, the order of the learning function can be chosen arbitrarily so that the designers have the flexibility to decide the complexity of the learning algorithm.
Submitted 15 January, 2021; v1 submitted 28 September, 2020;
originally announced September 2020.
-
On the Necessity and Sufficiency of the Zames-Falb Multipliers for Bounded Operators
Authors:
Sei Zhen Khong,
Lanlan Su
Abstract:
This paper analyzes the robust feedback stability of a single-input-single-output stable linear time-invariant (LTI) system against four different classes of nonlinear systems using the Zames-Falb multipliers. The contribution is fourfold. Firstly, we present a generalised S-procedure lossless theorem that involves a countably infinite number of quadratic forms. Secondly, we identify a class of uncertain systems over which the robust feedback stability implies the existence of an appropriate Zames-Falb multiplier based on the generalised S-procedure lossless theorem. Meanwhile, we show that the existence of such a Zames-Falb multiplier is sufficient for the robust feedback stability over a smaller class of uncertain systems. Thirdly, when restricted to be static (a.k.a. memoryless), the second class of systems coincides with the class of slope-restricted monotone nonlinearities, and the classical result of using the Zames-Falb multipliers to ensure feedback stability is recovered. Lastly, when restricted to be LTI, the second class is demonstrated to be a subset of the third, and the existence of a Zames-Falb multiplier is shown to be sufficient but not necessary for the robust feedback stability.
Submitted 18 August, 2021; v1 submitted 28 September, 2020;
originally announced September 2020.
-
Temporally Guided Music-to-Body-Movement Generation
Authors:
Hsuan-Kai Kao,
Li Su
Abstract:
This paper presents a neural network model to generate virtual violinist's 3-D skeleton movements from music audio. Improved from the conventional recurrent neural network models for generating 2-D skeleton data in previous works, the proposed model incorporates an encoder-decoder architecture, as well as the self-attention mechanism to model the complicated dynamics in body movement sequences. To facilitate the optimization of self-attention model, beat tracking is applied to determine effective sizes and boundaries of the training examples. The decoder is accompanied with a refining network and a bowing attack inference mechanism to emphasize the right-hand behavior and bowing attack timing. Both objective and subjective evaluations reveal that the proposed model outperforms the state-of-the-art methods. To the best of our knowledge, this work represents the first attempt to generate 3-D violinists' body movements considering key features in musical body movement.
Submitted 16 September, 2020;
originally announced September 2020.
-
Semi-supervised learning using teacher-student models for vocal melody extraction
Authors:
Sangeun Kum,
Jing-Hua Lin,
Li Su,
Juhan Nam
Abstract:
The lack of labeled data is a major obstacle in many music information retrieval tasks such as melody extraction, where labeling is extremely laborious or costly. Semi-supervised learning (SSL) provides a solution to alleviate the issue by leveraging a large amount of unlabeled data. In this paper, we propose an SSL method using teacher-student models for vocal melody extraction. The teacher model is pre-trained with labeled data and guides the student model to make identical predictions given unlabeled input in a self-training setting. We examine three setups of teacher-student models with different data augmentation schemes and loss functions. Also, considering the scarcity of labeled data in the test phase, we artificially generate large-scale testing data with pitch labels from unlabeled data using an analysis-synthesis method. The results show that the SSL method significantly increases the performance against supervised learning only and the improvement depends on the teacher-student models, the size of unlabeled data, the number of self-training iterations, and other training details. We also find that it is essential to ensure that the unlabeled audio has vocal parts. Finally, we show that the proposed SSL method enables a baseline convolutional recurrent neural network model to achieve performance comparable to the state of the art.
Submitted 14 August, 2020;
originally announced August 2020.
-
Road Grade Estimation Using Crowd-Sourced Smartphone Data
Authors:
Abhishek Gupta,
Shaohan Hu,
Weida Zhong,
Adel Sadek,
Lu Su,
Chunming Qiao
Abstract:
Estimates of road grade/slope can add another dimension of information to existing 2D digital road maps. Integration of road grade information will widen the scope of digital map's applications, which is primarily used for navigation, by enabling driving safety and efficiency applications such as Advanced Driver Assistance Systems (ADAS), eco-driving, etc. The huge scale and dynamic nature of road networks make sensing road grade a challenging task. Traditional methods oftentimes suffer from limited scalability and update frequency, as well as poor sensing accuracy. To overcome these problems, we propose a cost-effective and scalable road grade estimation framework using sensor data from smartphones. Based on our understanding of the error characteristics of smartphone sensors, we intelligently combine data from accelerometer, gyroscope and vehicle speed data from OBD-II/smartphone's GPS to estimate road grade. To improve accuracy and robustness of the system, the estimations of road grade from multiple sources/vehicles are crowd-sourced to compensate for the effects of varying quality of sensor data from different sources. Extensive experimental evaluation on a test route of ~9km demonstrates the superior performance of our proposed method, achieving $5\times$ improvement on road grade estimation accuracy over baselines, with 90\% of errors below 0.3$^\circ$.
Submitted 5 June, 2020;
originally announced June 2020.
-
Learning Enabled Dense Space-division Multiplexing through a Single Multimode Fibre
Authors:
Pengfei Fan,
Michael Ruddlesden,
Yufei Wang,
Luming Zhao,
Chao Lu,
Lei Su
Abstract:
Space-division multiplexing is a promising technology in optical fibre communication to improve the transmission capacity of a single optical fibre. However, the number of channels that can be multiplexed is limited by the crosstalks between channels, and the multiplexing is only applied to few-mode or multi-core fibres. Here, we propose a high-spatial-density channel multiplexing framework employing deep learning for standard multimode fibres (MMF). We present a proof-of-concept experimental system, consisting of a single light source, a single digital-micromirror-device modulator, a single detection camera, and a deep convolutional neural network (CNN) to demonstrate up to 400-channel simultaneous data transmission with accuracy close to 100% over MMFs of different types, diameters and lengths. A novel scalable semi-supervised learning model is proposed to adapt the CNN to the time-varying MMF information channels in real-time, to overcome the environmental changes such as temperature variations and vibrations, and to reconstruct the input data from complex crosstalks among hundreds of channels. This deep-learning based approach is promising to maximize the use of the spatial dimension of MMFs, and to break the present number-of-channel limit in space-division multiplexing for future high-capacity MMF transmission data links.
Submitted 5 February, 2020;
originally announced February 2020.
-
Analysis of Two-Dimensional Feedback Systems over Networks Using Dissipativity
Authors:
Yang Yan,
Lanlan Su,
Vijay Gupta,
Panos Antsaklis
Abstract:
This paper investigates the closed-loop $\mathcal{L}_2$ stability of two-dimensional (2-D) feedback systems across a digital communication network by introducing the tool of dissipativity. First, sampling of a continuous 2-D system is considered and an analytical characterization of the $QSR$-dissipativity of the sampled system is presented. Next, the input-feedforward output-feedback passivity (IF-OFP), a simplified form of $QSR$-dissipativity, is utilized to study the framework of feedback interconnection of two 2-D systems over networks. Then, the effects of signal quantization in communication links on dissipativity degradation of the 2-D feedback quantized system is analyzed. Additionally, an event-triggered mechanism is developed for 2-D networked control systems while maintaining $\mathcal{L}_2$ stability of the closed-loop system. In the end, an illustrative example is provided.
Submitted 5 August, 2019;
originally announced August 2019.
-
Stabilization of Linear Systems Across a Time-Varying AWGN Fading Channel
Authors:
Lanlan Su,
Vijay Gupta,
Graziano Chesi
Abstract:
This technical note investigates the minimum average transmit power required for mean-square stabilization of a discrete-time linear process across a time-varying additive white Gaussian noise (AWGN) fading channel that is presented between the sensor and the controller. We assume channel state information at both the transmitter and the receiver, and allow the transmit power to vary with the channel state to obtain the minimum required average transmit power via optimal power adaptation. We consider both the case of independent and identically distributed fading and fading subject to a Markov chain. Based on the proposed necessary and sufficient conditions for mean-square stabilization, we show that the minimum average transmit power to ensure stabilizability can be obtained by solving a geometric program.
Submitted 31 July, 2019; v1 submitted 30 July, 2019;
originally announced July 2019.
-
Distributed Resource Allocation over Time-varying Balanced Digraphs with Discrete-time Communication
Authors:
Lanlan Su,
Mengmou Li,
Vijay Gupta,
Graziano Chesi
Abstract:
This work is concerned with the problem of distributed resource allocation in a continuous-time setting but with discrete-time communication over infinitely jointly connected and balanced digraphs. We provide a passivity-based perspective for the continuous-time algorithm, based on which an intermittent communication scheme is developed. Particularly, a periodic communication scheme is first derived through analyzing the passivity degradation over output sampling of the distributed dynamics at each node. Then, an asynchronous distributed event-triggered scheme is further developed. The sampled-based event-triggered communication scheme is exempt from Zeno behavior as the minimum inter-event time is lower bounded by the sampling period. The parameters in the proposed algorithm rely only on local information of each individual node, which can be designed in a truly distributed fashion.
Submitted 15 January, 2021; v1 submitted 30 July, 2019;
originally announced July 2019.
-
Feedback Passivation of Linear Systems with Fixed-Structured Controllers
Authors:
Lanlan Su,
Vijay Gupta,
Panos Antsaklis
Abstract:
This paper addresses the problem of designing an optimal output feedback controller with a specified controller structure for linear time-invariant (LTI) systems to maximize the passivity level for the closed-loop system, in both continuous-time (CT) and discrete-time (DT). Specifically, the set of controllers under consideration is linearly parameterized with constrained parameters. Both input feedforward passivity (IFP) and output feedback passivity (OFP) indices are used to capture the level of passivity. Given a set of stabilizing controllers, a necessary and sufficient condition is proposed for the existence of such fixed-structured output feedback controllers that can passivate the closed-loop system. Moreover, it is shown that the condition can be used to obtain the controller that maximizes the IFP or the OFP index by solving a convex optimization problem.
Submitted 30 July, 2019;
originally announced July 2019.
-
Multi-layered Cepstrum for Instantaneous Frequency Estimation
Authors:
Chin-Yun Yu,
Li Su
Abstract:
We propose the multi-layered cepstrum (MLC) method to estimate multiple fundamental frequencies (MF0) of a signal under challenging contamination such as high-pass filter noise. Taking the operation of cepstrum (i.e., Fourier transform, filtering, and nonlinear activation) recursively, MLC is shown as an efficient method to enhance MF0 saliency in a step-by-step manner. Evaluation on a real-world polyphonic music dataset under both normal and low-fidelity conditions demonstrates the potential of MLC.
Submitted 1 February, 2019;
originally announced February 2019.
-
Play as You Like: Timbre-enhanced Multi-modal Music Style Transfer
Authors:
Chien-Yu Lu,
Min-Xin Xue,
Chia-Che Chang,
Che-Rung Lee,
Li Su
Abstract:
Style transfer of polyphonic music recordings is a challenging task when considering the modeling of diverse, imaginative, and reasonable music pieces in the style different from their original one. To achieve this, learning stable multi-modal representations for both domain-variant (i.e., style) and domain-invariant (i.e., content) information of music in an unsupervised manner is critical. In this paper, we propose an unsupervised music style transfer method without the need for parallel data. Besides, to characterize the multi-modal distribution of music pieces, we employ the Multi-modal Unsupervised Image-to-Image Translation (MUNIT) framework in the proposed system. This allows one to generate diverse outputs from the learned latent distributions representing contents and styles. Moreover, to better capture the granularity of sound, such as the perceptual dimensions of timbre and the nuance in instrument-specific performance, cognitively plausible features including mel-frequency cepstral coefficients (MFCC), spectral difference, and spectral envelope, are combined with the widely-used mel-spectrogram into a timbre-enhanced multi-channel input representation. The Relativistic average Generative Adversarial Networks (RaGAN) is also utilized to achieve fast convergence and high stability. We conduct experiments on bilateral style transfer tasks among three different genres, namely piano solo, guitar solo, and string quartet. Results demonstrate the advantages of the proposed method in music style transfer with improved sound quality and in allowing users to manipulate the output.
Submitted 28 November, 2018;
originally announced November 2018.
-
A Streamlined Encoder/Decoder Architecture for Melody Extraction
Authors:
Tsung-Han Hsieh,
Li Su,
Yi-Hsuan Yang
Abstract:
Melody extraction in polyphonic musical audio is important for music signal processing. In this paper, we propose a novel streamlined encoder/decoder network that is designed for the task. We make two technical contributions. First, drawing inspiration from a state-of-the-art model for semantic pixel-wise segmentation, we pass through the pooling indices between pooling and un-pooling layers to localize the melody in frequency. We can achieve result close to the state-of-the-art with much fewer convolutional layers and simpler convolution modules. Second, we propose a way to use the bottleneck layer of the network to estimate the existence of a melody line for each time frame, and make it possible to use a simple argmax function instead of ad-hoc thresholding to get the final estimation of the melody line. Our experiments on both vocal melody extraction and general melody extraction validate the effectiveness of the proposed model.
Submitted 18 February, 2019; v1 submitted 30 October, 2018;
originally announced October 2018.
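The argmax decoding mentioned in the abstract above can be sketched as follows. This is only an illustration under assumptions: `salience` holds per-frame scores over frequency bins, and `no_melody_scores` holds one "no melody" score per frame. How the network derives that score from its bottleneck layer is not given in the abstract, so here it is simply an input, and the function name `decode_melody` is hypothetical.

```python
def decode_melody(salience, no_melody_scores):
    """Pick one label per frame: 0 means "no melody", k >= 1 means frequency bin k - 1."""
    labels = []
    for frame_scores, no_melody in zip(salience, no_melody_scores):
        # Prepend the no-melody score so a single argmax decides both
        # voicing (is there a melody?) and pitch (which bin?).
        scores = [no_melody] + list(frame_scores)
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels
```

Because the no-melody score competes directly with the frequency bins, no separate voicing threshold has to be tuned.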
-
Single-shot image retrieval through a multimode fiber using a genetic algorithm
Authors:
Michael Ruddlesden,
Jinshuai Zhang,
Tianrui Zhao,
Wen Wang,
Lei Su
Abstract:
In this letter, we present a genetic algorithm-based approach for image retrieval through a multimode fiber in a reference-less system. Due to mode interference, when an image is illuminated at one side of a multimode fiber, the transmitted light forms a noise-like speckle pattern at the other end. With the use of a prior-measured transmission matrix of the fiber, a speckle pattern is calculated using a random input mask. By optimizing the input mask to achieve a high correlation coefficient of experimental and calculated patterns, the input mask is optimized into an image with high similarity to the original image.
Submitted 26 October, 2018;
originally announced October 2018.
-
Finite-time Guarantees for Byzantine-Resilient Distributed State Estimation with Noisy Measurements
Authors:
Lili Su,
Shahin Shahrampour
Abstract:
This work considers resilient, cooperative state estimation in unreliable multi-agent networks. A network of agents aims to collaboratively estimate the value of an unknown vector parameter, while an {\em unknown} subset of agents suffer Byzantine faults. Faulty agents malfunction arbitrarily and may send out {\em highly unstructured} messages to other agents in the network. As opposed to fault-free networks, reaching agreement in the presence of Byzantine faults is far from trivial. In this paper, we propose a computationally-efficient algorithm that is provably robust to Byzantine faults. At each iteration of the algorithm, a good agent (1) performs a gradient descent update based on noisy local measurements, (2) exchanges its update with other agents in its neighborhood, and (3) robustly aggregates the received messages using coordinate-wise trimmed means. Under mild technical assumptions, we establish that good agents learn the true parameter asymptotically in almost sure sense. We further complement our analysis by proving (high probability) {\em finite-time} convergence rate, encapsulating network characteristics.
Submitted 16 October, 2018;
originally announced October 2018.
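Step (3) of the iteration in the abstract above, robust aggregation by coordinate-wise trimmed means, can be sketched as follows. The trim count `f` (how many of the smallest and largest values to discard in each coordinate) and the function names are assumptions for illustration; the abstract does not state how the trimming level is chosen, and the gradient and message-exchange steps are not shown.

```python
def trimmed_mean(values, f):
    """Mean of `values` after discarding the f smallest and f largest entries."""
    if len(values) <= 2 * f:
        raise ValueError("need more than 2*f values to trim f from each end")
    kept = sorted(values)[f:len(values) - f]
    return sum(kept) / len(kept)


def coordinate_wise_trimmed_mean(vectors, f):
    """Apply the trimmed mean independently to every coordinate of the received vectors."""
    dim = len(vectors[0])
    return [trimmed_mean([v[k] for v in vectors], f) for k in range(dim)]
```

With four received vectors and `f = 1`, a single arbitrarily corrupted message cannot move any coordinate of the aggregate outside the range spanned by the remaining messages.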
-
FastDeepIoT: Towards Understanding and Optimizing Neural Network Execution Time on Mobile and Embedded Devices
Authors:
Shuochao Yao,
Yiran Zhao,
Huajie Shao,
Shengzhong Liu,
Dongxin Liu,
Lu Su,
Tarek Abdelzaher
Abstract:
Deep neural networks show great potential as solutions to many sensing application problems, but their excessive resource demand slows down execution time, pausing a serious impediment to deployment on low-end devices. To address this challenge, recent literature focused on compressing neural network size to improve performance. We show that changing neural network size does not proportionally aff…
▽ More
Deep neural networks show great potential as solutions to many sensing application problems, but their excessive resource demand slows down execution time, pausing a serious impediment to deployment on low-end devices. To address this challenge, recent literature focused on compressing neural network size to improve performance. We show that changing neural network size does not proportionally affect performance attributes of interest, such as execution time. Rather, extreme run-time nonlinearities exist over the network configuration space. Hence, we propose a novel framework, called FastDeepIoT, that uncovers the non-linear relation between neural network structure and execution time, then exploits that understanding to find network configurations that significantly improve the trade-off between execution time and accuracy on mobile and embedded devices. FastDeepIoT makes two key contributions. First, FastDeepIoT automatically learns an accurate and highly interpretable execution time model for deep neural networks on the target device. This is done without prior knowledge of either the hardware specifications or the detailed implementation of the used deep learning library. Second, FastDeepIoT informs a compression algorithm how to minimize execution time on the profiled device without impacting accuracy. We evaluate FastDeepIoT using three different sensing-related tasks on two mobile devices: Nexus 5 and Galaxy Nexus. FastDeepIoT further reduces the neural network execution time by $48\%$ to $78\%$ and energy consumption by $37\%$ to $69\%$ compared with the state-of-the-art compression algorithms.
Submitted 18 September, 2018;
originally announced September 2018.
-
Vocal melody extraction using patch-based CNN
Authors:
Li Su
Abstract:
The patch-based convolutional neural network (CNN) model presented in this paper for vocal melody extraction in polyphonic music is inspired by object detection in image processing. The input of the model is a novel time-frequency representation which enhances the pitch contours and suppresses the harmonic components of a signal. This succinct data representation and the patch-based CNN model enable an efficient training process with limited labeled data. Experiments on various datasets show excellent speed and competitive accuracy compared to other deep learning approaches.
Submitted 24 April, 2018;
originally announced April 2018.
-
Singing voice correction using canonical time warping
Authors:
Yin-Jyun Luo,
Ming-Tso Chen,
Tai-Shih Chi,
Li Su
Abstract:
Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm which synchronizes two singing recordings can provide a promising solution. We therefore propose to address the problem with canonical time warping (CTW), which aligns amateur singing recordings to professional ones. A new pitch contour is generated given the alignment information, and pitch-corrected singing is synthesized back through the vocoder. The objective evaluation shows that CTW is robust against pitch-shifting and time-stretching effects, and the subjective test demonstrates that CTW outperforms the other methods, including DTW and commercial auto-tuning software. Finally, we demonstrate the applicability of the proposed method in a practical, real-world scenario.
Submitted 23 November, 2017;
originally announced November 2017.
-
VehSense: Slippery Road Detection Using Smartphones
Authors:
Yunfei Hou,
Abhishek Gupta,
Tong Guan,
Shaohan Hu,
Lu Su,
Chunming Qiao
Abstract:
This paper investigates a new application of vehicular sensing: detecting and reporting slippery road conditions. We describe a system and associated algorithm to monitor vehicle skidding events using smartphones and OBD-II (On-Board Diagnostics) adaptors. This system, which we call VehSense, gathers data from smartphone inertial sensors and vehicle wheel speed sensors, and processes the data to monitor slippery road conditions in real time. Specifically, two speed readings are collected: 1) ground speed, which is estimated from vehicle acceleration and rotation, and 2) wheel speed, which is retrieved from the OBD-II interface. The mismatch between these two speeds is used to infer a skidding event. Without tapping into vehicle manufacturers' proprietary data (e.g., antilock braking system), VehSense is compatible with most passenger vehicles, and thus can be easily deployed. We evaluate our system on snow-covered roads in Buffalo, and show that it can detect vehicle skidding effectively.
Submitted 10 May, 2017;
originally announced May 2017.