-
A Distributed Model Identification Algorithm for Multi-Agent Systems
Authors:
Vivek Khatana,
Chin-Yao Chang,
Wenbo Wang
Abstract:
In this study, we investigate an agent-based approach to system model identification, with an emphasis on power distribution system applications. Departing from the conventional practice of relying on historical data for offline model identification, we adopt an online update approach that uses real-time data, employing the latest data points for gradient computation. This methodology offers advantages including a large reduction in the communication network's bandwidth requirements, achieved by minimizing the data exchanged at each iteration, and the ability of the model to adapt in real time to disturbances. Furthermore, we extend our model identification process from linear frameworks to more complex non-linear convex models. This extension is validated through numerical studies demonstrating improved control performance for a synthetic IEEE test case.
Submitted 1 May, 2024;
originally announced May 2024.
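The online update described in the abstract above (a gradient step computed from only the latest data point) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single agent, a linear model y = theta . x, a squared-error loss, and a fixed step size, and the names `online_step`, `identify`, and the synthetic stream are hypothetical.

```python
def online_step(theta, x, y, lr=0.1):
    """One gradient step on the squared error of the newest sample only."""
    pred = sum(t * xi for t, xi in zip(theta, x))
    err = pred - y
    # Gradient of 0.5 * err**2 with respect to theta is err * x.
    return [t - lr * err * xi for t, xi in zip(theta, x)]


def identify(stream, dim, lr=0.1):
    """Run the online update over a stream of (x, y) pairs, keeping no history."""
    theta = [0.0] * dim
    for x, y in stream:
        theta = online_step(theta, x, y, lr)
    return theta


# Hypothetical noiseless data generated by y = 2*x1 - 1*x2 (an assumption
# made only for this illustration).
true_theta = [2.0, -1.0]
inputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)] * 200
stream = [(x, sum(t * xi for t, xi in zip(true_theta, x))) for x in inputs]
theta_hat = identify(stream, dim=2, lr=0.1)
```

Because each step touches only the newest sample, nothing but the current estimate has to be stored or exchanged between iterations, which is the property the abstract credits for the reduced bandwidth requirement.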
-
A Deep Learning Approach to Radar-based QPE
Authors:
Ting-Shuo Yo,
Shih-Hao Su,
Jung-Lien Chu,
Chiao-Wei Chang,
Hung-Chi Kuo
Abstract:
In this study, we propose a volume-to-point framework for quantitative precipitation estimation (QPE) based on the Quantitative Precipitation Estimation and Segregation Using Multiple Sensor (QPESUMS) Mosaic Radar data set. With a data volume consisting of the time series of gridded radar reflectivities over the Taiwan area, we used machine learning algorithms to establish a statistical model for QPE in weather stations. The model extracts spatial and temporal features from the input data volume and then associates these features with the location-specific precipitations. In contrast to QPE methods based on the Z-R relation, we leverage the machine learning algorithms to automatically detect the evolution and movement of weather systems and associate these patterns to a location with specific topographic attributes. Specifically, we evaluated this framework with the hourly precipitation data of 45 weather stations in Taipei during 2013-2016. In comparison to the operational QPE scheme used by the Central Weather Bureau, the volume-to-point framework performed comparably well in general cases and excelled in detecting heavy-rainfall events. By using the current results as the reference benchmark, the proposed method can integrate the heterogeneous data sources and potentially improve the forecast in extreme precipitation scenarios.
Submitted 15 February, 2024;
originally announced February 2024.
-
Towards Non-Robocentric Dynamic Landing of Quadrotor UAVs
Authors:
Li-Yu Lo,
Boyang Li,
Chih-Yung Wen,
Ching-Wei Chang
Abstract:
In this work, we propose a dynamic landing solution without the need for onboard exteroceptive sensors and an expensive computation unit, where all localization and control modules are carried out on the ground in a non-inertial frame. Our system starts with a relative state estimator of the aerial robot from the perspective of the landing platform, where the state tracking of the UAV is done through a set of onboard LED markers and an on-ground camera; the state is expressed geometrically on a manifold and is returned by an Iterated Extended Kalman Filter (IEKF) algorithm. Subsequently, a motion planning module is developed to guide the landing process, formulating it as a minimum jerk trajectory by applying the differential flatness property. Considering visibility and dynamic constraints, the problem is solved using quadratic programming, and the final motion primitive is expressed through piecewise polynomials. Through a series of experiments, the applicability of this approach is validated by successfully landing an 18 cm x 18 cm quadrotor on a 43 cm x 43 cm platform, exhibiting performance comparable to conventional methods. Finally, we provide comprehensive hardware and software details to the research community for future reference.
Submitted 21 January, 2024;
originally announced January 2024.
-
BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer
Authors:
Chih-Cheng Chang,
Li Su
Abstract:
Many deep learning models have achieved dominant performance on the offline beat tracking task. However, online beat tracking, in which only the past and present input features are available, still remains challenging. In this paper, we propose BEAt tracking Streaming Transformer (BEAST), an online joint beat and downbeat tracking system based on the streaming Transformer. To deal with online scenarios, BEAST applies contextual block processing in the Transformer encoder. Moreover, we adopt relative positional encoding in the attention layer of the streaming Transformer encoder to capture relative timing position which is critically important information in music. Carrying out beat and downbeat experiments on benchmark datasets for a low latency scenario with maximum latency under 50 ms, BEAST achieves an F1-measure of 80.04% in beat and 46.78% in downbeat, which is a substantial improvement of about 5 percentage points over the state-of-the-art online beat tracking model.
Submitted 23 April, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Hybrid Aerodynamics-Based Model Predictive Control for a Tail-Sitter UAV
Authors:
Bailun Jiang,
Boyang Li,
Ching-Wei Chang,
Chih-Yung Wen
Abstract:
It is challenging to model and control a tail-sitter unmanned aerial vehicle (UAV) because its blended wing body generates complicated nonlinear aerodynamic effects, such as wing lift, fuselage drag, and propeller-wing interactions. We therefore devised a hybrid aerodynamic modeling method and model predictive control (MPC) design for a quadrotor tail-sitter UAV. The hybrid model consists of the Newton-Euler equation, which describes quadrotor dynamics, and a feedforward neural network, which learns residual aerodynamic effects. This hybrid model exhibits high predictive accuracy at a low computational cost and was used to implement hybrid MPC, which optimizes the throttle, pitch angle, and roll angle for position tracking. The controller performance was validated in real-world experiments, which obtained a 57% tracking error reduction compared with conventional nonlinear MPC. External wind disturbance was also introduced and the experimental results confirmed the robustness of the controller to these conditions.
Submitted 22 December, 2023;
originally announced December 2023.
-
All Attention U-NET for Semantic Segmentation of Intracranial Hemorrhages In Head CT Images
Authors:
Chia Shuo Chang,
Tian Sheuan Chang,
Jiun Lin Yan,
Li Ko
Abstract:
Head CT scans serve as a first-line tool to help specialists diagnose different types of intracranial hemorrhage. However, hemorrhages of the same type vary widely in shape, while different types can share confusingly similar shape, size, and location. To solve this problem, this paper proposes an all attention U-Net. It uses channel attention on the U-Net encoder side to enhance class-specific feature extraction, and spatial and channel attention on the U-Net decoder side for more accurate shape extraction and type classification. The simulation results show up to a 31.8% improvement compared to the baseline, ResNet50 + U-Net, and better performance than in cases with limited attention.
Submitted 16 December, 2023;
originally announced December 2023.
-
SSVEP-DAN: A Data Alignment Network for SSVEP-based Brain Computer Interfaces
Authors:
Sung-Yu Chen,
Chi-Min Chang,
Kuan-Jung Chiang,
Chun-Shu Wei
Abstract:
Steady-state visual-evoked potential (SSVEP)-based brain-computer interfaces (BCIs) offer a non-invasive means of communication through high-speed speller systems. However, their efficiency heavily relies on individual training data obtained during time-consuming calibration sessions. To address the challenge of data insufficiency in SSVEP-based BCIs, we present SSVEP-DAN, the first dedicated neural network model designed for aligning SSVEP data across different domains, which can encompass various sessions, subjects, or devices. Our experimental results across multiple cross-domain scenarios demonstrate SSVEP-DAN's capability to transform existing source SSVEP data into supplementary calibration data, significantly enhancing SSVEP decoding accuracy in scenarios with limited calibration data. We envision SSVEP-DAN as a catalyst for practical SSVEP-based BCI applications with minimal calibration. The source codes in this work are available at: https://github.com/CECNL/SSVEP-DAN.
Submitted 21 November, 2023;
originally announced November 2023.
-
Image-Domain Material Decomposition for Dual-energy CT using Unsupervised Learning with Data-fidelity Loss
Authors:
Junbo Peng,
Chih-Wei Chang,
Huiqiao Xie,
Richard L. J. Qiu,
Justin Roper,
Tonghe Wang,
Beth Bradshaw,
Xiangyang Tang,
Xiaofeng Yang
Abstract:
Background: Dual-energy CT (DECT) and material decomposition play vital roles in quantitative medical imaging. However, the decomposition process may suffer from significant noise amplification, leading to severely degraded image signal-to-noise ratios (SNRs). While existing iterative algorithms perform noise suppression using different image priors, these heuristic image priors cannot accurately represent the features of the target image manifold. Although deep learning-based decomposition methods have been reported, these methods are in the supervised-learning framework requiring paired data for training, which is not readily available in clinical settings.
Purpose: This work aims to develop an unsupervised-learning framework with data-measurement consistency for image-domain material decomposition in DECT.
Submitted 17 November, 2023;
originally announced November 2023.
-
AI-Enabled Unmanned Vehicle-Assisted Reconfigurable Intelligent Surfaces: Deployment, Prototyping, Experiments, and Opportunities
Authors:
Li-Hsiang Shen,
Kai-Ten Feng,
Ta-Sung Lee,
Yuan-Chun Lin,
Shih-Cheng Lin,
Chia-Chan Chang,
Sheng-Fuh Chang
Abstract:
Demand for wireless data keeps rising as sixth-generation (6G) technology evolves. The reconfigurable intelligent surface (RIS) is regarded as a promising 6G technique for extending service coverage, reducing power consumption, and enhancing spectral efficiency. In this article, we provide fundamentals of RIS deployment from theoretical and hardware perspectives, as well as the use of artificial intelligence (AI) and machine learning. We built an intelligent deployment of RIS (i-Dris) prototype, comprising dual-band auto-guided vehicle (AGV)-assisted RISs associated with an mmWave base station (BS) and a receiver. The RISs are deployed on the AGV with configured incident/reflection angles, while both the mmWave BS and the receiver are associated with an edge server that monitors downlink packets to obtain system throughput. We designed a federated multi-agent reinforcement learning scheme with several AGV-RIS agents, and sub-agents per AGV-RIS that handle the deployment of position, height, orientation, and elevation angles. The experimental results present stationary measurements in different aspects and scenarios. The i-Dris can reach up to 980 Mbps transmission throughput under a bandwidth of 100 MHz with comparably low complexity and rapid deployment, which outperforms other existing works. Finally, we highlight some opportunities and future issues in leveraging RIS-empowered wireless communication networks.
Submitted 6 November, 2023;
originally announced November 2023.
-
Leveraging sinusoidal representation networks to predict fMRI signals from EEG
Authors:
Yamin Li,
Ange Lou,
Ziyuan Xu,
Shiyu Wang,
Catie Chang
Abstract:
In modern neuroscience, functional magnetic resonance imaging (fMRI) has been a crucial and irreplaceable tool that provides a non-invasive window into the dynamics of whole-brain activity. Nevertheless, fMRI is limited by hemodynamic blurring as well as high cost, immobility, and incompatibility with metal implants. Electroencephalography (EEG) is complementary to fMRI and can directly record the cortical electrical activity at high temporal resolution, but has more limited spatial resolution and is unable to recover information about deep subcortical brain structures. The ability to obtain fMRI information from EEG would enable cost-effective imaging across a wider set of brain regions. Further, beyond augmenting the capabilities of EEG, cross-modality models would facilitate the interpretation of fMRI signals. However, as both EEG and fMRI are high-dimensional and prone to artifacts, it is currently challenging to model fMRI from EEG. To address this challenge, we propose a novel architecture that can predict fMRI signals directly from multi-channel EEG without explicit feature engineering. Our model achieves this by implementing a Sinusoidal Representation Network (SIREN) to learn frequency information in brain dynamics from EEG, which serves as the input to a subsequent encoder-decoder to effectively reconstruct the fMRI signal from a specific brain region. We evaluate our model using a simultaneous EEG-fMRI dataset with 8 subjects and investigate its potential for predicting subcortical fMRI signals. The present results reveal that our model outperforms a recent state-of-the-art model, and indicate the potential of leveraging periodic activation functions in deep neural networks to model functional neuroimaging data.
Submitted 24 January, 2024; v1 submitted 5 November, 2023;
originally announced November 2023.
-
Full-dose Whole-body PET Synthesis from Low-dose PET Using High-efficiency Denoising Diffusion Probabilistic Model: PET Consistency Model
Authors:
Shaoyan Pan,
Elham Abouei,
Junbo Peng,
Joshua Qian,
Jacob F Wynne,
Tonghe Wang,
Chih-Wei Chang,
Justin Roper,
Jonathon A Nye,
Hui Mao,
Xiaofeng Yang
Abstract:
Objective: Positron Emission Tomography (PET) has been a commonly used imaging modality in broad clinical applications. One of the most important tradeoffs in PET imaging is between image quality and radiation dose: high image quality comes with high radiation exposure. Improving image quality is desirable for all clinical applications while minimizing radiation exposure is needed to reduce risk to patients. Approach: We introduce PET Consistency Model (PET-CM), an efficient diffusion-based method for generating high-quality full-dose PET images from low-dose PET images. It employs a two-step process, adding Gaussian noise to full-dose PET images in the forward diffusion, and then denoising them using a PET Shifted-window Vision Transformer (PET-VIT) network in the reverse diffusion. The PET-VIT network learns a consistency function that enables direct denoising of Gaussian noise into clean full-dose PET images. PET-CM achieves state-of-the-art image quality while requiring significantly less computation time than other methods. Results: In experiments comparing eighth-dose to full-dose images, PET-CM demonstrated impressive performance with NMAE of 1.278+/-0.122%, PSNR of 33.783+/-0.824 dB, SSIM of 0.964+/-0.009, NCC of 0.968+/-0.011, HRS of 4.543, and SUV Error of 0.255+/-0.318%, with an average generation time of 62 seconds per patient. This is a significant improvement compared to the state-of-the-art diffusion-based model, with PET-CM reaching this result 12x faster. Similarly, in the quarter-dose to full-dose image experiments, PET-CM delivered competitive outcomes, achieving an NMAE of 0.973+/-0.066%, PSNR of 36.172+/-0.801 dB, SSIM of 0.984+/-0.004, NCC of 0.990+/-0.005, HRS of 4.428, and SUV Error of 0.151+/-0.192% using the same generation process, underlining its high quantitative and clinical precision in both denoising scenarios.
Submitted 16 April, 2024; v1 submitted 24 August, 2023;
originally announced August 2023.
-
SALC: Skeleton-Assisted Learning-Based Clustering for Time-Varying Indoor Localization
Authors:
An-Hung Hsiao,
Li-Hsiang Shen,
Chen-Yi Chang,
Chun-Jie Chiu,
Kai-Ten Feng
Abstract:
Wireless indoor localization has attracted a significant amount of attention in recent years. Using received signal strength (RSS) obtained from WiFi access points (APs) to establish a fingerprinting database is a widely utilized method in indoor localization. However, the time-variant problem for indoor positioning systems is not well investigated in the existing literature. Compared to conventional static fingerprinting, a dynamically reconstructed database can adapt to a highly changing environment, which sustains localization accuracy. To deal with the time-varying issue, we propose a skeleton-assisted learning-based clustering localization (SALC) system, including RSS-oriented map-assisted clustering (ROMAC), cluster-based online database establishment (CODE), and cluster-scaled location estimation (CsLE). The SALC scheme jointly considers similarities from the skeleton-based shortest path (SSP) and the time-varying RSS measurements across the reference points (RPs). ROMAC clusters RPs into different feature sets and thereby selects suitable monitor points (MPs) for enhancing location estimation. Moreover, the CODE algorithm aims to establish an adaptive fingerprint database to alleviate the time-varying problem. Finally, CsLE is adopted to acquire the target position by leveraging the benefits of clustering information and estimated signal variations in order to rescale the weights from the weighted k-nearest neighbors (WkNN) method. Both simulation and experimental results demonstrate that the proposed SALC system can effectively reconstruct the fingerprint database with an enhanced location estimation accuracy, which outperforms the other existing schemes in the open literature.
Submitted 14 July, 2023;
originally announced July 2023.
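For readers unfamiliar with the weighted k-nearest neighbors (WkNN) baseline mentioned in the abstract above, the following is a minimal sketch of plain WkNN over an RSS fingerprint database. It does not include the cluster-based weight rescaling that CsLE adds; the function name `wknn_locate`, the inverse-distance weighting, and the four-point database are assumptions made for illustration.

```python
import math


def wknn_locate(fingerprints, rss, k=3, eps=1e-9):
    """Estimate a position from an online RSS vector.

    fingerprints: list of ((x, y), [rss per AP]) reference points.
    rss: online RSS measurement, one value per AP, same order.
    """
    scored = []
    for pos, ref in fingerprints:
        # Euclidean distance in signal space, not in physical space.
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, rss)))
        scored.append((d, pos))
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]
    # Inverse-distance weights; eps avoids division by zero on exact matches.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y


# Hypothetical database: four reference points, two access points (dBm).
database = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((10.0, 0.0), [-70.0, -40.0]),
    ((0.0, 10.0), [-60.0, -60.0]),
    ((10.0, 10.0), [-80.0, -80.0]),
]
estimate = wknn_locate(database, [-40.0, -70.0], k=3)
```

A measurement that exactly matches a stored fingerprint is pulled almost entirely to that reference point, since its weight dominates the other neighbors.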
-
Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data
Authors:
Kai Chieh Chang,
Mark Hasegawa-Johnson,
Nancy L. McElwain,
Bashima Islam
Abstract:
Infant sleep is critical to brain and behavioral development. Prior studies on infant sleep/wake classification have been largely limited to reliance on expensive and burdensome polysomnography (PSG) tests in the laboratory or wearable devices that collect single-modality data. To facilitate data collection and accuracy of detection, we aimed to advance this field of study by using a multi-modal wearable device, LittleBeats (LB), to collect audio, electrocardiogram (ECG), and inertial measurement unit (IMU) data among a cohort of 28 infants. We employed a 3-branch (audio/ECG/IMU) large scale transformer-based neural network (NN) to demonstrate the potential of such multi-modal data. We pretrained each branch independently with its respective modality, then finetuned the model by fusing the pretrained transformer layers with cross-attention. We show that multi-modal data significantly improves sleep/wake classification (accuracy = 0.880), compared with use of a single modality (accuracy = 0.732). Our approach to multi-modal mid-level fusion may be adaptable to a diverse range of architectures and tasks, expanding future directions of infant behavioral research.
Submitted 27 June, 2023;
originally announced June 2023.
-
Weakly Supervised Lesion Detection and Diagnosis for Breast Cancers with Partially Annotated Ultrasound Images
Authors:
Jian Wang,
Liang Qiao,
Shichong Zhou,
** Zhou,
Jun Wang,
Juncheng Li,
Shihui Ying,
Cai Chang,
Jun Shi
Abstract:
Deep learning (DL) has proven highly effective for ultrasound-based computer-aided diagnosis (CAD) of breast cancers. In an automatic CAD system, lesion detection is critical for the subsequent diagnosis. However, existing DL-based methods generally require voluminous manually annotated region of interest (ROI) labels and class labels to train both the lesion detection and diagnosis models. In clinical practice, the ROI labels, i.e. ground truths, may not always be optimal for the classification task due to the individual experience of sonologists, resulting in coarse annotation that limits the diagnosis performance of a CAD model. To address this issue, a novel Two-Stage Detection and Diagnosis Network (TSDDNet) is proposed based on weakly supervised learning to enhance the diagnostic accuracy of ultrasound-based CAD for breast cancers. In particular, all the ROI-level labels are considered as coarse labels in the first training stage, and then a candidate selection mechanism is designed to identify optimal lesion areas for both the fully and partially annotated samples. It refines the current ROI-level labels in the fully annotated images and the detected ROIs in the partially annotated samples in a weakly supervised manner under the guidance of class labels. In the second training stage, a self-distillation strategy is further proposed to integrate the detection network and classification network into a unified framework as the final CAD model for joint optimization, which further improves the diagnosis performance. The proposed TSDDNet is evaluated on a B-mode ultrasound dataset, and the experimental results show that it achieves the best performance on both lesion detection and diagnosis tasks, suggesting promising application potential.
Submitted 12 June, 2023;
originally announced June 2023.
-
Learning in Domain Randomization via Continuous Time Non-Stochastic Control
Authors:
**gwei Li,
**g Dong,
Can Chang,
Baoxiang Wang,
**gzhao Zhang
Abstract:
Domain randomization is a popular method for robustly training agents to adapt to diverse environments and real-world tasks. In this paper, we examine how to train an agent in domain randomization environments from a nonstochastic control perspective. We first theoretically study online control of continuous-time linear systems under nonstochastic noises. We present a novel two-level online algorithm that integrates a higher-level learning strategy and a lower-level feedback control strategy. This method offers a practical solution, and for the first time achieves sublinear regret in continuous-time nonstochastic systems. Compared to standard online learning algorithms, our algorithm features a stack and skip procedure. By applying stack and skip to the SAC (Soft Actor-Critic) algorithm, we achieved improved results in multiple reinforcement learning tasks within domain randomization environments. Our work provides new insights into nonasymptotic analyses of controlling continuous-time systems. Further, our work justifies the importance of stack and skip in controller learning under nonstochastic environments.
Submitted 14 December, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
-
Synthetic CT Generation from MRI using 3D Transformer-based Denoising Diffusion Model
Authors:
Shaoyan Pan,
Elham Abouei,
Jacob Wynne,
Tonghe Wang,
Richard L. J. Qiu,
Yuheng Li,
Chih-Wei Chang,
Junbo Peng,
Justin Roper,
Pretesh Patel,
David S. Yu,
Hui Mao,
Xiaofeng Yang
Abstract:
Magnetic resonance imaging (MRI)-based synthetic computed tomography (sCT) simplifies radiation therapy treatment planning by eliminating the need for CT simulation and error-prone image registration, ultimately reducing patient radiation dose and setup uncertainty. We propose an MRI-to-CT transformer-based denoising diffusion probabilistic model (MC-DDPM) to transform MRI into high-quality sCT to facilitate radiation treatment planning. MC-DDPM implements diffusion processes with a shifted-window transformer network to generate sCT from MRI. The proposed model consists of two processes: a forward process which adds Gaussian noise to real CT scans, and a reverse process in which a shifted-window transformer V-net (Swin-Vnet) denoises the noisy CT scans conditioned on the MRI from the same patient to produce noise-free CT scans. With an optimally trained Swin-Vnet, the reverse diffusion process was used to generate sCT scans matching MRI anatomy. We evaluated the proposed method by generating sCT from MRI on a brain dataset and a prostate dataset. Quantitative evaluation was performed using the mean absolute error (MAE) of Hounsfield unit (HU), peak signal-to-noise ratio (PSNR), multi-scale Structure Similarity index (MS-SSIM), and normalized cross correlation (NCC) indexes between ground truth CTs and sCTs. MC-DDPM generated brain sCTs with state-of-the-art quantitative results with MAE 43.317 HU, PSNR 27.046 dB, SSIM 0.965, and NCC 0.983. For the prostate dataset, MC-DDPM achieved MAE 59.953 HU, PSNR 26.920 dB, SSIM 0.849, and NCC 0.948. In conclusion, we have developed and validated a novel approach for generating CT images from routine MRIs using a transformer-based DDPM. This model effectively captures the complex relationship between CT and MRI images, allowing for robust and high-quality synthetic CT (sCT) images to be generated in minutes.
Submitted 30 May, 2023;
originally announced May 2023.
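The forward process named in the abstract above (adding Gaussian noise to real CT scans) is commonly written in closed form for denoising diffusion probabilistic models as x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with eps drawn from a standard normal. The sketch below illustrates that generic formula only. The linear noise schedule, the 1000 steps, and the toy eight-value "patch" are assumptions, not settings reported for MC-DDPM, and the reverse (denoising) network is not shown.

```python
import math
import random


def linear_betas(steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (steps - 1)
            for t in range(steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): the fraction of signal kept at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def forward_noise(x0, t, abar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]


abar = alpha_bars(linear_betas(1000))
rng = random.Random(0)
x0 = [0.5] * 8  # hypothetical flattened image patch, intensities scaled to [-1, 1]
x_early = forward_noise(x0, 0, abar, rng)    # almost unchanged
x_late = forward_noise(x0, 999, abar, rng)   # almost pure Gaussian noise
```

By the last step almost none of the original signal remains, which is why the reverse process can start from Gaussian noise and, in the conditional setting described above, rely on the MRI to steer generation.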
-
Cycle-guided Denoising Diffusion Probability Model for 3D Cross-modality MRI Synthesis
Authors:
Shaoyan Pan,
Chih-Wei Chang,
Junbo Peng,
Jiahan Zhang,
Richard L. J. Qiu,
Tonghe Wang,
Justin Roper,
Tian Liu,
Hui Mao,
Xiaofeng Yang
Abstract:
This study aims to develop a novel Cycle-guided Denoising Diffusion Probability Model (CG-DDPM) for cross-modality MRI synthesis. The CG-DDPM deploys two DDPMs that condition each other to generate synthetic images from two different MRI pulse sequences. The two DDPMs exchange random latent noise in the reverse processes, which helps to regularize both DDPMs and generate matching images in two modalities. This improves image-to-image translation accuracy. We evaluated the CG-DDPM quantitatively using mean absolute error (MAE), multi-scale structural similarity index measure (MSSIM), and peak signal-to-noise ratio (PSNR), as well as the network synthesis consistency, on the BraTS2020 dataset. Our proposed method showed high accuracy and reliable consistency for MRI synthesis. In addition, we compared the CG-DDPM with several other state-of-the-art networks and demonstrated statistically significant improvements in the image quality of synthetic MRIs. The proposed method enhances the capability of current multimodal MRI synthesis approaches, which could contribute to more accurate diagnosis and better treatment planning for patients by synthesizing additional MRI modalities.
Submitted 28 April, 2023;
originally announced May 2023.
-
An Efficient Hash-based Data Structure for Dynamic Vision Sensors and its Application to Low-energy Low-memory Noise Filtering
Authors:
Pradeep Kumar Gopalakrishnan,
Chip-Hong Chang,
Arindam Basu
Abstract:
Events generated by the Dynamic Vision Sensor (DVS) are generally stored and processed in two-dimensional data structures whose memory complexity and energy-per-event scale proportionately with increasing sensor dimensions. In this paper, we propose a new two-dimensional data structure (BF_2) that takes advantage of the sparsity of events and enables compact storage of data using hash functions. It overcomes the saturation issue in the Bloom Filter (BF) and the memory reset issue in other hash-based arrays by using a second dimension to clear 1 out of D rows at regular intervals. A hardware-friendly, low-power, and low-memory-footprint noise filter for DVS is demonstrated using BF_2. For the tested datasets, the performance of the filter matches those of state-of-the-art filters like the BAF/STCF while consuming less than 10% and 15% of their memory and energy-per-event, respectively, for a correlation time constant Tau = 5 ms. The memory and energy advantages of the proposed filter increase with increasing sensor sizes. The proposed filter compares favourably with other hardware-friendly, event-based filters in hardware complexity, memory requirement and energy-per-event - as demonstrated through its implementation on an FPGA. The parameters of the data structure can be adjusted for trade-offs between performance and memory consumption, based on application requirements.
Submitted 28 April, 2023;
originally announced April 2023.
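The BF_2 abstract above describes a hash-based structure in which one of D rows is cleared at regular intervals so that stale entries expire without saturating the filter. The sketch below shows that general idea only. The class name `RotatingHashRows`, the choice of BLAKE2b as the hash, the rule that new events are written to the current row, and the `rotate` method are all assumptions for illustration; the paper's actual insertion and clearing rules are not given in the abstract.

```python
import hashlib


class RotatingHashRows:
    """D rows of hashed bit positions; one row is cleared per rotation.

    Hypothetical illustration, not the BF_2 design from the paper.
    """

    def __init__(self, num_rows, bits_per_row, num_hashes=2):
        # One byte per bit keeps the sketch readable; hardware would pack bits.
        self.rows = [bytearray(bits_per_row) for _ in range(num_rows)]
        self.bits_per_row = bits_per_row
        self.num_hashes = num_hashes
        self.current = 0

    def _positions(self, x, y):
        # Derive num_hashes positions from the pixel address (x, y).
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                f"{i}:{x}:{y}".encode(), digest_size=8
            ).digest()
            yield int.from_bytes(digest, "big") % self.bits_per_row

    def insert(self, x, y):
        """Record an event at pixel (x, y) in the current row."""
        row = self.rows[self.current]
        for p in self._positions(x, y):
            row[p] = 1

    def contains(self, x, y):
        """True if some row has every hashed bit set (false positives possible)."""
        positions = list(self._positions(x, y))
        return any(all(row[p] for p in positions) for row in self.rows)

    def rotate(self):
        """Advance to the next row and clear it, expiring its old events."""
        self.current = (self.current + 1) % len(self.rows)
        self.rows[self.current] = bytearray(self.bits_per_row)
```

In this sketch an event stays queryable until the row it was written to comes up for clearing again, so memory depends on the number of rows and bits per row rather than on the sensor resolution.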
-
Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations
Authors:
Yu-Hui Chen,
Raman Sarokin,
Juhyun Lee,
Jiuqiang Tang,
Chuo-Ling Chang,
Andrei Kulik,
Matthias Grundmann
Abstract:
The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to-date (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.
Submitted 16 June, 2023; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Attention-based Learning for Sleep Apnea and Limb Movement Detection using Wi-Fi CSI Signals
Authors:
Chi-Che Chang,
An-Hung Hsiao,
Li-Hsiang Shen,
Kai-Ten Feng,
Chia-Yu Chen
Abstract:
Wi-Fi channel state information (CSI) has become a promising solution for non-invasive breathing and body motion monitoring during sleep. Sleep apnea and periodic limb movement disorder (PLMD) often go unnoticed by the patient and can be fatal. Existing studies detect sleep disorders only in impractically controlled environments. Moreover, it is challenging to classify complex macro- and micro-scale sleep movements and to separate the entangled, similar waveforms of apnea and PLMD. In this paper, we propose the attention-based learning for sleep apnea and limb movement detection (ALESAL) system, which can jointly detect sleep apnea and PLMD under different sleep postures across a variety of patients. ALESAL contains antenna-pair and time attention mechanisms for mitigating the impact of modest antenna pairs and emphasizing the duration of interest, respectively. Performance results show that our proposed ALESAL system can achieve a weighted F1-score of 84.33, outperforming existing non-attention-based methods such as the support vector machine and the deep multilayer perceptron.
Submitted 26 March, 2023;
originally announced April 2023.
-
A Privacy Preserving Distributed Model Identification Algorithm for Power Distribution Systems
Authors:
Chin-Yao Chang
Abstract:
Distributed control/optimization is a promising approach for network systems due to its advantages over centralized schemes, such as robustness, cost-effectiveness, and improved privacy. However, distributed methods can have drawbacks, such as slower convergence rates due to limited knowledge of the overall network model. Additionally, ensuring privacy in the communication of sensitive information can pose implementation challenges. To address this issue, we propose a distributed model identification algorithm that enables each agent to identify the sub-model that characterizes the relationship between its local control and the overall system outputs. The proposed algorithm maintains the privacy of local agents by only communicating through dummy variables. We demonstrate the efficacy of our algorithm in the context of power distribution systems by applying it to the voltage regulation of a modified IEEE distribution system. The proposed algorithm is well-suited to the needs of power distribution controls and offers an effective solution to the challenges of distributed model identification in network systems.
Submitted 6 April, 2023;
originally announced April 2023.
-
Compressing Transformer-based self-supervised models for speech processing
Authors:
Tzu-Quan Lin,
Tsung-Huan Yang,
Chun-Yao Chang,
Kuang-Ming Chen,
Tzu-hsun Feng,
Hung-yi Lee,
Hao Tang
Abstract:
Despite the success of Transformers in self-supervised learning with applications to various downstream tasks, the computational cost of training and inference remains a major challenge for applying these models to a wide spectrum of devices. Several isolated attempts have been made to compress Transformers, but the settings and metrics are different across studies. Trade-offs at various compression rates are also largely missing in prior work, making it difficult to compare compression techniques. In this work, we aim to provide context for the isolated results, studying several commonly used compression techniques, including weight pruning, head pruning, low-rank approximation, and knowledge distillation. We report trade-offs at various compression rates, including wall-clock time, the number of parameters, and the number of multiply-accumulate operations. Our results show that compared to recent approaches, basic compression techniques are strong baselines. We further present several applications of our results, revealing properties of Transformers, such as the significance of diagonal attention heads. In addition, our results lead to a simple combination of compression techniques that improves the trade-off over recent approaches. We hope the results will promote more diverse comparisons among model compression techniques and promote the use of model compression as a tool for analyzing models. Our code for compressing speech self-supervised models is available at https://github.com/nervjack2/Speech-SSL-Compression/.
Submitted 26 January, 2024; v1 submitted 17 November, 2022;
originally announced November 2022.
-
Controlling Commercial Cooling Systems Using Reinforcement Learning
Authors:
Jerry Luo,
Cosmin Paduraru,
Octavian Voicu,
Yuri Chervonyi,
Scott Munns,
Jerry Li,
Crystal Qian,
Praneet Dutta,
Jared Quincy Davis,
Ningjia Wu,
Xingwei Yang,
Chu-Ming Chang,
Ted Li,
Rob Rose,
Mingyan Fan,
Hootan Nakhost,
Tinglin Liu,
Brian Kirkman,
Frank Altamura,
Lee Cline,
Patrick Tonker,
Joel Gouker,
Dave Uden,
Warren Buddy Bryan,
Jason Law
, et al. (11 additional authors not shown)
Abstract:
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.
Submitted 14 December, 2022; v1 submitted 11 November, 2022;
originally announced November 2022.
-
Learned Video Compression for YUV 4:2:0 Content Using Flow-based Conditional Inter-frame Coding
Authors:
Yung-Han Ho,
Chih-Hsuan Lin,
Peng-Yu Chen,
Mu-Jung Chen,
Chih-Peng Chang,
Wen-Hsiao Peng,
Hsueh-Ming Hang
Abstract:
This paper proposes a learning-based video compression framework for variable-rate coding on YUV 4:2:0 content. Most existing learning-based video compression models adopt the traditional hybrid-based coding architecture, which involves temporal prediction followed by residual coding. However, recent studies have shown that residual coding is sub-optimal from the information-theoretic perspective. In addition, most existing models are optimized with respect to RGB content. Furthermore, they require separate models for variable-rate coding. To address these issues, this work presents an attempt to incorporate the conditional inter-frame coding for YUV 4:2:0 content. We introduce a conditional flow-based inter-frame coder to improve the inter-frame coding efficiency. To adapt our codec to YUV 4:2:0 content, we adopt a simple strategy of using space-to-depth and depth-to-space conversions. Lastly, we employ a rate-adaption net to achieve variable-rate coding without training multiple models. Experimental results show that our model performs better than x265 on UVG and MCL-JCV datasets in terms of PSNR-YUV. However, on the more challenging datasets from ISCAS'22 GC, there is still ample room for improvement. This insufficient performance is due to the lack of inter-frame coding capability at a large GOP size and can be mitigated by increasing the model capacity and applying an error propagation-aware training strategy.
Submitted 15 October, 2022;
originally announced October 2022.
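The YUV 4:2:0 abstract above mentions space-to-depth and depth-to-space conversions as the way the codec is adapted to subsampled chroma. One common way to use them, sketched below, is to fold the full-resolution luma plane into four half-resolution channels so that it can be stacked with the two chroma planes. The function names and the six-channel packing order are assumptions for illustration; the abstract does not describe the authors' exact tensor layout.

```python
import numpy as np


def space_to_depth(plane, block=2):
    """Fold each block x block patch of an (H, W) plane into channels.

    Returns an array of shape (block * block, H // block, W // block).
    """
    h, w = plane.shape
    x = plane.reshape(h // block, block, w // block, block)
    # Axes are now (row, dy, col, dx); move the in-patch offsets to the front.
    return x.transpose(1, 3, 0, 2).reshape(block * block, h // block, w // block)


def depth_to_space(channels, block=2):
    """Inverse of space_to_depth: (block * block, h, w) -> (h * block, w * block)."""
    _, h, w = channels.shape
    x = channels.reshape(block, block, h, w)
    # Axes are (dy, dx, row, col); interleave the offsets back into the plane.
    return x.transpose(2, 0, 3, 1).reshape(h * block, w * block)


def pack_yuv420(y, u, v):
    """Stack luma (H, W) and chroma (H/2, W/2) planes as 6 half-size channels."""
    return np.concatenate([space_to_depth(y), u[None], v[None]], axis=0)


def unpack_yuv420(packed):
    """Recover the Y, U, and V planes from the 6-channel packing."""
    return depth_to_space(packed[:4]), packed[4], packed[5]
```

Because the conversion is only a rearrangement of samples, it is lossless, and all six channels share one spatial resolution, which suits convolutional layers.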
-
Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation
Authors:
Yu-Shan Tai,
Cheng-Yang Chang,
Chieh-Fang Teng,
An-Yeu Wu
Abstract:
Recently, deep convolutional neural networks (CNNs) have achieved many eye-catching results. However, deploying CNNs on resource-constrained edge devices is limited by the memory bandwidth needed to transmit large intermediate data, i.e., activations, during inference. Existing research utilizes mixed precision and dimension reduction to reduce computational complexity but pays less attention to their application to activation compression. To further exploit the redundancy in activations, we propose a learnable mixed-precision and dimension reduction co-design system, which separates channels into groups and allocates specific compression policies according to their importance. In addition, the proposed dynamic searching technique enlarges the search space and finds the optimal bit-width allocation automatically. Our experimental results show that the proposed methods improve accuracy by 3.54%/1.27% and save 0.18/2.02 bits per value over existing mixed-precision methods on ResNet18 and MobileNetv2, respectively.
Submitted 18 July, 2022; v1 submitted 16 July, 2022;
originally announced July 2022.
-
CANF-VC: Conditional Augmented Normalizing Flows for Video Compression
Authors:
Yung-Han Ho,
Chih-Peng Chang,
Peng-Yu Chen,
Alessandro Gnutti,
Wen-Hsiao Peng
Abstract:
This paper presents an end-to-end learning-based video compression system, termed CANF-VC, based on conditional augmented normalizing flows (CANF). Most learned video compression systems adopt the same hybrid-based coding architecture as the traditional codecs. Recent research on conditional coding has shown the sub-optimality of the hybrid-based coding and opens up opportunities for deep generative models to take a key role in creating new coding frameworks. CANF-VC represents a new attempt that leverages the conditional ANF to learn a video generative model for conditional inter-frame coding. We choose ANF because it is a special type of generative model, which includes variational autoencoder as a special case and is able to achieve better expressiveness. CANF-VC also extends the idea of conditional coding to motion coding, forming a purely conditional coding framework. Extensive experimental results on commonly used datasets confirm the superiority of CANF-VC to the state-of-the-art methods. The source code of CANF-VC is available at https://github.com/NYCU-MAPL/CANF-VC.
Submitted 14 August, 2022; v1 submitted 12 July, 2022;
originally announced July 2022.
-
Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
Authors:
Chih-Chiang Chang,
Hung-yi Lee
Abstract:
Simultaneous speech translation (SimulST) is a challenging task aiming to translate streaming speech before the complete input is observed. A SimulST system generally includes two components: the pre-decision that aggregates the speech information and the policy that decides to read or write. While recent works have proposed various strategies to improve the pre-decision, they mainly adopt the fixed wait-k policy, leaving adaptive policies rarely explored. This paper proposes to model the adaptive policy by adapting the Continuous Integrate-and-Fire (CIF). Compared with monotonic multihead attention (MMA), our method has the advantage of simpler computation, superior quality at low latency, and better generalization to long utterances. We conduct experiments on the MuST-C V2 dataset and show the effectiveness of our approach.
Submitted 3 October, 2022; v1 submitted 22 March, 2022;
originally announced April 2022.
-
Wi-Fi and Bluetooth Contact Tracing Without User Intervention
Authors:
Brosnan Yuen,
Yifeng Bie,
Duncan Cairns,
Geoffrey Harper,
Jason Xu,
Charles Chang,
Xiaodai Dong,
Tao Lu
Abstract:
Previous contact tracing systems required the users to perform many manual actions, such as installing smartphone applications, joining wireless networks, or carrying custom user devices. This increases the barrier to entry and lowers the user adoption rate. As a result, the contact tracing effectiveness is reduced. Unlike the systems above, we propose a new privacy preserving Wi-Fi and Bluetooth (BLE) contact tracing system that does not require smartphone applications, joining wireless networks, or custom user devices. Our specially built routers seamlessly track smartphones, laptops, smartwatches, BLE headphones, and tablets without any user action, but do not trace user identity. Mapping between devices and users is only carried out for confirmed cases and suspected contacts. Moreover, we can track the absolute positions of user devices within 1.0 m due to using bidirectional long short-term memory neural networks that are trained with data pre-collected by an autonomous robot. This allows public health authorities to track indirect droplet and surface transmissions that other contact tracing systems often overlook.
Submitted 23 July, 2022; v1 submitted 30 March, 2022;
originally announced April 2022.
-
The Dark Side: Security Concerns in Machine Learning for EDA
Authors:
Zhiyao Xie,
Jingyu Pan,
Chen-Chia Chang,
Yiran Chen
Abstract:
The growing complexity of ICs has led to a compelling need for design efficiency improvement through new electronic design automation (EDA) methodologies. In recent years, many unprecedentedly efficient EDA methods have been enabled by machine learning (ML) techniques. While ML demonstrates great potential in circuit design, its dark side, namely security problems, is seldom discussed. This paper gives a comprehensive and impartial summary of all security concerns we have observed in ML for EDA. Many of them are hidden or neglected by practitioners in this field. In this paper, we first provide our taxonomy to define four major types of security concerns, then we analyze different application scenarios and special properties in ML for EDA. After that, we present our detailed analysis of each security concern with experiments.
Submitted 20 March, 2022;
originally announced March 2022.
-
On the predictability in reversible steganography
Authors:
Ching-Chun Chang,
Xu Wang,
Sisheng Chen,
Hitoshi Kiya,
Isao Echizen
Abstract:
Artificial neural networks have advanced the frontiers of reversible steganography. The core strength of neural networks is the ability to render accurate predictions for a bewildering variety of data. Residual modulation is recognised as the most advanced reversible steganographic algorithm for digital images. The pivot of this algorithm is predictive analytics in which pixel intensities are predicted given some pixel-wise contextual information. This task can be perceived as a low-level vision problem and hence neural networks for addressing a similar class of problems can be deployed. On top of the prior art, this paper investigates predictability of pixel intensities based on supervised and unsupervised learning frameworks. Predictability analysis enables adaptive data embedding, which in turn leads to a better trade-off between capacity and imperceptibility. While conventional methods estimate predictability by the statistics of local image patterns, learning-based frameworks consider further the degree to which correct predictions can be made by a designated predictor. Not only should the image patterns be taken into account but also the predictor in use. Experimental results show that steganographic performance can be significantly improved by incorporating the learning-based predictability analysers into a reversible steganographic system.
Submitted 7 March, 2023; v1 submitted 5 February, 2022;
originally announced February 2022.
-
Instrumented shoulder functional assessment using inertial measurement units for frozen shoulder
Authors:
Ting-Yang Lu,
Kai-Chun Liu,
Chia-Yeh Hsieh,
Chih-Ya Chang,
Yu Tsao,
Chia-Tai Chan
Abstract:
Frozen shoulder (FS) is a shoulder condition that leads to pain and loss of shoulder range of motion. FS patients have difficulties in independently performing daily activities. Inertial measurement units (IMUs) have been developed to objectively measure upper limb range of motion (ROM) and shoulder function. In this work, we propose an IMU-based shoulder functional task assessment with kinematic parameters (e.g., smoothness, power, speed, and duration) in FS patients and analyze the functional performance on complete shoulder tasks and subtasks. Twenty FS patients and twenty healthy subjects were recruited in this study. Five shoulder functional tasks are performed by participants, such as washing hair (WH), washing upper back (WUB), washing lower back (WLB), placing an object on a high shelf (POH), and removing an object from back pocket (ROP). The results demonstrate that the used smoothness features can reflect the differences of movement fluency between FS patients and healthy controls (p < 0.05 and effect size > 0.8). Moreover, features of subtasks provided subtle information related to clinical conditions that have not been revealed in features of a complete task, especially the defined subtask 1 and 2 of each task.
Submitted 25 November, 2021;
originally announced November 2021.
-
Music Score Expansion with Variable-Length Infilling
Authors:
Chih-Pin Tan,
Chin-Jui Chang,
Alvin W. Y. Su,
Yi-Hsuan Yang
Abstract:
In this paper, we investigate using the variable-length infilling (VLI) model, which was originally proposed to infill missing segments, to "prolong" existing musical segments at musical boundaries. Specifically, as a case study, we expand 20 musical segments from 12 bars to 16 bars, and examine the degree to which the VLI model preserves musical boundaries in the expanded results using a few objective metrics, including the newly proposed Register Histogram Similarity. The results show that the VLI model has the potential to address the expansion task.
Submitted 10 November, 2021;
originally announced November 2021.
-
Compression-aware Projection with Greedy Dimension Reduction for Convolutional Neural Network Activations
Authors:
Yu-Shan Tai,
Chieh-Fang Teng,
Cheng-Yang Chang,
An-Yeu Wu
Abstract:
Convolutional neural networks (CNNs) achieve remarkable performance in a wide range of fields. However, intensive memory access of activations introduces considerable energy consumption, impeding deployment of CNNs on resource-constrained edge devices. Existing works in activation compression propose to transform feature maps for higher compressibility, thus enabling dimension reduction. Nevertheless, in the case of aggressive dimension reduction, these methods lead to a severe accuracy drop. To improve the trade-off between classification accuracy and compression ratio, we propose a compression-aware projection system, which employs a learnable projection to compensate for the reconstruction loss. In addition, a greedy selection metric is introduced to optimize the layer-wise compression ratio allocation by considering both accuracy and #bits reduction simultaneously. Our test results show that the proposed methods effectively reduce memory access by 2.91x~5.97x with a negligible accuracy drop on MobileNetV2/ResNet18/VGG16.
Submitted 17 October, 2021;
originally announced October 2021.
-
Intelligent Traffic Control System by Using Image Information
Authors:
Zong-Ming Lin,
Cheng-Yang Chang,
Chin-Yu Hu,
Yung-Yuan Chen
Abstract:
This paper implements a traffic signal control system that uses real-time traffic flow feedback. The system is designed to deal with two-lane intersections. We construct an experimental environment similar to the roads and drivers in Taiwan using Virtual Test Drive (VTD), an autonomous driving simulation software released by MSC Software. We place four cameras at the side of the roads to capture images of the intersection, then convert the image information into traffic flow information. We analyze the traffic information in each lane using the Greenshields traffic flow model and control the traffic signals using Webster's method to improve performance and ease congestion.
Submitted 21 September, 2021;
originally announced September 2021.
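The two classical models named in the traffic-control abstract above can be written down in a few lines. The sketch below uses the textbook forms: the Greenshields linear speed-density relation, and Webster's optimal cycle length C = (1.5 L + 5) / (1 - Y), with effective green time split in proportion to each phase's critical flow ratio. The function names, the units, and the example numbers are assumptions for illustration and are not taken from the paper.

```python
def greenshields_flow(density, free_flow_speed, jam_density):
    """Flow (veh/h) from the Greenshields model: speed falls linearly with density.

    density and jam_density are in veh/km, free_flow_speed in km/h.
    """
    speed = free_flow_speed * (1.0 - density / jam_density)
    return density * speed


def webster_timing(flow_ratios, lost_time):
    """Webster's optimal cycle length and green splits.

    flow_ratios: critical flow ratio (demand / saturation flow) per phase.
    lost_time: total lost time per cycle in seconds.
    Returns (cycle_length_s, [effective_green_s per phase]).
    """
    total_ratio = sum(flow_ratios)
    if not 0.0 < total_ratio < 1.0:
        raise ValueError("sum of flow ratios must be between 0 and 1")
    cycle = (1.5 * lost_time + 5.0) / (1.0 - total_ratio)
    effective_green = cycle - lost_time
    # Split the usable green time in proportion to each phase's demand.
    greens = [y / total_ratio * effective_green for y in flow_ratios]
    return cycle, greens
```

For example, with flow ratios of 0.3 and 0.2 and 10 s of lost time, this gives a 40 s cycle with 18 s and 12 s of effective green. A real-time controller along the lines the abstract describes would presumably recompute these values as the camera-derived flow estimates change.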
-
BERT-like Pre-training for Symbolic Piano Music Classification Tasks
Authors:
Yi-Hui Chou,
I-Chun Chen,
Chin-Jui Chang,
Joann Ching,
Yi-Hsuan Yang
Abstract:
This article presents a benchmark study of symbolic piano music classification using the masked language modelling approach of the Bidirectional Encoder Representations from Transformers (BERT). Specifically, we consider two types of MIDI data: MIDI scores, which are musical scores rendered directly into MIDI with no dynamics and precisely aligned with the metrical grid notated by the composer, and MIDI performances, which are MIDI encodings of human performances of musical scoresheets. With five public-domain datasets of single-track piano MIDI files, we pre-train two 12-layer Transformer models using the BERT approach, one for MIDI scores and the other for MIDI performances, and fine-tune them for four downstream classification tasks. These include two note-level classification tasks (melody extraction and velocity prediction) and two sequence-level classification tasks (style classification and emotion classification). Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.
Submitted 13 April, 2024; v1 submitted 12 July, 2021;
originally announced July 2021.
-
Deep Learning for Predictive Analytics in Reversible Steganography
Authors:
Ching-Chun Chang,
Xu Wang,
Sisheng Chen,
Isao Echizen,
Victor Sanchez,
Chang-Tsun Li
Abstract:
Deep learning is regarded as a promising solution for reversible steganography. There is an accelerating trend of representing a reversible stego-system by monolithic neural networks, which bypass intermediate operations in traditional pipelines of reversible steganography. This end-to-end paradigm, however, suffers from imperfect reversibility. By contrast, the modular paradigm that incorporates neural networks into modules of traditional pipelines can stably guarantee reversibility with mathematical explainability. Prediction-error modulation is a well-established reversible steganography pipeline for digital images. It consists of a predictive analytics module and a reversible coding module. Given that reversibility is governed independently by the coding module, we narrow our focus to the incorporation of neural networks into the analytics module, which serves the purpose of predicting pixel intensities and plays a pivotal role in determining capacity and imperceptibility. The objective of this study is to evaluate the impact of different training configurations on the predictive accuracy of neural networks and to provide practical insights. In particular, we investigate how different initialisation strategies for input images may affect the learning process and how different training strategies for dual-layer prediction respond to the problem of distributional shift. Furthermore, we compare the steganographic performance of various model architectures with different loss functions.
Submitted 7 March, 2023; v1 submitted 13 June, 2021;
originally announced June 2021.
-
Closed-loop Control Design and Motor Allocation for a Lower-limb Cable-driven Exoskeleton: A Switched Systems Approach
Authors:
Chen-Hao Chang,
Jonathan Casas,
Victor H. Duenas
Abstract:
Powered lower-limb exoskeletons provide assistive torques to coordinate limb motion during walking in individuals with movement disorders. Advances in sensing and actuation have improved the wearability and portability of state-of-the-art exoskeletons for walking. Cable-driven exoskeletons offload the actuators away from the user, thus rendering light-weight devices to facilitate locomotion training. However, cable-driven mechanisms experience a slacking behavior if tension is not accurately controlled. Moreover, counteracting forces can arise between the agonist and antagonist motors yielding undesired joint motion. In this paper, the strategy is to develop two control layers to improve the performance of a cable-driven exoskeleton. First, a joint tracking controller is designed using a high-gain robust approach to track desired knee and hip trajectories. Second, a motor synchronization objective is developed to mitigate the effects of cable slacking for a pair of electric motors that actuate each joint. A sliding-mode robust controller is designed for the motor synchronization objective. A Lyapunov-based stability analysis is developed to guarantee a uniformly ultimately bounded result for joint tracking and exponential tracking for the motor synchronization objective. Moreover, an average dwell time analysis provides a bound on the number of motor switches when allocating the control between motors that actuate each joint. An experimental result with an able-bodied individual illustrates the feasibility of the developed control methods.
Submitted 28 April, 2021;
originally announced April 2021.
-
Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information
Authors:
Yen-Ju Lu,
Chia-Yu Chang,
Cheng Yu,
Ching-Feng Liu,
Jeih-weih Hung,
Shinji Watanabe,
Yu Tsao
Abstract:
Previous studies have confirmed that by augmenting acoustic features with the place/manner of articulatory features, the speech enhancement (SE) process can be guided to consider the broad phonetic properties of the input speech when performing enhancement to attain performance improvements. In this paper, we explore the contextual information of articulatory attributes as additional information to further benefit SE. More specifically, we propose to improve the SE performance by leveraging losses from an end-to-end automatic speech recognition (E2E-ASR) model that predicts the sequence of broad phonetic classes (BPCs). We also developed multi-objective training with ASR and perceptual losses to train the SE system based on a BPC-based E2E-ASR. Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual BPC information improves SE performance. Moreover, the SE model trained with the BPC-based E2E-ASR outperforms that with the phoneme-based E2E-ASR. The results suggest that objectives with misclassification of phonemes by the ASR system may lead to imperfect feedback, and BPC could be a potentially better choice. Finally, it is noted that combining the most-confusable phonetic targets into the same BPC when calculating the additional objective can effectively improve the SE performance.
Submitted 18 June, 2023; v1 submitted 14 November, 2020;
originally announced November 2020.
-
Using Convolutional Variational Autoencoders to Predict Post-Trauma Health Outcomes from Actigraphy Data
Authors:
Ayse S. Cakmak,
Nina Thigpen,
Garrett Honke,
Erick Perez Alday,
Ali Bahrami Rad,
Rebecca Adaimi,
Chia Jung Chang,
Qiao Li,
Pramod Gupta,
Thomas Neylan,
Samuel A. McLean,
Gari D. Clifford
Abstract:
Depression and post-traumatic stress disorder (PTSD) are psychiatric conditions commonly associated with experiencing a traumatic event. Estimating mental health status through non-invasive techniques such as activity-based algorithms can help to identify successful early interventions. In this work, we used locomotor activity captured from 1113 individuals who wore a research grade smartwatch post-trauma. A convolutional variational autoencoder (VAE) architecture was used for unsupervised feature extraction from four weeks of actigraphy data. By using VAE latent variables and the participant's pre-trauma physical health status as features, a logistic regression classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.64 to estimate mental health outcomes. The results indicate that the VAE model is a promising approach for actigraphy data analysis for mental health outcomes in long-term studies.
Submitted 19 November, 2020; v1 submitted 14 November, 2020;
originally announced November 2020.
-
Enabling DER Participation in Frequency Regulation Markets
Authors:
Priyank Srivastava,
Chin-Yao Chang,
Jorge Cortes
Abstract:
Distributed energy resources (DERs) are playing an increasing role in ancillary services for the bulk grid, particularly in frequency regulation. In this paper, we propose a framework for collections of DERs, combined to form microgrids and controlled by aggregators, to participate in frequency regulation markets. Our approach covers both the identification of bids for the market clearing stage and the mechanisms for the real-time allocation of the regulation signal. The proposed framework is hierarchical, consisting of a top layer and a bottom layer. The top layer consists of the aggregators communicating in a distributed fashion to optimally disaggregate the regulation signal requested by the system operator. The bottom layer consists of the DERs inside each microgrid whose power levels are adjusted so that the tie line power matches the output of the corresponding aggregator in the top layer. The coordination at the top layer requires the knowledge of cost functions, ramp rates and capacity bounds of the aggregators. We develop meaningful abstractions for these quantities respecting the power flow constraints and taking into account the load uncertainties, and propose a provably correct distributed algorithm for optimal disaggregation of regulation signal amongst the microgrids.
Submitted 29 January, 2021; v1 submitted 8 November, 2020;
originally announced November 2020.
-
Transfer Learning from Monolingual ASR to Transcription-free Cross-lingual Voice Conversion
Authors:
Che-Jui Chang
Abstract:
Cross-lingual voice conversion (VC) is a task that aims to synthesize target voices with the same content while source and target speakers speak in different languages. Its challenge lies in the fact that the source and target data are naturally non-parallel, and it is even more difficult to bridge the gaps between languages when no transcriptions are provided. In this paper, we focus on knowledge transfer from monolingual ASR to cross-lingual VC, in order to address the content mismatch problem. To achieve this, we first train a monolingual acoustic model for the source language, use it to extract phonetic features for all the speech in the VC dataset, and then train a Seq2Seq conversion model to predict the mel-spectrograms. We successfully address cross-lingual VC without any transcription or language-specific knowledge for foreign speech. We evaluate this approach on the Voice Conversion Challenge 2020 datasets and show that our speaker-dependent conversion model outperforms the zero-shot baseline, achieving MOS of 3.83 and 3.54 in speech quality and speaker similarity, respectively, for cross-lingual conversion. Compared with the cascade ASR-TTS method, our proposed method significantly reduces the MOS drop between intra- and cross-lingual conversion.
Submitted 30 September, 2020;
originally announced September 2020.
-
Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging
Authors:
Chun-Chieh Chang,
Chieh-Chi Kao,
Ming Sun,
Chao Wang
Abstract:
Knowledge Distillation (KD) is a popular area of research for reducing the size of large models while still maintaining good performance. The outputs of larger teacher models are used to guide the training of smaller student models. Given the repetitive nature of acoustic events, we propose to leverage this information to regulate the KD training for Audio Tagging. This novel KD method, "Intra-Utterance Similarity Preserving KD" (IUSP), shows promising results for the audio tagging task. It is motivated by the previously published KD method "Similarity Preserving KD" (SP). However, instead of preserving the pairwise similarities between inputs within a mini-batch, our method preserves the pairwise similarities between the frames of a single input utterance. Our proposed KD method, IUSP, shows consistent improvements over SP across student models of different sizes on the DCASE 2019 Task 5 dataset for audio tagging. Relative to SP's improvement over the baseline, IUSP's improvement in micro AUPRC over the baseline is 27.1% to 122.4% larger.
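The idea of preserving frame-to-frame similarities within one utterance can be sketched as below. The abstract does not state the loss, so this sketch assumes an SP-style formulation: a row-normalised Gram matrix and a mean squared difference between the teacher and student matrices. The function names and the toy embeddings are hypothetical.

```python
import math


def similarity_matrix(frames):
    """Row-normalised Gram matrix (T x T) of T frame embeddings,
    each given as a list of D floats."""
    gram = [[sum(a * b for a, b in zip(u, v)) for v in frames] for u in frames]
    normed = []
    for row in gram:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard all-zero rows
        normed.append([x / norm for x in row])
    return normed


def intra_utterance_sp_loss(teacher_frames, student_frames):
    """Mean squared difference between the teacher and student
    frame-similarity matrices of a single utterance (assumed loss form)."""
    g_t = similarity_matrix(teacher_frames)
    g_s = similarity_matrix(student_frames)
    t = len(g_t)
    total = sum((a - b) ** 2
                for row_t, row_s in zip(g_t, g_s)
                for a, b in zip(row_t, row_s))
    return total / (t * t)


# Toy example: the teacher sees two distinct frames, the student collapses them.
loss = intra_utterance_sp_loss([[1.0, 0.0], [0.0, 1.0]],
                               [[1.0, 0.0], [1.0, 0.0]])
```

Because only the T x T similarity matrices are compared, the teacher and student embeddings may have different dimensions as long as they cover the same number of frames.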
Submitted 3 September, 2020;
originally announced September 2020.
-
Task-Projected Hyperdimensional Computing for Multi-Task Learning
Authors:
Cheng-Yang Chang,
Yu-Chuan Chuang,
An-Yeu Wu
Abstract:
Brain-inspired Hyperdimensional (HD) computing is an emerging technique for cognitive tasks in the field of low-power design. As a fast-learning and energy-efficient computational paradigm, HD computing has shown great success in many real-world applications. However, an HD model incrementally trained on multiple tasks suffers from the negative impacts of catastrophic forgetting. The model forgets the knowledge learned from previous tasks and only focuses on the current one. To the best of our knowledge, no study has been conducted to investigate the feasibility of applying multi-task learning to HD computing. In this paper, we propose Task-Projected Hyperdimensional Computing (TP-HDC) to make the HD model simultaneously support multiple tasks by exploiting the redundant dimensionality in the hyperspace. To mitigate the interferences between different tasks, we project each task into a separate subspace for learning. Compared with the baseline method, our approach efficiently utilizes the unused capacity in the hyperspace and shows a 12.8% improvement in averaged accuracy with negligible memory overhead.
Submitted 29 April, 2020;
originally announced April 2020.
-
Co-simulation Platform for Developing InfoRich Energy-Efficient Connected and Automated Vehicles
Authors:
Shunsuke Aoki,
Lung En Jan,
Junfeng Zhao,
Anand Bhat,
Ragunathan Rajkumar,
Chen-Fang Chang
Abstract:
With advances in sensing, computing, and communication technologies, Connected and Automated Vehicles (CAVs) are becoming feasible. The advent of CAVs presents new opportunities to improve the energy efficiency of individual vehicles. However, testing and verifying energy-efficient autonomous driving systems are difficult due to safety considerations and repeatability. In this paper, we present a co-simulation platform to develop and test novel vehicle eco-autonomous driving technologies named InfoRich, which incorporates the information from on-board sensors, V2X communications, and map database. The co-simulation platform includes eco-autonomous driving software, vehicle dynamics and powertrain (VD&PT) model, and a traffic environment simulator. Also, we utilize synthetic drive cycles derived from real-world driving data to test the strategies under realistic driving scenarios. To build road networks from the real-world driving data, we develop an Automated Parser and Calculator for Map/Scenario named AutoPASCAL. Overall, the simulation platform provides a realistic vehicle model, powertrain model, sensor model, traffic model, and road-network model to enable the evaluation of the energy efficiency of eco-autonomous driving.
Submitted 16 April, 2020;
originally announced April 2020.
-
On the Computational Viability of Quantum Optimization for PMU Placement
Authors:
Eric B. Jones,
Eliot Kapit,
Chin-Yao Chang,
David Biagioni,
Deepthi Vaidhynathan,
Peter Graf,
Wesley Jones
Abstract:
Using optimal phasor measurement unit placement as a prototypical problem, we assess the computational viability of the current generation D-Wave Systems 2000Q quantum annealer for power systems design problems. We reformulate minimum dominating set for the annealer hardware, solve the reformulation for a standard set of IEEE test systems, and benchmark solution quality and time to solution against the CPLEX Optimizer and simulated annealing. For some problem instances the 2000Q outpaces CPLEX. For instances where the 2000Q underperforms with respect to CPLEX and simulated annealing, we suggest hardware improvements for the next generation of quantum annealers.
Submitted 13 January, 2020;
originally announced January 2020.
-
Joint Beamforming and Computation Offloading for Multi-user Mobile-Edge Computing
Authors:
Changfeng Ding,
Jun-Bo Wang,
Ming Cheng,
Chuanwen Chang,
Jin-Yuan Wang,
Min Lin
Abstract:
Mobile edge computing (MEC) is considered an efficient method to relieve the computation burden of mobile devices (MDs). To reduce the energy consumption and time delay of MDs in MEC, we consider applying multi-user multiple-input multiple-output (MU-MIMO) communications to the MEC system. The purpose of this paper is to minimize the weighted sum of energy consumption and time delay of MDs by jointly considering the offloading decision and MU-MIMO beamforming problems. The resulting optimization problem is a mixed-integer non-linear programming problem, which is NP-hard. To solve it, a semidefinite relaxation based algorithm is first proposed for the offloading decision problem. Then, the MU-MIMO beamforming design problem is handled with a newly proposed fractional programming method. Simulation results show that the proposed algorithms can effectively reduce the energy consumption and time delay of the computation offloading.
Submitted 4 January, 2020;
originally announced January 2020.
-
A Reinforcement Learning Approach for the Multichannel Rendezvous Problem
Authors:
Jen-Hung Wang,
Ping-En Lu,
Cheng-Shang Chang,
Duan-Shin Lee
Abstract:
In this paper, we consider the multichannel rendezvous problem in cognitive radio networks (CRNs) where the probability that two users hopping on the same channel have a successful rendezvous is a function of channel states. The channel states are modelled by two-state Markov chains that have a good state and a bad state. These channel states are not observable by the users. For such a multichannel rendezvous problem, we are interested in finding the optimal policy to minimize the expected time-to-rendezvous (ETTR) among the class of {\em dynamic blind rendezvous policies}, i.e., at the $t^{th}$ time slot each user selects channel $i$ independently with probability $p_i(t)$, $i=1,2, \ldots, N$. By formulating such a multichannel rendezvous problem as an adversarial bandit problem, we propose using a reinforcement learning approach to learn the channel selection probabilities $p_i(t)$, $i=1,2, \ldots, N$. Our experimental results show that the reinforcement learning approach is very effective and yields comparable ETTRs when compared to various approximation policies in the literature.
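To make the ETTR objective concrete, the sketch below evaluates a much simpler special case than the one studied in the abstract. It assumes a stationary policy (the selection probabilities do not change over time) and a fixed per-channel success probability, so it ignores the Markov channel-state dynamics entirely. Under those assumptions the slots are independent and the time-to-rendezvous is geometric. The function names and numbers are hypothetical.

```python
def rendezvous_probability(p, success):
    """Per-slot probability that two users meet, when each independently picks
    channel i with probability p[i] and a meeting on channel i succeeds with
    probability success[i] (assumed constant over time)."""
    return sum(pi * pi * si for pi, si in zip(p, success))


def expected_ttr(p, success):
    """ETTR in slots: the mean of a geometric distribution, 1 / q."""
    q = rendezvous_probability(p, success)
    if q <= 0.0:
        raise ValueError("rendezvous is impossible under this policy")
    return 1.0 / q


# Two always-good channels with uniform selection: q = 0.25 + 0.25 = 0.5.
uniform_ettr = expected_ttr([0.5, 0.5], [1.0, 1.0])
```

Even in this simplified setting, concentrating probability on channels with high success probability lowers the ETTR, which suggests why learning the selection probabilities is worthwhile when the channel states cannot be observed.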
Submitted 5 July, 2019; v1 submitted 2 July, 2019;
originally announced July 2019.
-
Learned Image Compression with Soft Bit-based Rate-Distortion Optimization
Authors:
David Alexandre,
Chih-Peng Chang,
Wen-Hsiao Peng,
Hsueh-Ming Hang
Abstract:
This paper introduces the notion of soft bits to address the rate-distortion optimization for learning-based image compression. Recent methods for such compression train an autoencoder end-to-end with an objective to strike a balance between distortion and rate. They are faced with the zero gradient issue due to quantization and the difficulty of estimating the rate accurately. Inspired by soft quantization, we represent quantization indices of feature maps with differentiable soft bits. This allows us to couple tightly the rate estimation with context-adaptive binary arithmetic coding. It also provides a differentiable distortion objective function. Experimental results show that our approach achieves the state-of-the-art compression performance among the learning-based schemes in terms of MS-SSIM and PSNR.
Submitted 1 May, 2019;
originally announced May 2019.
-
Receiver Operating Characteristics for a Prototype Quantum Two-Mode Squeezing Radar
Authors:
David Luong,
C. W. Sandbo Chang,
A. M. Vadiraj,
Anthony Damini,
C. M. Wilson,
Bhashyam Balaji
Abstract:
We have built and evaluated a prototype quantum radar, which we call a quantum two-mode squeezing radar (QTMS radar), in the laboratory. It operates solely at microwave frequencies; there is no downconversion from optical frequencies. Because the signal generation process relies on quantum mechanical principles, the system is considered to contain a quantum-enhanced radar transmitter. This transmitter generates a pair of entangled microwave signals and transmits one of them through free space, where the signal is measured using a simple and rudimentary receiver.
At the heart of the transmitter is a device called a Josephson parametric amplifier (JPA), which generates a pair of entangled signals called two-mode squeezed vacuum (TMSV) at 6.1445 GHz and 7.5376 GHz. These are then sent through a chain of amplifiers. The 7.5376 GHz beam passes through 0.5 m of free space; the 6.1445 GHz signal is measured directly after amplification. The two measurement results are correlated in order to distinguish signal from noise.
We compare our QTMS radar to a classical radar setup using conventional components, which we call a two-mode noise radar (TMN radar), and find that there is a significant gain when both systems broadcast signals at -82 dBm. This is shown via a comparison of receiver operating characteristic (ROC) curves. In particular, we find that the quantum radar requires 8 times fewer integrated samples compared to its classical counterpart to achieve the same performance.
Submitted 28 February, 2019;
originally announced March 2019.
-
Play as You Like: Timbre-enhanced Multi-modal Music Style Transfer
Authors:
Chien-Yu Lu,
Min-Xin Xue,
Chia-Che Chang,
Che-Rung Lee,
Li Su
Abstract:
Style transfer of polyphonic music recordings is a challenging task when considering the modeling of diverse, imaginative, and reasonable music pieces in a style different from their original one. To achieve this, learning stable multi-modal representations for both domain-variant (i.e., style) and domain-invariant (i.e., content) information of music in an unsupervised manner is critical. In this paper, we propose an unsupervised music style transfer method without the need for parallel data. Besides, to characterize the multi-modal distribution of music pieces, we employ the Multi-modal Unsupervised Image-to-Image Translation (MUNIT) framework in the proposed system. This allows one to generate diverse outputs from the learned latent distributions representing contents and styles. Moreover, to better capture the granularity of sound, such as the perceptual dimensions of timbre and the nuance in instrument-specific performance, cognitively plausible features including mel-frequency cepstral coefficients (MFCC), spectral difference, and spectral envelope are combined with the widely used mel-spectrogram into a timbre-enhanced multi-channel input representation. The Relativistic average Generative Adversarial Network (RaGAN) is also utilized to achieve fast convergence and high stability. We conduct experiments on bilateral style transfer tasks among three different genres, namely piano solo, guitar solo, and string quartet. Results demonstrate the advantages of the proposed method in music style transfer with improved sound quality and in allowing users to manipulate the output.
Submitted 28 November, 2018;
originally announced November 2018.