
A Distributed Model Identification Algorithm for Multi-Agent Systems

Authors: Vivek Khatana, Chin-Yao Chang, Wenbo Wang

Abstract: In this study, we investigate an agent-based approach to system model identification, with an emphasis on power distribution system applications. Departing from the conventional practice of relying on historical data for offline model identification, we adopt an online update approach that uses real-time data, employing the latest data points for gradient computation. This methodology offers advantages including a large reduction in the communication network's bandwidth requirements, by minimizing the data exchanged at each iteration, and enabling the model to adapt in real time to disturbances. Furthermore, we extend our model identification process from linear frameworks to more complex non-linear convex models. This extension is validated through numerical studies demonstrating improved control performance for a synthetic IEEE test case.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 6 pages, 4 figures
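The online update described above, in which each agent computes its gradient from only its most recent measurement and exchanges small estimates with its neighbours, resembles distributed least-mean-squares (LMS) estimation. The sketch below is a generic adapt-then-combine diffusion LMS for a linear model y = θᵀx, offered only as an illustration under stated assumptions: the function name, the doubly stochastic mixing matrix, the step size, and the noiseless synthetic data are all hypothetical and are not taken from the paper.

```python
import random


def diffusion_lms(theta_true, mixing, steps=2000, step_size=0.2, seed=0):
    """Hypothetical adapt-then-combine diffusion LMS for y = theta^T x.

    Each agent uses only its newest (x, y) sample per iteration, then
    averages intermediate estimates with its neighbours via `mixing`.
    """
    rng = random.Random(seed)
    n_agents, dim = len(mixing), len(theta_true)
    estimates = [[0.0] * dim for _ in range(n_agents)]
    for _ in range(steps):
        # Adapt: one gradient step on 0.5 * (theta^T x - y)^2 using the latest sample.
        adapted = []
        for est in estimates:
            x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            y = sum(t * xi for t, xi in zip(theta_true, x))  # synthetic, noiseless
            err = sum(e * xi for e, xi in zip(est, x)) - y
            adapted.append([e - step_size * err * xi for e, xi in zip(est, x)])
        # Combine: weighted average of neighbours' intermediate estimates.
        estimates = [
            [sum(mixing[i][j] * adapted[j][d] for j in range(n_agents)) for d in range(dim)]
            for i in range(n_agents)
        ]
    return estimates


# Three fully connected agents with a symmetric, doubly stochastic mixing matrix.
W = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
estimates = diffusion_lms([2.0, -1.0], W)
```

Only the current sample and the neighbours' estimates are used at each step, which mirrors the bandwidth argument in the abstract; the paper's actual update rule and its non-linear convex extension are not reproduced here.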

arXiv:2402.09846 [pdf]

doi 10.1029/2020EA001340

A Deep Learning Approach to Radar-based QPE

Authors: Ting-Shuo Yo, Shih-Hao Su, Jung-Lien Chu, Chiao-Wei Chang, Hung-Chi Kuo

Abstract: In this study, we propose a volume-to-point framework for quantitative precipitation estimation (QPE) based on the Quantitative Precipitation Estimation and Segregation Using Multiple Sensor (QPESUMS) Mosaic Radar data set. With a data volume consisting of the time series of gridded radar reflectivities over the Taiwan area, we used machine learning algorithms to establish a statistical model for QPE at weather stations. The model extracts spatial and temporal features from the input data volume and then associates these features with location-specific precipitation. In contrast to QPE methods based on the Z-R relation, we leverage machine learning algorithms to automatically detect the evolution and movement of weather systems and associate these patterns with a location with specific topographic attributes. Specifically, we evaluated this framework with the hourly precipitation data of 45 weather stations in Taipei during 2013-2016. In comparison to the operational QPE scheme used by the Central Weather Bureau, the volume-to-point framework performed comparably well in general cases and excelled in detecting heavy-rainfall events. Using the current results as the reference benchmark, the proposed method can integrate heterogeneous data sources and potentially improve forecasts in extreme precipitation scenarios.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 22 pages, 11 figures. Published in Earth and Space Science

Journal ref: Earth Space Sci. 2021, 8, e2020EA001340
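For context on the baseline the abstract contrasts against, Z-R methods convert radar reflectivity to rain rate through a power law Z = aR^b. The sketch below inverts that law; the Marshall-Palmer coefficients a = 200 and b = 1.6 are a common textbook default and an assumption here, not necessarily the coefficients used in the Central Weather Bureau's operational scheme.

```python
def rain_rate_from_dbz(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b to get rain rate R in mm/h.

    dbz: reflectivity in dBZ; the linear factor Z (mm^6 m^-3) is 10 ** (dbz / 10).
    a, b: Z-R coefficients (Marshall-Palmer defaults, assumed).
    """
    z_linear = 10.0 ** (dbz / 10.0)
    return (z_linear / a) ** (1.0 / b)


rate_40dbz = rain_rate_from_dbz(40.0)  # about 11.5 mm/h with these coefficients
```

A single fixed (a, b) pair ignores storm type and location, which is the limitation the volume-to-point framework is meant to address by learning location-specific associations.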

arXiv:2401.11445 [pdf, other]

Towards Non-Robocentric Dynamic Landing of Quadrotor UAVs

Authors: Li-Yu Lo, Boyang Li, Chih-Yung Wen, Ching-Wei Chang

Abstract: In this work, we propose a dynamic landing solution that requires neither onboard exteroceptive sensors nor an expensive computation unit; all localization and control modules run on the ground in a non-inertial frame. Our system starts with a relative state estimator of the aerial robot from the perspective of the landing platform, where the UAV's state is tracked through a set of onboard LED markers and an on-ground camera; the state is expressed geometrically on a manifold and is returned by the iterated extended Kalman filter (IEKF) algorithm. Subsequently, a motion planning module is developed to guide the landing process, formulating it as a minimum-jerk trajectory by applying the differential flatness property. Considering visibility and dynamic constraints, the problem is solved using quadratic programming, and the final motion primitive is expressed through piecewise polynomials. Through a series of experiments, the applicability of this approach is validated by successfully landing an 18 cm × 18 cm quadrotor on a 43 cm × 43 cm platform, with performance comparable to conventional methods. Finally, we provide comprehensive hardware and software details to the research community for future reference.

Submitted 21 January, 2024; originally announced January 2024.
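The landing planner above poses the descent as a minimum-jerk trajectory. As a simplified, single-axis illustration, a rest-to-rest minimum-jerk motion has the well-known closed-form quintic below. The paper instead solves a constrained quadratic program over piecewise polynomials with visibility and dynamic constraints, which this sketch does not model; the function names and the example descent are hypothetical.

```python
def min_jerk_position(x0, xf, duration, t):
    """Rest-to-rest minimum-jerk position on one axis (zero start/end velocity and acceleration)."""
    s = min(max(t / duration, 0.0), 1.0)  # normalized time in [0, 1]
    return x0 + (xf - x0) * (10 * s**3 - 15 * s**4 + 6 * s**5)


def min_jerk_velocity(x0, xf, duration, t):
    """Time derivative of min_jerk_position."""
    s = min(max(t / duration, 0.0), 1.0)
    return (xf - x0) / duration * (30 * s**2 - 60 * s**3 + 30 * s**4)


# Hypothetical descent from 1.5 m to 0 m height over 3 s, sampled every 0.5 s.
samples = [min_jerk_position(1.5, 0.0, 3.0, k * 0.5) for k in range(7)]
```

The quintic is the unique polynomial meeting the six boundary conditions (position, velocity, acceleration at both ends), which is why it minimizes integrated squared jerk in the unconstrained case.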

arXiv:2312.17156 [pdf, other]

BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer

Authors: Chih-Cheng Chang, Li Su

Abstract: Many deep learning models have achieved dominant performance on the offline beat tracking task. However, online beat tracking, in which only past and present input features are available, remains challenging. In this paper, we propose BEAt tracking Streaming Transformer (BEAST), an online joint beat and downbeat tracking system based on the streaming Transformer. To handle online scenarios, BEAST applies contextual block processing in the Transformer encoder. Moreover, we adopt relative positional encoding in the attention layer of the streaming Transformer encoder to capture relative timing positions, which are critically important information in music. In beat and downbeat experiments on benchmark datasets for a low-latency scenario with maximum latency under 50 ms, BEAST achieves an F1-measure of 80.04% for beats and 46.78% for downbeats, a substantial improvement of about 5 percentage points over the state-of-the-art online beat tracking model.

Submitted 23 April, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024
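The F1-measure reported above scores estimated beats against annotated ones within a tolerance window. The abstract does not state the window; ±70 ms is a common default in beat-tracking evaluation and is assumed here. This sketch uses greedy one-to-one matching for brevity, whereas reference evaluation toolkits use an optimal bipartite matching, so results can differ in edge cases. The function name is hypothetical.

```python
def beat_f_measure(reference, estimated, tolerance=0.07):
    """F1 of estimated beat times (s) vs. reference times, greedy one-to-one matching."""
    if not reference or not estimated:
        return 0.0
    unmatched = sorted(reference)
    hits = 0
    for est in sorted(estimated):
        # Closest still-unmatched reference beat inside the tolerance window.
        candidates = [(abs(ref - est), idx) for idx, ref in enumerate(unmatched)
                      if abs(ref - est) <= tolerance]
        if candidates:
            _, best = min(candidates)
            unmatched.pop(best)
            hits += 1
    if hits == 0:
        return 0.0
    precision = hits / len(estimated)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


# 0.52 and 1.49 fall within 70 ms of a reference beat; 1.1 does not.
score = beat_f_measure([0.5, 1.0, 1.5, 2.0], [0.52, 1.1, 1.49])
```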

arXiv:2312.14453 [pdf, other]

Hybrid Aerodynamics-Based Model Predictive Control for a Tail-Sitter UAV

Authors: Bailun Jiang, Boyang Li, Ching-Wei Chang, Chih-Yung Wen

Abstract: It is challenging to model and control a tail-sitter unmanned aerial vehicle (UAV) because its blended wing body generates complicated nonlinear aerodynamic effects, such as wing lift, fuselage drag, and propeller-wing interactions. We therefore devised a hybrid aerodynamic modeling method and model predictive control (MPC) design for a quadrotor tail-sitter UAV. The hybrid model consists of the Newton-Euler equation, which describes quadrotor dynamics, and a feedforward neural network, which learns residual aerodynamic effects. This hybrid model exhibits high predictive accuracy at a low computational cost and was used to implement hybrid MPC, which optimizes the throttle, pitch angle, and roll angle for position tracking. The controller performance was validated in real-world experiments, which obtained a 57% tracking error reduction compared with conventional nonlinear MPC. External wind disturbance was also introduced and the experimental results confirmed the robustness of the controller to these conditions.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.10483 [pdf, other]

doi 10.1109/BioCAS54905.2022.9948588

All Attention U-NET for Semantic Segmentation of Intracranial Hemorrhages In Head CT Images

Authors: Chia Shuo Chang, Tian Sheuan Chang, Jiun Lin Yan, Li Ko

Abstract: Head CT scans serve as a first-line tool to help specialists diagnose different types of intracranial hemorrhage. However, hemorrhages of the same type can have diverse shapes, while different types can have confusingly similar shapes, sizes, and locations. To solve this problem, this paper proposes an all attention U-Net. It uses channel attention on the U-Net encoder side to enhance class-specific feature extraction, and spatial and channel attention on the U-Net decoder side for more accurate shape extraction and type classification. The simulation results show up to a 31.8% improvement compared to the baseline, ResNet50 + U-Net, and better performance than in cases with limited attention.

Submitted 16 December, 2023; originally announced December 2023.

Comments: 2022 IEEE Biomedical Circuits and Systems Conference (BioCAS)
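The encoder-side channel attention described above reweights feature channels by learned importance. A common way to build such a block is squeeze-and-excitation (global pooling, a bottleneck, then sigmoid gates); the sketch below shows that generic pattern in plain Python with biases omitted and hand-set weights. It is an assumption that this matches the paper's attention modules, whose exact design the abstract does not specify, and the function name and weights are hypothetical.

```python
import math


def channel_attention(features, w_reduce, w_expand):
    """Squeeze-and-excitation style channel gating (bias-free sketch).

    features: list of channels, each a flat list of spatial activations.
    w_reduce: bottleneck weights, shape (hidden, channels).
    w_expand: expansion weights, shape (channels, hidden).
    Returns the reweighted features and the per-channel gates.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in features]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_reduce]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_expand]
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # Scale: multiply every activation in a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, features)], gates


feats = [[1.0, 2.0], [3.0, 4.0]]  # two channels, two spatial positions each
out, gates = channel_attention(feats, w_reduce=[[-0.5, 0.5]], w_expand=[[1.0], [-1.0]])
```

With these weights the first channel is emphasized and the second suppressed; in a trained network the weights are learned so that gates favour class-discriminative channels.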

arXiv:2311.12666 [pdf, other]

SSVEP-DAN: A Data Alignment Network for SSVEP-based Brain Computer Interfaces

Authors: Sung-Yu Chen, Chi-Min Chang, Kuan-Jung Chiang, Chun-Shu Wei

Abstract: Steady-state visual-evoked potential (SSVEP)-based brain-computer interfaces (BCIs) offer a non-invasive means of communication through high-speed speller systems. However, their efficiency heavily relies on individual training data obtained during time-consuming calibration sessions. To address the challenge of data insufficiency in SSVEP-based BCIs, we present SSVEP-DAN, the first dedicated neural network model designed for aligning SSVEP data across different domains, which can encompass various sessions, subjects, or devices. Our experimental results across multiple cross-domain scenarios demonstrate SSVEP-DAN's capability to transform existing source SSVEP data into supplementary calibration data, significantly enhancing SSVEP decoding accuracy in scenarios with limited calibration data. We envision SSVEP-DAN as a catalyst for practical SSVEP-based BCI applications with minimal calibration. The source code for this work is available at: https://github.com/CECNL/SSVEP-DAN.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.10641 [pdf]

Image-Domain Material Decomposition for Dual-energy CT using Unsupervised Learning with Data-fidelity Loss

Authors: Junbo Peng, Chih-Wei Chang, Huiqiao Xie, Richard L. J. Qiu, Justin Roper, Tonghe Wang, Beth Bradshaw, Xiangyang Tang, Xiaofeng Yang

Abstract: Background: Dual-energy CT (DECT) and material decomposition play vital roles in quantitative medical imaging. However, the decomposition process may suffer from significant noise amplification, leading to severely degraded image signal-to-noise ratios (SNRs). While existing iterative algorithms perform noise suppression using different image priors, these heuristic image priors cannot accurately represent the features of the target image manifold. Although deep learning-based decomposition methods have been reported, they follow a supervised-learning framework that requires paired training data, which are not readily available in clinical settings. Purpose: This work aims to develop an unsupervised-learning framework with data-measurement consistency for image-domain material decomposition in DECT.

Submitted 17 November, 2023; originally announced November 2023.
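Image-domain decomposition in DECT is often written per pixel as a 2×2 linear system relating the high- and low-energy images to two basis-material images. The sketch below solves that system directly and illustrates the noise amplification mentioned in the abstract: when the basis columns are nearly collinear the determinant is small, so measurement noise is scaled up by the inverse matrix. The basis values are made up for illustration, not calibrated attenuation coefficients, the function name is hypothetical, and the paper's unsupervised network is not reproduced.

```python
def decompose_pixel(mu_high, mu_low, basis):
    """Solve [mu_high, mu_low]^T = basis @ [c1, c2]^T for one pixel.

    basis: ((a_h1, a_h2), (a_l1, a_l2)), hypothetical basis-material
    coefficients at the high and low energies.
    """
    (a11, a12), (a21, a22) = basis
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("basis matrix is singular")
    # Explicit 2x2 inverse (Cramer's rule).
    c1 = (a22 * mu_high - a12 * mu_low) / det
    c2 = (-a21 * mu_high + a11 * mu_low) / det
    return c1, c2


well_posed = ((2.0, 1.0), (3.0, 1.0))     # hypothetical coefficients, det = -1
near_singular = ((2.0, 1.0), (2.1, 1.0))  # nearly collinear columns, det = -0.1
# Same true materials (c1, c2) = (1, 2) and the same 0.01 error on mu_low.
err_good = decompose_pixel(4.0, 5.01, well_posed)[0] - decompose_pixel(4.0, 5.0, well_posed)[0]
err_bad = decompose_pixel(4.0, 4.11, near_singular)[0] - decompose_pixel(4.0, 4.1, near_singular)[0]
```

The same 0.01 input error produces roughly a 0.01 error in c1 for the well-posed basis but roughly 0.1 for the near-singular one, a tenfold amplification.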

arXiv:2311.04241 [pdf, ps, other]

AI-Enabled Unmanned Vehicle-Assisted Reconfigurable Intelligent Surfaces: Deployment, Prototyping, Experiments, and Opportunities

Authors: Li-Hsiang Shen, Kai-Ten Feng, Ta-Sung Lee, Yuan-Chun Lin, Shih-Cheng Lin, Chia-Chan Chang, Sheng-Fuh Chang

Abstract: Demand for wireless data keeps increasing as sixth-generation (6G) technology evolves. Reconfigurable intelligent surfaces (RISs) are considered a promising 6G technique for extending service coverage, reducing power consumption, and enhancing spectral efficiency. In this article, we provide fundamentals of RIS deployment from theoretical and hardware perspectives, as well as the use of artificial intelligence (AI) and machine learning. We built an intelligent deployment of RIS (i-Dris) prototype, including dual-band auto-guided vehicle (AGV)-assisted RISs associated with an mmWave base station (BS) and a receiver. The RISs are deployed on the AGVs with configured incident/reflection angles. Both the mmWave BS and the receiver are associated with an edge server that monitors downlink packets to obtain system throughput. We have designed a federated multi-agent reinforcement learning scheme associated with several AGV-RIS agents and sub-agents per AGV-RIS, covering the deployment of position, height, orientation, and elevation angles. The experimental results present stationary measurements in different aspects and scenarios. The i-Dris can reach up to 980 Mbps transmission throughput under a bandwidth of 100 MHz with comparably low complexity as well as rapid deployment, outperforming other existing works. Finally, we highlight some opportunities and future issues in leveraging RIS-empowered wireless communication networks.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.04234 [pdf]

Leveraging sinusoidal representation networks to predict fMRI signals from EEG

Authors: Yamin Li, Ange Lou, Ziyuan Xu, Shiyu Wang, Catie Chang

Abstract: In modern neuroscience, functional magnetic resonance imaging (fMRI) has been a crucial and irreplaceable tool that provides a non-invasive window into the dynamics of whole-brain activity. Nevertheless, fMRI is limited by hemodynamic blurring as well as high cost, immobility, and incompatibility with metal implants. Electroencephalography (EEG) is complementary to fMRI and can directly record cortical electrical activity at high temporal resolution, but has more limited spatial resolution and is unable to recover information about deep subcortical brain structures. The ability to obtain fMRI information from EEG would enable cost-effective imaging across a wider set of brain regions. Further, beyond augmenting the capabilities of EEG, cross-modality models would facilitate the interpretation of fMRI signals. However, as both EEG and fMRI are high-dimensional and prone to artifacts, it is currently challenging to model fMRI from EEG. To address this challenge, we propose a novel architecture that can predict fMRI signals directly from multi-channel EEG without explicit feature engineering. Our model achieves this by implementing a Sinusoidal Representation Network (SIREN) to learn frequency information in brain dynamics from EEG, which serves as the input to a subsequent encoder-decoder to effectively reconstruct the fMRI signal from a specific brain region. We evaluate our model using a simultaneous EEG-fMRI dataset with 8 subjects and investigate its potential for predicting subcortical fMRI signals. The present results reveal that our model outperforms a recent state-of-the-art model and indicate the potential of leveraging periodic activation functions in deep neural networks to model functional neuroimaging data.

Submitted 24 January, 2024; v1 submitted 5 November, 2023; originally announced November 2023.
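The model above feeds EEG through a Sinusoidal Representation Network (SIREN), whose layers compute sin(ω₀(Wx + b)). The sketch below shows such layers with the initialization proposed in the original SIREN work (first layer uniform in ±1/n, later layers uniform in ±√(6/n)/ω₀, with ω₀ = 30, where n is the fan-in). Drawing biases from the same range is a simplification assumed here, the function names and the 4-channel input are hypothetical, and the encoder-decoder that follows in the paper is not shown.

```python
import math
import random


def init_siren_layer(fan_in, fan_out, is_first, omega_0=30.0, seed=0):
    """Uniform SIREN-style initialization; biases reuse the weight bound (assumption)."""
    rng = random.Random(seed)
    bound = 1.0 / fan_in if is_first else math.sqrt(6.0 / fan_in) / omega_0
    weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]
    biases = [rng.uniform(-bound, bound) for _ in range(fan_out)]
    return weights, biases


def siren_layer(x, weights, biases, omega_0=30.0):
    """One sine-activated layer: sin(omega_0 * (W x + b))."""
    return [
        math.sin(omega_0 * (sum(w * xi for w, xi in zip(row, x)) + b))
        for row, b in zip(weights, biases)
    ]


# Hypothetical 4-channel EEG sample -> 8 -> 8 sine features.
w1, b1 = init_siren_layer(4, 8, is_first=True)
w2, b2 = init_siren_layer(8, 8, is_first=False, seed=1)
features = siren_layer(siren_layer([0.1, -0.3, 0.2, 0.05], w1, b1), w2, b2)
```

The ω₀-scaled bound keeps pre-activations in a range where the sine neither saturates into noise nor collapses to a near-linear regime, which is what lets periodic activations represent high-frequency structure.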

arXiv:2308.13072 [pdf]

Full-dose Whole-body PET Synthesis from Low-dose PET Using High-efficiency Denoising Diffusion Probabilistic Model: PET Consistency Model

Authors: Shaoyan Pan, Elham Abouei, Junbo Peng, Joshua Qian, Jacob F Wynne, Tonghe Wang, Chih-Wei Chang, Justin Roper, Jonathon A Nye, Hui Mao, Xiaofeng Yang

Abstract: Objective: Positron Emission Tomography (PET) is a commonly used imaging modality in a broad range of clinical applications. One of the most important tradeoffs in PET imaging is between image quality and radiation dose: high image quality comes with high radiation exposure. Improving image quality is desirable for all clinical applications, while minimizing radiation exposure is needed to reduce risk to patients. Approach: We introduce PET Consistency Model (PET-CM), an efficient diffusion-based method for generating high-quality full-dose PET images from low-dose PET images. It employs a two-step process, adding Gaussian noise to full-dose PET images in the forward diffusion and then denoising them using a PET Shifted-window Vision Transformer (PET-VIT) network in the reverse diffusion. The PET-VIT network learns a consistency function that enables direct denoising of Gaussian noise into clean full-dose PET images. PET-CM achieves state-of-the-art image quality while requiring significantly less computation time than other methods. Results: In experiments comparing eighth-dose to full-dose images, PET-CM demonstrated impressive performance with NMAE of 1.278±0.122%, PSNR of 33.783±0.824 dB, SSIM of 0.964±0.009, NCC of 0.968±0.011, HRS of 4.543, and SUV Error of 0.255±0.318%, with an average generation time of 62 seconds per patient. This is a significant improvement over the state-of-the-art diffusion-based model, with PET-CM reaching this result 12x faster. Similarly, in the quarter-dose to full-dose image experiments, PET-CM delivered competitive outcomes, achieving an NMAE of 0.973±0.066%, PSNR of 36.172±0.801 dB, SSIM of 0.984±0.004, NCC of 0.990±0.005, HRS of 4.428, and SUV Error of 0.151±0.192% using the same generation process, underlining its high quantitative and clinical precision in both denoising scenarios.

Submitted 16 April, 2024; v1 submitted 24 August, 2023; originally announced August 2023.
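PET-VIT is described as learning a consistency function that maps a noisy input directly to a clean image in a single step. In the original consistency-model formulation, the raw network output F is wrapped with skip and output coefficients, f(x, t) = c_skip(t)·x + c_out(t)·F(x, t), so that the boundary condition f(x, ε) = x holds exactly at the smallest noise level. The sketch below shows that parameterization with the commonly cited defaults σ_data = 0.5 and ε = 0.002; these values, the placeholder network, and the function names are assumptions, not details from the PET-CM paper.

```python
import math


def consistency_fn(x, t, network, sigma_data=0.5, eps=0.002):
    """f(x, t) = c_skip(t) * x + c_out(t) * F(x, t), with f(x, eps) == x."""
    c_skip = sigma_data**2 / ((t - eps) ** 2 + sigma_data**2)
    c_out = sigma_data * (t - eps) / math.sqrt(sigma_data**2 + t**2)
    raw = network(x, t)
    return [c_skip * xi + c_out * ri for xi, ri in zip(x, raw)]


def placeholder_network(x, t):
    """Hypothetical stand-in for a trained denoiser such as PET-VIT."""
    return [-0.5 * xi for xi in x]


noisy = [1.0, -2.0, 0.5]
at_boundary = consistency_fn(noisy, 0.002, placeholder_network)  # equals noisy
one_step = consistency_fn(noisy, 80.0, placeholder_network)      # dominated by network term
```

Because the boundary condition is built into the parameterization rather than learned, a trained network can be queried once at a high noise level, which is the source of the speed-up over multi-step diffusion sampling.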

arXiv:2307.07650 [pdf, ps, other]

SALC: Skeleton-Assisted Learning-Based Clustering for Time-Varying Indoor Localization

Authors: An-Hung Hsiao, Li-Hsiang Shen, Chen-Yi Chang, Chun-Jie Chiu, Kai-Ten Feng

Abstract: Wireless indoor localization has attracted a significant amount of attention in recent years. Using received signal strength (RSS) obtained from WiFi access points (APs) to establish a fingerprinting database is a widely used method in indoor localization. However, the time-variant problem for indoor positioning systems is not well investigated in the existing literature. Compared to conventional static fingerprinting, a dynamically reconstructed database can adapt to a highly changing environment, which sustains localization accuracy. To deal with the time-varying issue, we propose a skeleton-assisted learning-based clustering localization (SALC) system, including RSS-oriented map-assisted clustering (ROMAC), cluster-based online database establishment (CODE), and cluster-scaled location estimation (CsLE). The SALC scheme jointly considers similarities from the skeleton-based shortest path (SSP) and the time-varying RSS measurements across the reference points (RPs). ROMAC clusters RPs into different feature sets and thereby selects suitable monitor points (MPs) for enhancing location estimation. Moreover, the CODE algorithm aims to establish an adaptive fingerprint database to alleviate the time-varying problem. Finally, CsLE is adopted to acquire the target position by leveraging clustering information and estimated signal variations to rescale the weights of the weighted k-nearest neighbors (WkNN) method. Both simulation and experimental results demonstrate that the proposed SALC system can effectively reconstruct the fingerprint database with enhanced location estimation accuracy, outperforming other existing schemes in the open literature.

Submitted 14 July, 2023; originally announced July 2023.
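The final CsLE step rescales the weights of the classic WkNN estimator. For reference, plain WkNN, without SALC's cluster-based rescaling or online database reconstruction, can be sketched as follows; the inverse-distance weighting, the function name, and the small fingerprint database are illustrative assumptions.

```python
import math


def wknn_locate(rss_query, fingerprints, k=3, eps=1e-6):
    """Estimate (x, y) from RSS with weighted k-nearest neighbours.

    fingerprints: list of ((x, y), [rss_ap1, rss_ap2, ...]) reference points.
    Weights are inverse RSS-space distances; eps avoids division by zero.
    """
    nearest = sorted(
        ((math.dist(rss_query, rss), pos) for pos, rss in fingerprints),
        key=lambda item: item[0],
    )[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y


# Hypothetical database: two APs, three reference points (dBm values made up).
database = [
    ((0.0, 0.0), [-40.0, -70.0]),
    ((10.0, 0.0), [-70.0, -40.0]),
    ((5.0, 5.0), [-50.0, -50.0]),
]
estimate = wknn_locate([-45.0, -65.0], database, k=2)
```

When the environment drifts, the stored RSS vectors no longer match live measurements and these distances become misleading, which is the time-varying problem SALC targets by reconstructing the database and rescaling the weights.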

arXiv:2306.15808 [pdf, other]

Classification of Infant Sleep/Wake States: Cross-Attention among Large Scale Pretrained Transformer Networks using Audio, ECG, and IMU Data

Authors: Kai Chieh Chang, Mark Hasegawa-Johnson, Nancy L. McElwain, Bashima Islam

Abstract: Infant sleep is critical to brain and behavioral development. Prior studies on infant sleep/wake classification have been largely limited to expensive and burdensome polysomnography (PSG) tests in the laboratory or to wearable devices that collect single-modality data. To ease data collection and improve detection accuracy, we aimed to advance this field of study by using a multi-modal wearable device, LittleBeats (LB), to collect audio, electrocardiogram (ECG), and inertial measurement unit (IMU) data among a cohort of 28 infants. We employed a 3-branch (audio/ECG/IMU) large-scale transformer-based neural network (NN) to demonstrate the potential of such multi-modal data. We pretrained each branch independently with its respective modality, then fine-tuned the model by fusing the pretrained transformer layers with cross-attention. We show that multi-modal data significantly improves sleep/wake classification (accuracy = 0.880) compared with the use of a single modality (accuracy = 0.732). Our approach to multi-modal mid-level fusion may be adaptable to a diverse range of architectures and tasks, expanding future directions of infant behavioral research.

Submitted 27 June, 2023; originally announced June 2023.

Comments: Preprint for APSIPA2023
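The fusion step above combines pretrained audio, ECG, and IMU branches with cross-attention, in which queries from one modality attend over keys and values from another. A minimal single-head, projection-free version of scaled dot-product cross-attention is sketched below; the learned query/key/value projections, multiple heads, and the paper's exact fusion layout are omitted, and the function name and token values are hypothetical.

```python
import math


def cross_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V, with Q from one modality and K, V from another."""
    d = len(queries[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(len(values[0]))])
    return fused


# Hypothetical 2-D tokens: audio frames query ECG frames.
audio_tokens = [[1.0, 0.0], [0.0, 1.0]]
ecg_tokens = [[1.0, 0.2], [0.1, 1.0], [0.5, 0.5]]
audio_with_ecg = cross_attention(audio_tokens, ecg_tokens, ecg_tokens)
```

Each output row is a convex combination of the other modality's value vectors, so every audio token receives the ECG context it scores as most relevant.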

arXiv:2306.06982 [pdf]

Weakly Supervised Lesion Detection and Diagnosis for Breast Cancers with Partially Annotated Ultrasound Images

Authors: Jian Wang, Liang Qiao, Shichong Zhou, ** Zhou, Jun Wang, Juncheng Li, Shihui Ying, Cai Chang, Jun Shi

Abstract: Deep learning (DL) has proven highly effective for ultrasound-based computer-aided diagnosis (CAD) of breast cancers. In an automatic CAD system, lesion detection is critical for the subsequent diagnosis. However, existing DL-based methods generally require voluminous manually annotated region of interest (ROI) labels and class labels to train both the lesion detection and diagnosis models. In clinical practice, the ROI labels, i.e., ground truths, may not always be optimal for the classification task due to the individual experience of sonologists, resulting in coarse annotations that limit the diagnostic performance of a CAD model. To address this issue, a novel Two-Stage Detection and Diagnosis Network (TSDDNet) is proposed based on weakly supervised learning to enhance the diagnostic accuracy of ultrasound-based CAD for breast cancers. In particular, all the ROI-level labels are considered coarse labels in the first training stage, and then a candidate selection mechanism is designed to identify optimal lesion areas for both the fully and partially annotated samples. It refines the current ROI-level labels in the fully annotated images and the detected ROIs in the partially annotated samples in a weakly supervised manner under the guidance of class labels. In the second training stage, a self-distillation strategy is further proposed to integrate the detection network and classification network into a unified framework as the final CAD model for joint optimization, which further improves the diagnostic performance. The proposed TSDDNet is evaluated on a B-mode ultrasound dataset, and the experimental results show that it achieves the best performance on both lesion detection and diagnosis tasks, suggesting promising application potential.

Submitted 12 June, 2023; originally announced June 2023.

arXiv:2306.01952 [pdf, other]

Learning in Domain Randomization via Continuous Time Non-Stochastic Control

Authors: **gwei Li, **g Dong, Can Chang, Baoxiang Wang, **gzhao Zhang

Abstract: Domain randomization is a popular method for robustly training agents to adapt to diverse environments and real-world tasks. In this paper, we examine how to train an agent in domain randomization environments from a nonstochastic control perspective. We first theoretically study online control of continuous-time linear systems under nonstochastic noise. We present a novel two-level online algorithm that integrates a higher-level learning strategy and a lower-level feedback control strategy. This method offers a practical solution and, for the first time, achieves sublinear regret in continuous-time nonstochastic systems. Compared to standard online learning algorithms, our algorithm features a stack-and-skip procedure. By applying stack and skip to the SAC (Soft Actor-Critic) algorithm, we achieved improved results in multiple reinforcement learning tasks within domain randomization environments. Our work provides new insights into nonasymptotic analyses of controlling continuous-time systems. Further, our work justifies the importance of stacking and skipping in controller learning under nonstochastic environments.

Submitted 14 December, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

arXiv:2305.19467 [pdf]

Synthetic CT Generation from MRI using 3D Transformer-based Denoising Diffusion Model

Authors: Shaoyan Pan, Elham Abouei, Jacob Wynne, Tonghe Wang, Richard L. J. Qiu, Yuheng Li, Chih-Wei Chang, Junbo Peng, Justin Roper, Pretesh Patel, David S. Yu, Hui Mao, Xiaofeng Yang

Abstract: Magnetic resonance imaging (MRI)-based synthetic computed tomography (sCT) simplifies radiation therapy treatment planning by eliminating the need for CT simulation and error-prone image registration, ultimately reducing patient radiation dose and setup uncertainty. We propose an MRI-to-CT transformer-based denoising diffusion probabilistic model (MC-DDPM) to transform MRI into high-quality sCT to facilitate radiation treatment planning. MC-DDPM implements diffusion processes with a shifted-window transformer network to generate sCT from MRI. The proposed model consists of two processes: a forward process, which adds Gaussian noise to real CT scans, and a reverse process, in which a shifted-window transformer V-net (Swin-Vnet) denoises the noisy CT scans conditioned on the MRI from the same patient to produce noise-free CT scans. With an optimally trained Swin-Vnet, the reverse diffusion process was used to generate sCT scans matching MRI anatomy. We evaluated the proposed method by generating sCT from MRI on a brain dataset and a prostate dataset. Quantitative evaluation was performed using the mean absolute error (MAE) in Hounsfield units (HU), peak signal-to-noise ratio (PSNR), multi-scale structural similarity index (MS-SSIM), and normalized cross-correlation (NCC) between ground-truth CTs and sCTs. MC-DDPM generated brain sCTs with state-of-the-art quantitative results: MAE 43.317 HU, PSNR 27.046 dB, SSIM 0.965, and NCC 0.983. For the prostate dataset, MC-DDPM achieved MAE 59.953 HU, PSNR 26.920 dB, SSIM 0.849, and NCC 0.948.
In conclusion, we have developed and validated a novel approach for generating CT images from routine MRIs using a transformer-based DDPM. This model effectively captures the complex relationship between CT and MRI images, allowing robust, high-quality synthetic CT (sCT) images to be generated in minutes.
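A minimal sketch of the closed-form forward (noising) step used by standard DDPMs: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with ε ~ N(0, I) and ᾱ_t = ∏(1 − β_s). The linear β schedule (1e-4 to 0.02 over T = 1000 steps) is the common DDPM default and is assumed here, because the abstract does not state MC-DDPM's schedule. Toy values stand in for a CT slice normalized to [-1, 1], and the function names are hypothetical. The Swin-Vnet reverse process is not sketched.

```python
import math
import random

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng=random):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is a 0-based step index."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

betas = linear_beta_schedule()
abar = alpha_bars(betas)
x0 = [0.1, -0.5, 0.9]  # toy "CT" intensities already scaled to [-1, 1]
x_noisy = q_sample(x0, t=999, abar=abar, rng=random.Random(0))
print(abar[0], abar[-1])  # almost no noise at t = 0, almost pure noise at t = T - 1
```

Because ᾱ_t shrinks toward zero, the last step is nearly pure Gaussian noise. A trained reverse network, conditioned on MRI in MC-DDPM's case, starts from that noise.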

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.00042 [pdf]

Cycle-guided Denoising Diffusion Probability Model for 3D Cross-modality MRI Synthesis

Authors: Shaoyan Pan, Chih-Wei Chang, Junbo Peng, Jiahan Zhang, Richard L. J. Qiu, Tonghe Wang, Justin Roper, Tian Liu, Hui Mao, Xiaofeng Yang

Abstract: This study aims to develop a novel Cycle-guided Denoising Diffusion Probability Model (CG-DDPM) for cross-modality MRI synthesis. The CG-DDPM deploys two DDPMs that condition each other to generate synthetic images from two different MRI pulse sequences. The two DDPMs exchange random latent noise in the reverse processes, which helps to regularize both DDPMs and generate matching images in two modalities. This improves image-to-image translation accuracy. We evaluated the CG-DDPM quantitatively using mean absolute error (MAE), multi-scale structural similarity index measure (MSSIM), and peak signal-to-noise ratio (PSNR), as well as the network synthesis consistency, on the BraTS2020 dataset. Our proposed method showed high accuracy and reliable consistency for MRI synthesis. In addition, we compared the CG-DDPM with several other state-of-the-art networks and demonstrated statistically significant improvements in the image quality of synthetic MRIs. The proposed method enhances the capability of current multimodal MRI synthesis approaches, which could contribute to more accurate diagnosis and better treatment planning for patients by synthesizing additional MRI modalities.

Submitted 28 April, 2023; originally announced May 2023.

arXiv:2304.14688 [pdf, other]

An Efficient Hash-based Data Structure for Dynamic Vision Sensors and its Application to Low-energy Low-memory Noise Filtering

Authors: Pradeep Kumar Gopalakrishnan, Chip-Hong Chang, Arindam Basu

Abstract: Events generated by the Dynamic Vision Sensor (DVS) are generally stored and processed in two-dimensional data structures whose memory complexity and energy-per-event scale proportionately with increasing sensor dimensions. In this paper, we propose a new two-dimensional data structure (BF_2) that takes advantage of the sparsity of events and enables compact storage of data using hash functions. It overcomes the saturation issue in the Bloom Filter (BF) and the memory reset issue in other hash-based arrays by using a second dimension to clear 1 out of D rows at regular intervals. A hardware-friendly, low-power, and low-memory-footprint noise filter for DVS is demonstrated using BF_2. For the tested datasets, the performance of the filter matches that of state-of-the-art filters like the BAF/STCF while consuming less than 10% and 15% of their memory and energy-per-event, respectively, for a correlation time constant τ = 5 ms. The memory and energy advantages of the proposed filter increase with increasing sensor sizes. The proposed filter compares favourably with other hardware-friendly, event-based filters in hardware complexity, memory requirement, and energy-per-event, as demonstrated through its implementation on an FPGA. The parameters of the data structure can be adjusted to trade off performance against memory consumption, based on application requirements.
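The row-clearing idea can be illustrated with a small software sketch. It is not the authors' BF_2 design. Everything below is an assumption chosen for illustration: Python's built-in `hash` with a per-function salt stands in for hardware hash functions, and the parameters are a table width `m`, `k` hashes, and a policy of wiping the oldest of `D` rows every `tau / D`. Each event sets its k bits in the currently active row. A query tests all D rows, so an entry is remembered for between (D − 1)·tau/D and tau. Timestamps are assumed non-decreasing, and the class name is hypothetical.

```python
class RowClearingHashFilter:
    """Hypothetical D-row hash table in which one row is wiped every tau/D."""

    def __init__(self, m=4096, k=2, D=4, tau_us=5000):
        self.m, self.k, self.D = m, k, D
        self.period_us = tau_us / D          # one row is cleared per period
        self.rows = [bytearray(m) for _ in range(D)]
        self.current_period = 0

    def _indices(self, x, y):
        # k salted hashes of the pixel address (stand-in for hardware hash functions)
        return [hash((salt, x, y)) % self.m for salt in range(self.k)]

    def _advance(self, t_us):
        # Rotate to the row of the current period, clearing every row rotated into.
        period = int(t_us // self.period_us)
        steps = period - self.current_period
        for step in range(1, min(steps, self.D) + 1):
            self.rows[(self.current_period + step) % self.D] = bytearray(self.m)
        if steps > 0:
            self.current_period = period

    def _insert(self, x, y):
        row = self.rows[self.current_period % self.D]
        for i in self._indices(x, y):
            row[i] = 1

    def _contains(self, x, y):
        idx = self._indices(x, y)
        # Bloom-filter test per row: present if all k bits are set in some row
        return any(all(row[i] for i in idx) for row in self.rows)

    def is_signal(self, x, y, t_us):
        """Keep an event if a neighbouring pixel fired within about tau, then record it."""
        self._advance(t_us)
        neighbours = [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                      if (dx, dy) != (0, 0)]
        supported = any(self._contains(nx, ny) for nx, ny in neighbours)
        self._insert(x, y)
        return supported

f = RowClearingHashFilter()
print(f.is_signal(10, 10, 0))      # first event, no support
print(f.is_signal(11, 10, 1000))   # neighbour fired 1 ms earlier
print(f.is_signal(12, 10, 30000))  # every row has been cleared since the last event
```

Clearing one row at a time avoids both the saturation of a never-reset Bloom filter and the abrupt forgetting of a full-array reset, which is the trade-off the abstract describes.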

Submitted 28 April, 2023; originally announced April 2023.

Comments: Supplementary material can be accessed at the link provided at the end of the manuscript

arXiv:2304.11267 [pdf, other]

Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations

Authors: Yu-Hui Chen, Raman Sarokin, Juhyun Lee, Jiuqiang Tang, Chuo-Ling Chang, Andrei Kulik, Matthias Grundmann

Abstract: The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to date (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.

Submitted 16 June, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

Comments: 4 pages (not including references), 2 figures, 2 tables. Accepted to Efficient Deep Learning for Computer Vision workshop 2023

arXiv:2304.06474 [pdf, ps, other]

Attention-based Learning for Sleep Apnea and Limb Movement Detection using Wi-Fi CSI Signals

Authors: Chi-Che Chang, An-Hung Hsiao, Li-Hsiang Shen, Kai-Ten Feng, Chia-Yu Chen

Abstract: Wi-Fi channel state information (CSI) has become a promising solution for non-invasive breathing and body motion monitoring during sleep. Sleep disorders such as apnea and periodic limb movement disorder (PLMD) often occur without the patient's awareness and can be fatal. Existing studies detect abnormal sleep disorders in impractically controlled environments. Moreover, classifying complex macro- and micro-scale sleep movements, as well as the entangled, similar waveforms of apnea and PLMD, poses compelling challenges. In this paper, we propose the attention-based learning for sleep apnea and limb movement detection (ALESAL) system, which can jointly detect sleep apnea and PLMD under different sleep postures across a variety of patients. ALESAL contains antenna-pair and time attention mechanisms for mitigating the impact of modest antenna pairs and emphasizing the duration of interest, respectively. Performance results show that our proposed ALESAL system can achieve a weighted F1-score of 84.33, outperforming the existing non-attention-based methods of support vector machine and deep multilayer perceptron.
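A weighted F1-score averages the per-class F1 scores and weights each class by its support, that is, its number of true instances. A small sketch, assuming the standard support-weighted definition (the same as scikit-learn's `average='weighted'`). The labels are made up for illustration and are not the paper's data.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_true - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * n_true / total  # weight by class support
    return score

# Hypothetical labels: one "normal" window is misclassified as "apnea"
y_true = ["normal", "normal", "apnea", "plmd"]
y_pred = ["normal", "apnea", "apnea", "plmd"]
print(weighted_f1(y_true, y_pred))
```

Classes that appear only among the predictions have zero support, so they add nothing to the weighted sum.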

Submitted 26 March, 2023; originally announced April 2023.

arXiv:2304.03172 [pdf, other]

A Privacy Preserving Distributed Model Identification Algorithm for Power Distribution Systems

Authors: Chin-Yao Chang

Abstract: Distributed control/optimization is a promising approach for network systems due to its advantages over centralized schemes, such as robustness, cost-effectiveness, and improved privacy. However, distributed methods can have drawbacks, such as slower convergence rates due to limited knowledge of the overall network model. Additionally, ensuring privacy in the communication of sensitive information can pose implementation challenges. To address these issues, we propose a distributed model identification algorithm that enables each agent to identify the sub-model that characterizes the relationship between its local control and the overall system outputs. The proposed algorithm maintains the privacy of local agents by communicating only through dummy variables. We demonstrate the efficacy of our algorithm in the context of power distribution systems by applying it to the voltage regulation of a modified IEEE distribution system. The proposed algorithm is well suited to the needs of power distribution control and offers an effective solution to the challenges of distributed model identification in network systems.

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2211.09949 [pdf, other]

Compressing Transformer-based self-supervised models for speech processing

Authors: Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-yi Lee, Hao Tang

Abstract: Despite the success of Transformers in self-supervised learning with applications to various downstream tasks, the computational cost of training and inference remains a major challenge for applying these models to a wide spectrum of devices. Several isolated attempts have been made to compress Transformers, but the settings and metrics differ across studies. Trade-offs at various compression rates are also largely missing in prior work, making it difficult to compare compression techniques. In this work, we aim to provide context for the isolated results, studying several commonly used compression techniques, including weight pruning, head pruning, low-rank approximation, and knowledge distillation. We report trade-offs at various compression rates in terms of wall-clock time, the number of parameters, and the number of multiply-accumulate operations. Our results show that, compared to recent approaches, basic compression techniques are strong baselines. We further present several applications of our results, revealing properties of Transformers, such as the significance of diagonal attention heads. In addition, our results lead to a simple combination of compression techniques that improves the trade-off over recent approaches. We hope the results will promote more diverse comparisons among model compression techniques and promote the use of model compression as a tool for analyzing models. Our code for compressing speech self-supervised models is available at https://github.com/nervjack2/Speech-SSL-Compression/.
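Low-rank approximation, one of the techniques listed above, replaces a d_out × d_in weight matrix W with a rank-r product U·V. This reduces both the parameter count and the multiply-accumulate operations (MACs) per input vector from d_in·d_out to r·(d_in + d_out). A back-of-the-envelope sketch follows. The 768 × 3072 feed-forward shape and the rank 128 are assumptions typical of base-size Transformers and are not figures from the paper.

```python
def linear_cost(d_in, d_out, rank=None):
    """Parameters (= MACs per input vector, bias ignored) of a dense or rank-r factorized layer."""
    if rank is None:
        return d_in * d_out
    return rank * (d_in + d_out)

def break_even_rank(d_in, d_out):
    """Largest rank for which the factorization is strictly cheaper than the dense layer."""
    # r * (d_in + d_out) < d_in * d_out  <=>  r < d_in * d_out / (d_in + d_out)
    return (d_in * d_out - 1) // (d_in + d_out)

dense = linear_cost(768, 3072)
low_rank = linear_cost(768, 3072, rank=128)
print(dense, low_rank, round(low_rank / dense, 3), break_even_rank(768, 3072))
```

Past the break-even rank, factorizing costs more than the dense layer, so useful ranks sit well below it.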

Submitted 26 January, 2024; v1 submitted 17 November, 2022; originally announced November 2022.

Comments: Submitted to IEEE Transactions on Audio, Speech and Language Processing (TASLP)

arXiv:2211.07357 [pdf, other]

Controlling Commercial Cooling Systems Using Reinforcement Learning

Authors: Jerry Luo, Cosmin Paduraru, Octavian Voicu, Yuri Chervonyi, Scott Munns, Jerry Li, Crystal Qian, Praneet Dutta, Jared Quincy Davis, Ningjia Wu, Xingwei Yang, Chu-Ming Chang, Ted Li, Rob Rose, Mingyan Fan, Hootan Nakhost, Tinglin Liu, Brian Kirkman, Frank Altamura, Lee Cline, Patrick Tonker, Joel Gouker, Dave Uden, Warren Buddy Bryan, Jason Law, et al. (11 additional authors not shown)

Abstract: This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems. Building on expertise that began with cooling Google's data centers more efficiently, we recently conducted live experiments on two real-world facilities in partnership with Trane Technologies, a building management system provider. These live experiments had a variety of challenges in areas such as evaluation, learning from offline data, and constraint satisfaction. Our paper describes these challenges in the hope that awareness of them will benefit future applied RL work. We also describe the way we adapted our RL system to deal with these challenges, resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.

Submitted 14 December, 2022; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 27 pages, 11 figures

arXiv:2210.08225 [pdf, other]

Learned Video Compression for YUV 4:2:0 Content Using Flow-based Conditional Inter-frame Coding

Authors: Yung-Han Ho, Chih-Hsuan Lin, Peng-Yu Chen, Mu-Jung Chen, Chih-Peng Chang, Wen-Hsiao Peng, Hsueh-Ming Hang

Abstract: This paper proposes a learning-based video compression framework for variable-rate coding on YUV 4:2:0 content. Most existing learning-based video compression models adopt the traditional hybrid-based coding architecture, which involves temporal prediction followed by residual coding. However, recent studies have shown that residual coding is sub-optimal from an information-theoretic perspective. In addition, most existing models are optimized for RGB content. Furthermore, they require separate models for variable-rate coding. To address these issues, this work presents an attempt to incorporate conditional inter-frame coding for YUV 4:2:0 content. We introduce a conditional flow-based inter-frame coder to improve the inter-frame coding efficiency. To adapt our codec to YUV 4:2:0 content, we adopt a simple strategy of using space-to-depth and depth-to-space conversions. Lastly, we employ a rate-adaption net to achieve variable-rate coding without training multiple models. Experimental results show that our model performs better than x265 on the UVG and MCL-JCV datasets in terms of PSNR-YUV. However, on the more challenging datasets from ISCAS'22 GC, there is still ample room for improvement. This insufficient performance is due to the lack of inter-frame coding capability at a large GOP size and can be mitigated by increasing the model capacity and applying an error-propagation-aware training strategy.
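In YUV 4:2:0, the chroma planes U and V have half the luma resolution in each dimension. One common way to reconcile the two resolutions with space-to-depth works as follows. Each 2×2 block of the H × W luma plane is folded into four channels of size H/2 × W/2. Those four channels are then stacked with U and V to form a single 6-channel tensor, and depth-to-space undoes the folding at the decoder. This layout is assumed for illustration, and the paper's exact tensor arrangement may differ. Plain nested lists keep the sketch dependency-free, and the function names are hypothetical.

```python
def space_to_depth(plane, block=2):
    """Fold each block x block patch of an H x W plane into block**2 channels of size H/block x W/block."""
    h, w = len(plane), len(plane[0])
    assert h % block == 0 and w % block == 0, "plane size must be divisible by block"
    return [
        [[plane[i * block + di][j * block + dj] for j in range(w // block)]
         for i in range(h // block)]
        for di in range(block) for dj in range(block)  # channel index c = di * block + dj
    ]

def depth_to_space(channels, block=2):
    """Inverse of space_to_depth."""
    hb, wb = len(channels[0]), len(channels[0][0])
    plane = [[0] * (wb * block) for _ in range(hb * block)]
    for c, ch in enumerate(channels):
        di, dj = divmod(c, block)
        for i in range(hb):
            for j in range(wb):
                plane[i * block + di][j * block + dj] = ch[i][j]
    return plane

def yuv420_to_tensor(y, u, v):
    """Stack the 4 luma sub-channels with U and V into one 6-channel, half-resolution tensor."""
    return space_to_depth(y) + [u, v]

y = [[r * 4 + c for c in range(4)] for r in range(4)]   # toy 4x4 luma plane
u = [[100, 101], [102, 103]]                            # toy 2x2 chroma planes
v = [[200, 201], [202, 203]]
tensor = yuv420_to_tensor(y, u, v)
print(len(tensor), tensor[0], depth_to_space(tensor[:4]) == y)
```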

Submitted 15 October, 2022; originally announced October 2022.

Comments: Accepted by ISCAS 2022

arXiv:2207.07931 [pdf]

Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation

Authors: Yu-Shan Tai, Cheng-Yang Chang, Chieh-Fang Teng, An-Yeu Wu

Abstract: Recently, deep convolutional neural networks (CNNs) have achieved many eye-catching results. However, deploying CNNs on resource-constrained edge devices is hindered by the limited memory bandwidth available for transmitting large intermediate data, i.e., activations, during inference. Existing research utilizes mixed precision and dimension reduction to reduce computational complexity but pays less attention to their application to activation compression. To further exploit the redundancy in activations, we propose a learnable mixed-precision and dimension reduction co-design system, which separates channels into groups and allocates specific compression policies according to their importance. In addition, the proposed dynamic searching technique enlarges the search space and automatically finds the optimal bit-width allocation. Our experimental results show that the proposed methods improve accuracy by 3.54%/1.27% and save 0.18/2.02 bits per value over existing mixed-precision methods on ResNet18 and MobileNetv2, respectively.

Submitted 18 July, 2022; v1 submitted 16 July, 2022; originally announced July 2022.

arXiv:2207.05315 [pdf, other]

CANF-VC: Conditional Augmented Normalizing Flows for Video Compression

Authors: Yung-Han Ho, Chih-Peng Chang, Peng-Yu Chen, Alessandro Gnutti, Wen-Hsiao Peng

Abstract: This paper presents an end-to-end learning-based video compression system, termed CANF-VC, based on conditional augmented normalizing flows (CANF). Most learned video compression systems adopt the same hybrid-based coding architecture as traditional codecs. Recent research on conditional coding has shown the sub-optimality of hybrid-based coding and opened up opportunities for deep generative models to take a key role in creating new coding frameworks. CANF-VC represents a new attempt that leverages the conditional ANF to learn a video generative model for conditional inter-frame coding. We choose ANF because it is a special type of generative model that includes the variational autoencoder as a special case and is able to achieve better expressiveness. CANF-VC also extends the idea of conditional coding to motion coding, forming a purely conditional coding framework. Extensive experimental results on commonly used datasets confirm the superiority of CANF-VC over the state-of-the-art methods. The source code of CANF-VC is available at https://github.com/NYCU-MAPL/CANF-VC.

Submitted 14 August, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

arXiv:2204.09595 [pdf, other]

doi 10.21437/Interspeech.2022-10627

Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation

Authors: Chih-Chiang Chang, Hung-yi Lee

Abstract: Simultaneous speech translation (SimulST) is a challenging task that aims to translate streaming speech before the complete input is observed. A SimulST system generally includes two components: the pre-decision, which aggregates the speech information, and the policy, which decides whether to read or write. While recent works have proposed various strategies to improve the pre-decision, they mainly adopt the fixed wait-k policy, leaving adaptive policies rarely explored. This paper proposes to model the adaptive policy by adapting Continuous Integrate-and-Fire (CIF). Compared with monotonic multihead attention (MMA), our method has the advantages of simpler computation, superior quality at low latency, and better generalization to long utterances. We conduct experiments on the MuST-C V2 dataset and show the effectiveness of our approach.
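The integrate-and-fire step at the core of CIF works roughly as follows. The encoder emits a weight α_t for each frame, and the weights are accumulated. Whenever the total reaches a threshold (1.0 in the original CIF formulation), the mechanism "fires": it emits the α-weighted sum of the frames integrated so far and carries any leftover weight into the next unit. The sketch below assumes that formulation. It uses scalar frame features and hand-picked weights for clarity. How fired units drive the read/write policy in this paper is not reproduced here.

```python
def cif(frames, alphas, threshold=1.0):
    """Integrate weighted frames; emit one unit each time the accumulated weight reaches threshold.

    Returns (fired_units, leftover_weight).
    """
    fired = []
    acc, state = 0.0, 0.0            # accumulated weight and weighted feature sum
    for h, a in zip(frames, alphas):
        remaining = a
        while acc + remaining >= threshold:
            used = threshold - acc                # portion of this frame that completes the unit
            fired.append(state + used * h)
            remaining -= used                     # rest of the weight starts the next unit
            acc, state = 0.0, 0.0
        acc += remaining
        state += remaining * h
    return fired, acc

units, leftover = cif([1.0, 2.0, 3.0, 4.0], [0.5, 0.75, 0.25, 0.5])
print(units, leftover)
```

The number of fired units equals the integer part of the total weight, which is what lets an adaptive policy decide when enough speech has been read to write the next token.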

Submitted 3 October, 2022; v1 submitted 22 March, 2022; originally announced April 2022.

Comments: INTERSPEECH 2022 camera ready

Journal ref: Proc. Interspeech 2022, 5175-5179

arXiv:2204.07446 [pdf, other]

doi 10.1109/ACCESS.2022.3201645

Wi-Fi and Bluetooth Contact Tracing Without User Intervention

Authors: Brosnan Yuen, Yifeng Bie, Duncan Cairns, Geoffrey Harper, Jason Xu, Charles Chang, Xiaodai Dong, Tao Lu

Abstract: Previous contact tracing systems required users to perform many manual actions, such as installing smartphone applications, joining wireless networks, or carrying custom user devices. This raises the barrier to entry and lowers the user adoption rate, which in turn reduces contact tracing effectiveness. Unlike the systems above, we propose a new privacy-preserving Wi-Fi and Bluetooth (BLE) contact tracing system that does not require smartphone applications, joining wireless networks, or custom user devices. Our specially built routers seamlessly track smartphones, laptops, smartwatches, BLE headphones, and tablets without any user action, but do not trace user identity. Mapping between devices and users is carried out only for confirmed cases and suspected contacts. Moreover, we can track the absolute positions of user devices to within 1.0 m by using bidirectional long short-term memory neural networks trained with data pre-collected by an autonomous robot. This allows public health authorities to track indirect droplet and surface transmissions that other contact tracing systems often overlook.

Submitted 23 July, 2022; v1 submitted 30 March, 2022; originally announced April 2022.

Report number: 2169-3536

Journal ref: IEEE Access Volume 11 (2022) 91027-91044

arXiv:2203.10597 [pdf, other]

The Dark Side: Security Concerns in Machine Learning for EDA

Authors: Zhiyao Xie, **gyu Pan, Chen-Chia Chang, Yiran Chen

Abstract: The growing IC complexity has led to a compelling need for design efficiency improvement through new electronic design automation (EDA) methodologies. In recent years, many unprecedentedly efficient EDA methods have been enabled by machine learning (ML) techniques. While ML has demonstrated great potential in circuit design, its dark side, namely security problems, is seldom discussed. This paper gives a comprehensive and impartial summary of all security concerns we have observed in ML for EDA. Many of them are hidden or neglected by practitioners in this field. In this paper, we first provide our taxonomy to define four major types of security concerns, then we analyze different application scenarios and special properties in ML for EDA. After that, we present our detailed analysis of each security concern with experiments.

Submitted 20 March, 2022; originally announced March 2022.

arXiv:2202.02518 [pdf, other]

doi 10.1007/s11235-022-00985-0

On the predictability in reversible steganography

Authors: Ching-Chun Chang, Xu Wang, Sisheng Chen, Hitoshi Kiya, Isao Echizen

Abstract: Artificial neural networks have advanced the frontiers of reversible steganography. The core strength of neural networks is the ability to render accurate predictions for a bewildering variety of data. Residual modulation is recognised as the most advanced reversible steganographic algorithm for digital images. The pivot of this algorithm is predictive analytics in which pixel intensities are predicted given some pixel-wise contextual information. This task can be perceived as a low-level vision problem, and hence neural networks for addressing a similar class of problems can be deployed. On top of the prior art, this paper investigates the predictability of pixel intensities based on supervised and unsupervised learning frameworks. Predictability analysis enables adaptive data embedding, which in turn leads to a better trade-off between capacity and imperceptibility. While conventional methods estimate predictability by the statistics of local image patterns, learning-based frameworks further consider the degree to which correct predictions can be made by a designated predictor. Not only the image patterns but also the predictor in use should be taken into account. Experimental results show that steganographic performance can be significantly improved by incorporating the learning-based predictability analysers into a reversible steganographic system.

Submitted 7 March, 2023; v1 submitted 5 February, 2022; originally announced February 2022.

Journal ref: Telecommunication Systems (2023), vol. 82, no. 2, pp. 301-313

arXiv:2111.13312 [pdf]

doi 10.1109/BHI50953.2021.9508588

Instrumented shoulder functional assessment using inertial measurement units for frozen shoulder

Authors: Ting-Yang Lu, Kai-Chun Liu, Chia-Yeh Hsieh, Chih-Ya Chang, Yu Tsao, Chia-Tai Chan

Abstract: Frozen shoulder (FS) is a shoulder condition that leads to pain and loss of shoulder range of motion. FS patients have difficulty independently performing daily activities. Inertial measurement units (IMUs) have been developed to objectively measure upper limb range of motion (ROM) and shoulder function. In this work, we propose an IMU-based shoulder functional task assessment with kinematic parameters (e.g., smoothness, power, speed, and duration) in FS patients and analyze the functional performance on complete shoulder tasks and subtasks. Twenty FS patients and twenty healthy subjects were recruited in this study. Participants performed five shoulder functional tasks: washing hair (WH), washing upper back (WUB), washing lower back (WLB), placing an object on a high shelf (POH), and removing an object from back pocket (ROP). The results demonstrate that the smoothness features used can reflect the differences in movement fluency between FS patients and healthy controls (p < 0.05 and effect size > 0.8). Moreover, subtask features provided subtle information related to clinical conditions that was not revealed by features of the complete task, especially for the defined subtasks 1 and 2 of each task.

Submitted 25 November, 2021; originally announced November 2021.

Comments: 4 pages, 6 tables, 2 figures, To appear in 2021 IEEE BHI

arXiv:2111.06046 [pdf, other]

Music Score Expansion with Variable-Length Infilling

Authors: Chih-Pin Tan, Chin-Jui Chang, Alvin W. Y. Su, Yi-Hsuan Yang

Abstract: In this paper, we investigate using the variable-length infilling (VLI) model, which was originally proposed to infill missing segments, to "prolong" existing musical segments at musical boundaries. Specifically, as a case study, we expand 20 musical segments from 12 bars to 16 bars and examine the degree to which the VLI model preserves musical boundaries in the expanded results using a few objective metrics, including the Register Histogram Similarity we newly propose. The results show that the VLI model has the potential to address the expansion task.

Submitted 10 November, 2021; originally announced November 2021.

Comments: To be published as a late-breaking demo paper at ISMIR 2021

arXiv:2110.08828 [pdf]

Compression-aware Projection with Greedy Dimension Reduction for Convolutional Neural Network Activations

Authors: Yu-Shan Tai, Chieh-Fang Teng, Cheng-Yang Chang, An-Yeu Wu

Abstract: Convolutional neural networks (CNNs) achieve remarkable performance in a wide range of fields. However, intensive memory access of activations introduces considerable energy consumption, impeding the deployment of CNNs on resource-constrained edge devices. Existing works in activation compression propose to transform feature maps for higher compressibility, thus enabling dimension reduction. Nevertheless, in the case of aggressive dimension reduction, these methods lead to a severe accuracy drop. To improve the trade-off between classification accuracy and compression ratio, we propose a compression-aware projection system, which employs a learnable projection to compensate for the reconstruction loss. In addition, a greedy selection metric is introduced to optimize the layer-wise compression ratio allocation by considering both accuracy and #bits reduction simultaneously. Our test results show that the proposed methods effectively reduce memory access by 2.91x to 5.97x with negligible accuracy drop on MobileNetV2/ResNet18/VGG16.

Submitted 17 October, 2021; originally announced October 2021.

Comments: 5 pages, 5 figures, submitted to 2022 ICASSP
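The abstract does not define the greedy selection metric. As a rough illustration only, the sketch below assumes a metric of bits saved per unit of extra accuracy drop, using per-layer accuracy-drop tables (`acc_drop`) that would have to be measured on a validation set. The function name, its parameters, and the fixed reduction `step` are all hypothetical.

```python
from typing import Dict, List


def greedy_allocate(
    bits_per_dim: Dict[str, int],
    dims: Dict[str, int],
    acc_drop: Dict[str, List[float]],
    step: int,
    bit_budget: int,
) -> Dict[str, int]:
    """Greedily shrink per-layer kept dimensions until total bits <= bit_budget.

    acc_drop[layer][k] is an assumed (hypothetical) measured accuracy drop
    after k reduction steps of size `step` have been applied to that layer.
    """
    steps = {layer: 0 for layer in dims}

    def kept(layer: str) -> int:
        return dims[layer] - steps[layer] * step

    def total_bits() -> int:
        return sum(kept(layer) * bits_per_dim[layer] for layer in dims)

    while total_bits() > bit_budget:
        best, best_score = None, -1.0
        for layer in dims:
            k = steps[layer]
            if k + 1 >= len(acc_drop[layer]) or kept(layer) - step <= 0:
                continue  # no further reduction step available for this layer
            saved = step * bits_per_dim[layer]
            extra_drop = acc_drop[layer][k + 1] - acc_drop[layer][k]
            score = saved / (extra_drop + 1e-9)  # assumed metric: bits saved per unit accuracy lost
            if score > best_score:
                best, best_score = layer, score
        if best is None:
            break  # the budget cannot be met with the available steps
        steps[best] += 1
    return {layer: kept(layer) for layer in dims}
```

Scoring by bits saved per unit of accuracy lost is one simple way to weigh both criteria at once; the metric used in the paper may differ.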

arXiv:2109.10181 [pdf, other]

Intelligent Traffic Control System by Using Image Information

Authors: Zong-Ming Lin, Cheng-Yang Chang, Chin-Yu Hu, Yung-Yuan Chen

Abstract: This paper implements a traffic signal control system that uses real-time traffic flow feedback. The system is designed to handle two-lane intersections. We construct an experimental field that resembles the roads and drivers in Taiwan using Virtual Test Drive (VTD), an autonomous driving simulation software released by MSC Software. We mount four cameras beside the roads to capture images of the intersection and then convert the image information into traffic flow information. The traffic in each lane is analyzed using the Greenshields traffic flow model, and the traffic signals are controlled using Webster's method to improve performance and smooth traffic flow.

Submitted 21 September, 2021; originally announced September 2021.

Comments: 7 pages, 16 figures
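For readers unfamiliar with the two classical models named above, a minimal sketch of both follows. It uses the textbook Greenshields relation v = v_f(1 - k/k_j), q = kv, and Webster's optimal cycle length C0 = (1.5L + 5)/(1 - Y), with effective green time split in proportion to the critical flow ratios y_i = q_i/s_i. The function names, units (veh/km, km/h, veh/h, seconds), and example values are assumptions, not details taken from the paper.

```python
from typing import List, Tuple


def greenshields_flow(density: float, free_speed: float, jam_density: float) -> float:
    """Greenshields model: v = v_f * (1 - k / k_j) and flow q = k * v.

    With density in veh/km and speed in km/h, the flow is in veh/h.
    """
    speed = free_speed * (1.0 - density / jam_density)
    return density * speed


def webster_plan(
    flows: List[float], saturation_flows: List[float], lost_time: float
) -> Tuple[float, List[float]]:
    """Webster's optimal cycle C0 = (1.5 L + 5) / (1 - Y) and proportional green splits."""
    ratios = [q / s for q, s in zip(flows, saturation_flows)]  # critical flow ratio y_i per phase
    y_total = sum(ratios)
    if y_total >= 1.0:
        raise ValueError("intersection is oversaturated (Y >= 1)")
    cycle = (1.5 * lost_time + 5.0) / (1.0 - y_total)
    effective_green = cycle - lost_time  # green time left after lost time
    greens = [effective_green * y / y_total for y in ratios]
    return cycle, greens
```

For example, two phases with flows of 600 and 300 veh/h, a saturation flow of 1800 veh/h each, and 10 s of lost time give Y = 0.5, a 40 s cycle, and effective greens of 20 s and 10 s.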

arXiv:2107.05223 [pdf, other]

BERT-like Pre-training for Symbolic Piano Music Classification Tasks

Authors: Yi-Hui Chou, I-Chun Chen, Chin-Jui Chang, Joann Ching, Yi-Hsuan Yang

Abstract: This article presents a benchmark study of symbolic piano music classification using the masked language modelling approach of Bidirectional Encoder Representations from Transformers (BERT). Specifically, we consider two types of MIDI data: MIDI scores, which are musical scores rendered directly into MIDI with no dynamics and precisely aligned with the metrical grid notated by the composer; and MIDI performances, which are MIDI encodings of human performances of musical scoresheets. With five public-domain datasets of single-track piano MIDI files, we pre-train two 12-layer Transformer models using the BERT approach, one for MIDI scores and the other for MIDI performances, and fine-tune them for four downstream classification tasks. These include two note-level classification tasks (melody extraction and velocity prediction) and two sequence-level classification tasks (style classification and emotion classification). Our evaluation shows that the BERT approach leads to higher classification accuracy than recurrent neural network (RNN)-based baselines.

Submitted 13 April, 2024; v1 submitted 12 July, 2021; originally announced July 2021.

Comments: Accepted to Journal of Creative Music Systems
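The masked language modelling objective mentioned above can be illustrated with the masking recipe from the original BERT paper: select about 15% of tokens, then replace 80% of those with a mask token, 10% with a random token, and leave 10% unchanged. Whether the MIDI models use exactly these ratios is an assumption, and `MASK_ID` and the integer token vocabulary are hypothetical stand-ins for a MIDI token encoding.

```python
import random
from typing import List, Tuple

MASK_ID = 0  # hypothetical id of the [MASK] token in an assumed integer vocabulary


def mask_tokens(
    tokens: List[int], vocab_size: int, rng: random.Random, mask_prob: float = 0.15
) -> Tuple[List[int], List[int]]:
    """BERT-style masking. Returns (model inputs, labels); labels are -100 where no loss is taken."""
    inputs, labels = list(tokens), [-100] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(1, vocab_size)  # 10%: random non-mask token
        # remaining 10%: keep the original token unchanged
    return inputs, labels
```

Keeping some selected tokens unchanged or randomized is the standard way to reduce the mismatch between pre-training, where `[MASK]` appears, and fine-tuning, where it does not.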

arXiv:2106.06924 [pdf, other]

doi 10.1109/ACCESS.2023.3233976

Deep Learning for Predictive Analytics in Reversible Steganography

Authors: Ching-Chun Chang, Xu Wang, Sisheng Chen, Isao Echizen, Victor Sanchez, Chang-Tsun Li

Abstract: Deep learning is regarded as a promising solution for reversible steganography. There is an accelerating trend of representing a reversible stego-system by monolithic neural networks, which bypass the intermediate operations of traditional reversible steganography pipelines. This end-to-end paradigm, however, suffers from imperfect reversibility. By contrast, the modular paradigm, which incorporates neural networks into modules of traditional pipelines, can stably guarantee reversibility with mathematical explainability. Prediction-error modulation is a well-established reversible steganography pipeline for digital images. It consists of a predictive analytics module and a reversible coding module. Given that reversibility is governed independently by the coding module, we narrow our focus to the incorporation of neural networks into the analytics module, which predicts pixel intensities and plays a pivotal role in determining capacity and imperceptibility. The objective of this study is to evaluate the impact of different training configurations on the predictive accuracy of neural networks and to provide practical insights. In particular, we investigate how different initialisation strategies for input images may affect the learning process and how different training strategies for dual-layer prediction respond to the problem of distributional shift. Furthermore, we compare the steganographic performance of various model architectures with different loss functions.

Submitted 7 March, 2023; v1 submitted 13 June, 2021; originally announced June 2021.

Journal ref: IEEE Access (2023), vol. 11, pp. 3494-3510
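To make the prediction-error modulation pipeline concrete, here is a minimal sketch of classic prediction-error expansion on a 1-D sequence of pixel values. It assumes a trivial predictor (the previous original sample) in place of the neural predictor studied in the paper, and it omits the overflow handling and location map a practical scheme needs. Reversibility comes from the decoder rebuilding the same prediction before undoing the expansion.

```python
from typing import List, Tuple


def pee_embed(pixels: List[int], bits: List[int]) -> List[int]:
    """Embed one bit per sample (after the first) by prediction-error expansion.

    Assumed predictor: the previous original sample. Pixel-range overflow is NOT handled.
    """
    if len(bits) != len(pixels) - 1:
        raise ValueError("need exactly one bit per predicted sample")
    marked = [pixels[0]]  # the first sample is kept unchanged as a reference
    for i in range(1, len(pixels)):
        pred = pixels[i - 1]
        err = pixels[i] - pred
        marked.append(pred + 2 * err + bits[i - 1])  # expanded error 2e + b carries the bit
    return marked


def pee_extract(marked: List[int]) -> Tuple[List[int], List[int]]:
    """Recover the embedded bits and the exact original samples."""
    restored, bits = [marked[0]], []
    for i in range(1, len(marked)):
        pred = restored[i - 1]  # decoder rebuilds the same prediction from restored data
        expanded = marked[i] - pred
        bits.append(expanded % 2)  # the LSB of 2e + b is the bit
        restored.append(pred + expanded // 2)  # floor division recovers e, even for e < 0
    return bits, restored
```

Because the embedding only depends on the prediction and the coding rule, swapping in a more accurate predictor (such as a neural network) shrinks the prediction errors and hence the distortion, without affecting reversibility.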

arXiv:2104.13895 [pdf, other]

Closed-loop Control Design and Motor Allocation for a Lower-limb Cable-driven Exoskeleton: A Switched Systems Approach

Authors: Chen-Hao Chang, Jonathan Casas, Victor H. Duenas

Abstract: Powered lower-limb exoskeletons provide assistive torques to coordinate limb motion during walking in individuals with movement disorders. Advances in sensing and actuation have improved the wearability and portability of state-of-the-art exoskeletons for walking. Cable-driven exoskeletons move the actuators off the user, resulting in lightweight devices that facilitate locomotion training. However, cable-driven mechanisms exhibit slacking if cable tension is not accurately controlled. Moreover, counteracting forces can arise between the agonist and antagonist motors, yielding undesired joint motion. In this paper, two control layers are developed to improve the performance of a cable-driven exoskeleton. First, a joint tracking controller is designed using a high-gain robust approach to track desired knee and hip trajectories. Second, a motor synchronization objective is developed to mitigate the effects of cable slacking for the pair of electric motors that actuate each joint. A sliding-mode robust controller is designed for the motor synchronization objective. A Lyapunov-based stability analysis is developed to guarantee a uniformly ultimately bounded result for joint tracking and exponential tracking for the motor synchronization objective. Moreover, an average dwell time analysis provides a bound on the number of motor switches when allocating the control between the motors that actuate each joint. An experiment with an able-bodied individual illustrates the feasibility of the developed control methods.

Submitted 28 April, 2021; originally announced April 2021.

arXiv:2011.07442 [pdf, other]

Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information

Authors: Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao

Abstract: Previous studies have confirmed that by augmenting acoustic features with place/manner-of-articulation features, the speech enhancement (SE) process can be guided to consider the broad phonetic properties of the input speech, improving enhancement performance. In this paper, we explore the contextual information of articulatory attributes as additional information to further benefit SE. More specifically, we propose to improve SE performance by leveraging losses from an end-to-end automatic speech recognition (E2E-ASR) model that predicts the sequence of broad phonetic classes (BPCs). We also develop multi-objective training with ASR and perceptual losses to train the SE system based on a BPC-based E2E-ASR. Experimental results on speech denoising, speech dereverberation, and impaired speech enhancement tasks confirm that contextual BPC information improves SE performance. Moreover, the SE model trained with the BPC-based E2E-ASR outperforms the one trained with a phoneme-based E2E-ASR. The results suggest that phoneme misclassifications by the ASR system may provide imperfect feedback, and BPCs could potentially be a better choice. Finally, we note that grouping the most confusable phonetic targets into the same BPC when calculating the additional objective can effectively improve SE performance.

Submitted 18 June, 2023; v1 submitted 14 November, 2020; originally announced November 2020.

Comments: To appear in IEEE Transactions on Audio, Speech and Language Processing (TASLP)

arXiv:2011.07406 [pdf, other]

Using Convolutional Variational Autoencoders to Predict Post-Trauma Health Outcomes from Actigraphy Data

Authors: Ayse S. Cakmak, Nina Thigpen, Garrett Honke, Erick Perez Alday, Ali Bahrami Rad, Rebecca Adaimi, Chia Jung Chang, Qiao Li, Pramod Gupta, Thomas Neylan, Samuel A. McLean, Gari D. Clifford

Abstract: Depression and post-traumatic stress disorder (PTSD) are psychiatric conditions commonly associated with experiencing a traumatic event. Estimating mental health status through non-invasive techniques such as activity-based algorithms can help to identify successful early interventions. In this work, we used locomotor activity captured from 1113 individuals who wore a research-grade smartwatch after the trauma. A convolutional variational autoencoder (VAE) architecture was used for unsupervised feature extraction from four weeks of actigraphy data. Using the VAE latent variables and the participants' pre-trauma physical health status as features, a logistic regression classifier achieved an area under the receiver operating characteristic curve (AUC) of 0.64 in estimating mental health outcomes. The results indicate that the VAE model is a promising approach for actigraphy data analysis for mental health outcomes in long-term studies.

Submitted 19 November, 2020; v1 submitted 14 November, 2020; originally announced November 2020.

Comments: Fixed typo in author affiliations
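The reported AUC of 0.64 can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative one. The sketch below computes AUC directly from that pairwise definition; the function name and the scores in the usage example are made up for illustration.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))
```

For scores [0.9, 0.8, 0.3, 0.2] with labels [1, 0, 1, 0], three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75. The double loop is quadratic; a rank-based formula gives the same value faster on large datasets.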

arXiv:2011.04101 [pdf, other]

Enabling DER Participation in Frequency Regulation Markets

Authors: Priyank Srivastava, Chin-Yao Chang, Jorge Cortes

Abstract: Distributed energy resources (DERs) are playing an increasing role in ancillary services for the bulk grid, particularly in frequency regulation. In this paper, we propose a framework for collections of DERs, combined to form microgrids and controlled by aggregators, to participate in frequency regulation markets. Our approach covers both the identification of bids for the market clearing stage and the mechanisms for the real-time allocation of the regulation signal. The proposed framework is hierarchical, consisting of a top layer and a bottom layer. The top layer consists of the aggregators communicating in a distributed fashion to optimally disaggregate the regulation signal requested by the system operator. The bottom layer consists of the DERs inside each microgrid, whose power levels are adjusted so that the tie line power matches the output of the corresponding aggregator in the top layer. The coordination at the top layer requires knowledge of the cost functions, ramp rates, and capacity bounds of the aggregators. We develop meaningful abstractions for these quantities that respect the power flow constraints and take into account the load uncertainties, and propose a provably correct distributed algorithm for optimal disaggregation of the regulation signal amongst the microgrids.

Submitted 29 January, 2021; v1 submitted 8 November, 2020; originally announced November 2020.

Comments: 14 pages, 8 figures
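As a simplified, centralized illustration of optimal disaggregation (not the distributed algorithm proposed in the paper), the sketch below assumes quadratic aggregator costs a_i x_i^2 with capacity bounds and ignores ramp rates. At the optimum, every aggregator that is not at a bound operates at the same marginal cost λ, so λ can be found by bisection. All names and cost forms here are hypothetical.

```python
from typing import List


def disaggregate(
    signal: float,
    cost_coeffs: List[float],
    lower: List[float],
    upper: List[float],
    tol: float = 1e-9,
) -> List[float]:
    """Split `signal` among aggregators minimizing sum(a_i * x_i**2)
    subject to sum(x_i) == signal and lower_i <= x_i <= upper_i (a_i > 0 assumed)."""
    if not sum(lower) <= signal <= sum(upper):
        raise ValueError("regulation signal outside aggregate capacity")

    def allocation(lam: float) -> List[float]:
        # KKT condition: marginal cost 2 * a_i * x_i = lam, then clip to the capacity bounds
        return [min(max(lam / (2.0 * a), lo), hi) for a, lo, hi in zip(cost_coeffs, lower, upper)]

    # At lo_lam every x_i sits at its lower bound; at hi_lam every x_i sits at its upper bound.
    lo_lam = min(2.0 * a * lo for a, lo in zip(cost_coeffs, lower))
    hi_lam = max(2.0 * a * hi for a, hi in zip(cost_coeffs, upper))
    while hi_lam - lo_lam > tol:
        mid = 0.5 * (lo_lam + hi_lam)
        if sum(allocation(mid)) < signal:  # total allocation is nondecreasing in lam
            lo_lam = mid
        else:
            hi_lam = mid
    return allocation(0.5 * (lo_lam + hi_lam))
```

For instance, with a = [1, 3] and a 8 MW request, the split is about [6, 2]; capping the first aggregator at 5 shifts the remainder to the second, giving about [5, 3].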

arXiv:2009.14668 [pdf]

Transfer Learning from Monolingual ASR to Transcription-free Cross-lingual Voice Conversion

Authors: Che-Jui Chang

Abstract: Cross-lingual voice conversion (VC) is a task that aims to synthesize target voices with the same content when the source and target speakers speak different languages. Its challenge lies in the fact that the source and target data are naturally non-parallel, and it is even more difficult to bridge the gap between languages when no transcriptions are provided. In this paper, we focus on knowledge transfer from monolingual ASR to cross-lingual VC in order to address the content mismatch problem. To achieve this, we first train a monolingual acoustic model for the source language, use it to extract phonetic features for all the speech in the VC dataset, and then train a Seq2Seq conversion model to predict the mel-spectrograms. We successfully address cross-lingual VC without any transcription or language-specific knowledge for foreign speech. We conduct experiments on the Voice Conversion Challenge 2020 datasets and show that our speaker-dependent conversion model outperforms the zero-shot baseline, achieving MOS of 3.83 and 3.54 in speech quality and speaker similarity, respectively, for cross-lingual conversion. Compared with the cascaded ASR-TTS method, our proposed method significantly reduces the MOS drop between intra- and cross-lingual conversion.

Submitted 30 September, 2020; originally announced September 2020.

arXiv:2009.01759 [pdf, other]

Intra-Utterance Similarity Preserving Knowledge Distillation for Audio Tagging

Authors: Chun-Chieh Chang, Chieh-Chi Kao, Ming Sun, Chao Wang

Abstract: Knowledge Distillation (KD) is a popular area of research for reducing the size of large models while maintaining good performance. The outputs of larger teacher models are used to guide the training of smaller student models. Given the repetitive nature of acoustic events, we propose to leverage this information to regulate KD training for audio tagging. This novel KD method, "Intra-Utterance Similarity Preserving KD" (IUSP), shows promising results for the audio tagging task. It is motivated by a previously published KD method, "Similarity Preserving KD" (SP). However, instead of preserving the pairwise similarities between inputs within a mini-batch, our method preserves the pairwise similarities between the frames of a single input utterance. Our proposed KD method, IUSP, shows consistent improvements over SP across student models of different sizes on the DCASE 2019 Task 5 dataset for audio tagging. Relative to SP's improvement in micro AUPRC over the baseline, IUSP's improvement over the baseline is 27.1% to 122.4% larger.

Submitted 3 September, 2020; originally announced September 2020.

Comments: Accepted to Interspeech 2020
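The difference between SP and IUSP lies in which rows enter the similarity matrix. The sketch below assumes IUSP reuses the SP recipe (Gram matrix, row-wise L2 normalization, squared Frobenius distance scaled by the squared number of rows), applied to the T frames of one utterance instead of the items of a mini-batch. The exact normalization in the paper may differ, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def _normalized_gram(frames: Matrix) -> Matrix:
    """Frame-by-frame similarity G = F F^T, each row L2-normalized (assumed, as in SP-KD)."""
    gram = [[sum(a * b for a, b in zip(fi, fj)) for fj in frames] for fi in frames]
    out = []
    for row in gram:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0  # guard against all-zero rows
        out.append([v / norm for v in row])
    return out


def iusp_loss(teacher_frames: Matrix, student_frames: Matrix) -> float:
    """Similarity-preserving loss over the frames of ONE utterance.

    teacher_frames is T x D_t and student_frames is T x D_s; the embedding sizes may
    differ because only the T x T similarity matrices are compared.
    """
    if len(teacher_frames) != len(student_frames):
        raise ValueError("teacher and student must have the same number of frames")
    t = len(teacher_frames)
    g_t = _normalized_gram(teacher_frames)
    g_s = _normalized_gram(student_frames)
    sq = sum((a - b) ** 2 for rt, rs in zip(g_t, g_s) for a, b in zip(rt, rs))
    return sq / (t * t)
```

Because only frame-to-frame similarities are matched, a student whose features are a rescaled copy of the teacher's incurs zero loss; in training this term would typically be added to the usual task loss with a weighting factor.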

arXiv:2004.14252 [pdf]

Task-Projected Hyperdimensional Computing for Multi-Task Learning

Authors: Cheng-Yang Chang, Yu-Chuan Chuang, An-Yeu Wu

Abstract: Brain-inspired Hyperdimensional (HD) computing is an emerging technique for cognitive tasks in the field of low-power design. As a fast-learning and energy-efficient computational paradigm, HD computing has shown great success in many real-world applications. However, an HD model incrementally trained on multiple tasks suffers from catastrophic forgetting: the model forgets the knowledge learned from previous tasks and focuses only on the current one. To the best of our knowledge, no study has investigated the feasibility of applying multi-task learning to HD computing. In this paper, we propose Task-Projected Hyperdimensional Computing (TP-HDC), which enables an HD model to support multiple tasks simultaneously by exploiting the redundant dimensionality of the hyperspace. To mitigate interference between tasks, we project each task into a separate subspace for learning. Compared with the baseline method, our approach efficiently utilizes the unused capacity of the hyperspace and shows a 12.8% improvement in average accuracy with negligible memory overhead.

Submitted 29 April, 2020; originally announced April 2020.

Comments: To be published in 16th International Conference on Artificial Intelligence Applications and Innovations

arXiv:2004.07980 [pdf, other]

Co-simulation Platform for Developing InfoRich Energy-Efficient Connected and Automated Vehicles

Authors: Shunsuke Aoki, Lung En Jan, Junfeng Zhao, Anand Bhat, Ragunathan Rajkumar, Chen-Fang Chang

Abstract: With advances in sensing, computing, and communication technologies, Connected and Automated Vehicles (CAVs) are becoming feasible. The advent of CAVs presents new opportunities to improve the energy efficiency of individual vehicles. However, testing and verifying energy-efficient autonomous driving systems is difficult because of safety concerns and the need for repeatability. In this paper, we present a co-simulation platform to develop and test novel vehicle eco-autonomous driving technologies, named InfoRich, which incorporate information from on-board sensors, V2X communications, and a map database. The co-simulation platform includes eco-autonomous driving software, a vehicle dynamics and powertrain (VD&PT) model, and a traffic environment simulator. We also use synthetic drive cycles derived from real-world driving data to test the strategies under realistic driving scenarios. To build road networks from the real-world driving data, we develop an Automated Parser and Calculator for Map/Scenario named AutoPASCAL. Overall, the simulation platform provides realistic vehicle, powertrain, sensor, traffic, and road-network models to enable evaluation of the energy efficiency of eco-autonomous driving.

Submitted 16 April, 2020; originally announced April 2020.

arXiv:2001.04489 [pdf, other]

On the Computational Viability of Quantum Optimization for PMU Placement

Authors: Eric B. Jones, Eliot Kapit, Chin-Yao Chang, David Biagioni, Deepthi Vaidhynathan, Peter Graf, Wesley Jones

Abstract: Using optimal phasor measurement unit (PMU) placement as a prototypical problem, we assess the computational viability of the current-generation D-Wave Systems 2000Q quantum annealer for power systems design problems. We reformulate the minimum dominating set problem for the annealer hardware, solve the reformulation for a standard set of IEEE test systems, and benchmark solution quality and time to solution against the CPLEX Optimizer and simulated annealing. For some problem instances the 2000Q outpaces CPLEX. For instances where the 2000Q underperforms CPLEX and simulated annealing, we suggest hardware improvements for the next generation of quantum annealers.

Submitted 13 January, 2020; originally announced January 2020.
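In the basic observability model, a PMU at a bus observes that bus and all adjacent buses, so optimal placement reduces to a minimum dominating set. The sketch below solves that problem by exhaustive search, which is only practical for tiny graphs. It is a classical reference point, not the annealer reformulation used in the paper, and it ignores refinements such as zero-injection buses; the function name and the adjacency-dictionary input format are assumptions.

```python
from itertools import combinations
from typing import Dict, List, Set


def min_pmu_placement(adjacency: Dict[int, Set[int]]) -> List[int]:
    """Smallest set of buses such that every bus hosts a PMU or neighbors one
    (minimum dominating set). Exhaustive search: exponential, small graphs only."""
    buses = sorted(adjacency)
    all_buses = set(buses)
    for k in range(1, len(buses) + 1):  # try placements in order of increasing size
        for chosen in combinations(buses, k):
            observed = set(chosen)
            for bus in chosen:
                observed |= adjacency[bus]  # a PMU also observes its neighbors
            if observed >= all_buses:
                return list(chosen)
    return []
```

On a five-bus line 1-2-3-4-5, the search returns two PMUs at buses 1 and 4. The exponential growth in candidate sets is what motivates heuristics, integer programming solvers such as CPLEX, or quantum annealing for realistic test systems.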

arXiv:2001.01052 [pdf, ps, other]

Joint Beamforming and Computation Offloading for Multi-user Mobile-Edge Computing

Authors: Changfeng Ding, Jun-Bo Wang, Ming Cheng, Chuanwen Chang, **-Yuan Wang, Min Lin

Abstract: Mobile edge computing (MEC) is considered an efficient method to relieve the computation burden of mobile devices (MDs). To reduce the energy consumption and time delay of MDs in MEC, multi-user multiple-input multiple-output (MU-MIMO) communication is applied to the MEC system. The purpose of this paper is to minimize the weighted sum of the energy consumption and time delay of MDs by jointly considering the offloading decision and MU-MIMO beamforming problems. The resulting optimization problem is a mixed-integer nonlinear programming problem, which is NP-hard. To solve it, a semidefinite relaxation based algorithm is proposed for the offloading decision problem. Then, the MU-MIMO beamforming design problem is handled with a newly proposed fractional programming method. Simulation results show that the proposed algorithms can effectively reduce the energy consumption and time delay of computation offloading.

Submitted 4 January, 2020; originally announced January 2020.

arXiv:1907.01919 [pdf, ps, other]

doi 10.1109/GCWkshps45667.2019.9024429

A Reinforcement Learning Approach for the Multichannel Rendezvous Problem

Authors: Jen-Hung Wang, **-En Lu, Cheng-Shang Chang, Duan-Shin Lee

Abstract: In this paper, we consider the multichannel rendezvous problem in cognitive radio networks (CRNs), where the probability that two users hopping on the same channel have a successful rendezvous is a function of channel states. The channel states are modelled by two-state Markov chains with a good state and a bad state. These channel states are not observable by the users. For such a multichannel rendezvous problem, we are interested in finding the optimal policy that minimizes the expected time-to-rendezvous (ETTR) among the class of {\em dynamic blind rendezvous policies}, i.e., at the $t^{th}$ time slot each user selects channel $i$ independently with probability $p_i(t)$, $i=1,2, \ldots, N$. By formulating this multichannel rendezvous problem as an adversarial bandit problem, we propose using a reinforcement learning approach to learn the channel selection probabilities $p_i(t)$, $i=1,2, \ldots, N$. Our experimental results show that the reinforcement learning approach is very effective and yields ETTRs comparable to those of various approximation policies in the literature.

Submitted 5 July, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

Comments: 5 pages, 9 figures. arXiv admin note: text overlap with arXiv:1906.10424

Journal ref: 2019 IEEE Globecom Workshops (GC Wkshps), Waikoloa, HI, USA, 2019, pp. 1-5
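The abstract does not name the bandit algorithm used. EXP3 is the textbook method for adversarial bandits, so the sketch below uses it to maintain channel-selection probabilities p_i(t). The reward (1 for a successful rendezvous, 0 otherwise), the exploration rate `gamma`, and the class name are assumptions.

```python
import math
import random
from typing import List


class Exp3:
    """EXP3 for adversarial bandits; maintains channel-selection probabilities p_i(t)."""

    def __init__(self, n_channels: int, gamma: float, rng: random.Random) -> None:
        self.k = n_channels
        self.gamma = gamma  # exploration rate in (0, 1]
        self.weights = [1.0] * n_channels
        self.rng = rng

    def probabilities(self) -> List[float]:
        total = sum(self.weights)
        # mix the exponential-weights distribution with uniform exploration
        return [(1 - self.gamma) * w / total + self.gamma / self.k for w in self.weights]

    def select(self) -> int:
        return self.rng.choices(range(self.k), weights=self.probabilities())[0]

    def update(self, channel: int, reward: float) -> None:
        """Reward in [0, 1]; assumed to be 1 if the rendezvous succeeded on `channel`."""
        p = self.probabilities()[channel]
        estimate = reward / p  # importance-weighted (unbiased) reward estimate
        self.weights[channel] *= math.exp(self.gamma * estimate / self.k)
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]  # rescale to avoid overflow
```

In a rendezvous simulation, each user would call `select()` every time slot and `update()` with the outcome; the rescaling step leaves the probabilities unchanged because they depend only on weight ratios.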

arXiv:1905.00190 [pdf, other]

Learned Image Compression with Soft Bit-based Rate-Distortion Optimization

Authors: David Alexandre, Chih-Peng Chang, Wen-Hsiao Peng, Hsueh-Ming Hang

Abstract: This paper introduces the notion of soft bits to address rate-distortion optimization for learning-based image compression. Recent methods for such compression train an autoencoder end-to-end with an objective that balances distortion and rate. They face the zero-gradient issue caused by quantization and the difficulty of estimating the rate accurately. Inspired by soft quantization, we represent the quantization indices of feature maps with differentiable soft bits. This allows us to tightly couple rate estimation with context-adaptive binary arithmetic coding. It also provides a differentiable distortion objective function. Experimental results show that our approach achieves state-of-the-art compression performance among learning-based schemes in terms of MS-SSIM and PSNR.

Submitted 1 May, 2019; originally announced May 2019.

arXiv:1903.00101 [pdf, other]

doi 10.1109/TAES.2019.2951213

Receiver Operating Characteristics for a Prototype Quantum Two-Mode Squeezing Radar

Authors: David Luong, C. W. Sandbo Chang, A. M. Vadiraj, Anthony Damini, C. M. Wilson, Bhashyam Balaji

Abstract: We have built and evaluated a prototype quantum radar, which we call a quantum two-mode squeezing radar (QTMS radar), in the laboratory. It operates solely at microwave frequencies; there is no downconversion from optical frequencies. Because the signal generation process relies on quantum mechanical principles, the system is considered to contain a quantum-enhanced radar transmitter. This transmitter generates a pair of entangled microwave signals and transmits one of them through free space, where the signal is measured using a simple and rudimentary receiver. At the heart of the transmitter is a device called a Josephson parametric amplifier (JPA), which generates a pair of entangled signals called two-mode squeezed vacuum (TMSV) at 6.1445 GHz and 7.5376 GHz. These are then sent through a chain of amplifiers. The 7.5376 GHz beam passes through 0.5 m of free space; the 6.1445 GHz signal is measured directly after amplification. The two measurement results are correlated in order to distinguish signal from noise. We compare our QTMS radar to a classical radar setup using conventional components, which we call a two-mode noise radar (TMN radar), and find that there is a significant gain when both systems broadcast signals at -82 dBm. This is shown via a comparison of receiver operating characteristic (ROC) curves. In particular, we find that the quantum radar requires 8 times fewer integrated samples than its classical counterpart to achieve the same performance.

Submitted 28 February, 2019; originally announced March 2019.

Comments: 17 pages, 17 figures; submitted to IEEE Transactions on Aerospace and Electronic Systems

arXiv:1811.12214 [pdf, other]

Play as You Like: Timbre-enhanced Multi-modal Music Style Transfer

Authors: Chien-Yu Lu, Min-Xin Xue, Chia-Che Chang, Che-Rung Lee, Li Su

Abstract: Style transfer of polyphonic music recordings is a challenging task when considering the modeling of diverse, imaginative, and reasonable music pieces in a style different from their original one. To achieve this, learning stable multi-modal representations for both domain-variant (i.e., style) and domain-invariant (i.e., content) information of music in an unsupervised manner is critical. In this paper, we propose an unsupervised music style transfer method that does not need parallel data. In addition, to characterize the multi-modal distribution of music pieces, we employ the Multi-modal Unsupervised Image-to-Image Translation (MUNIT) framework in the proposed system. This allows one to generate diverse outputs from the learned latent distributions representing contents and styles. Moreover, to better capture the granularity of sound, such as the perceptual dimensions of timbre and the nuance in instrument-specific performance, cognitively plausible features, including mel-frequency cepstral coefficients (MFCC), spectral difference, and spectral envelope, are combined with the widely used mel-spectrogram into a timbre-enhanced multi-channel input representation. The Relativistic average Generative Adversarial Network (RaGAN) is also used to achieve fast convergence and high stability. We conduct experiments on bilateral style transfer tasks among three different genres, namely piano solo, guitar solo, and string quartet. Results demonstrate the advantages of the proposed method in music style transfer, with improved sound quality and the ability for users to manipulate the output.

Submitted 28 November, 2018; originally announced November 2018.
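For reference, the relativistic average GAN losses can be computed from raw critic outputs C(x): the discriminator is trained so that real samples look more realistic than the average fake sample and fake samples look less realistic than the average real sample, and the generator loss swaps the roles. This is a framework-free sketch of the standard RaGAN formulation; how the music style transfer system wires these losses into MUNIT is not specified here, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def _log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def ragan_losses(critic_real: List[float], critic_fake: List[float]) -> Tuple[float, float]:
    """Return (discriminator loss, generator loss) for one batch of critic outputs.

    D_Ra(x_r) = sigmoid(C(x_r) - mean C(x_f)) and D_Ra(x_f) = sigmoid(C(x_f) - mean C(x_r)).
    log(1 - sigmoid(z)) is computed as log(sigmoid(-z)).
    """
    mean_real = sum(critic_real) / len(critic_real)
    mean_fake = sum(critic_fake) / len(critic_fake)
    n_r, n_f = len(critic_real), len(critic_fake)
    # Discriminator: reals should beat the average fake; fakes should lose to the average real.
    d_loss = (-sum(_log_sigmoid(c - mean_fake) for c in critic_real) / n_r
              - sum(_log_sigmoid(-(c - mean_real)) for c in critic_fake) / n_f)
    # Generator: the roles of real and fake samples are swapped.
    g_loss = (-sum(_log_sigmoid(c - mean_real) for c in critic_fake) / n_f
              - sum(_log_sigmoid(-(c - mean_fake)) for c in critic_real) / n_r)
    return d_loss, g_loss
```

When the critic cannot tell real from fake (equal outputs), both losses equal 2 log 2; in practice the same expressions would be written with a deep learning framework's tensors so that gradients flow to the critic and generator.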

Showing 1–50 of 52 results for author: Chang, C