Search | arXiv e-print repository

Design of Interacting Particle Systems for Fast and Efficient Reinforcement Learning

Authors: Anant A Joshi, Heng-Sheng Chang, Amirhossein Taghvaei, Prashant G Mehta, Sean P. Meyn

Abstract: This paper is concerned with the design of algorithms based on systems of interacting particles to represent, approximate, and learn the optimal control law for reinforcement learning (RL). The primary contribution of the present paper is to show that convergence rates can be accelerated dramatically through careful design of interactions between particles. Theory focuses on the linear quadratic stochastic optimal control problem for which a complete and novel theory is presented. Apart from the new algorithm, sample complexity bounds are obtained, and it is shown that the mean square error scales as $1/N$ where $N$ is the number of particles. The theoretical results and algorithms are illustrated with numerical experiments and comparisons with other recent approaches, where the faster convergence of the proposed algorithm is numerically demonstrated.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.06650 [pdf, other]

Predicting the risk of early-stage breast cancer recurrence using H&E-stained tissue images

Authors: Geongyu Lee, Joonho Lee, Tae-Yeong Kwak, Sun Woo Kim, Youngmee Kwon, Chungyeul Kim, Hyeyoon Chang

Abstract: Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images labeled with the risk prediction via genomics assays were used, and we obtained sensitivity of 0.857, 0.746, and 0.529 for predicting low, intermediate, and high risk, and specificity of 0.816, 0.803, and 0.972. When compared to the expert pathologist's regional histology grade information, a Pearson's correlation coefficient of 0.61 was obtained. When we checked the model learned through these studies through the class activation map, we found that it actually considered tubule formation and mitotic rate when predicting different risk groups.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 12 pages, 7 figures

arXiv:2405.20502 [pdf, ps, other]

Reach-Avoid Control Synthesis for a Quadrotor UAV with Formal Safety Guarantees

Authors: Mohamed Serry, Haocheng Chang, Jun Liu

Abstract: Reach-avoid specifications are one of the most common tasks in autonomous aerial vehicle (UAV) applications. Despite the intensive research and development associated with control of aerial vehicles, generating feasible trajectories through complex environments and tracking them with formal safety guarantees remain challenging. In this paper, we propose a control framework for a quadrotor UAV that enables accomplishing reach-avoid tasks with formal safety guarantees. In this proposed framework, we integrate geometric control theory for tracking and polynomial trajectory generation using Bezier curves, where tracking errors are accounted for in the trajectory synthesis process. To estimate the tracking errors, we revisit the stability analysis of the closed-loop quadrotor system, when geometric control is implemented. We show that the tracking error dynamics exhibit local exponential stability when geometric control is implemented with any positive control gains, and we derive tight uniform bounds of the tracking error. We also introduce sufficient conditions to be imposed on the desired trajectory utilizing the derived uniform bounds to ensure the well-definedness of the closed-loop system. For the trajectory synthesis, we present an efficient algorithm that enables constructing a safe tube by means of sampling-based planning and safe hyper-rectangular set computations. Then, we compute the trajectory, given as a piecewise continuous Bezier curve, through the safe tube, where a heuristic efficient approach that utilizes iterative linear programming is employed. We present extensive numerical simulations with a cluttered environment to illustrate the effectiveness of the proposed framework in reach-avoid planning scenarios.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2404.09385 [pdf, other]

A Large-Scale Evaluation of Speech Foundation Models

Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads. Combining our results with community submissions, we verify that the foundation model paradigm is promising for speech, and our multi-tasking framework is simple yet effective, as the best-performing foundation model shows competitive generalizability across most SUPERB tasks. For reproducibility and extensibility, we have developed a long-term maintained platform that enables deterministic benchmarking, allows for result sharing via an online leaderboard, and promotes collaboration through a community-driven benchmark database to support new development cycles. Finally, we conduct a series of analyses to offer an in-depth understanding of SUPERB and speech foundation models, including information flows across tasks inside the models, the correctness of the weighted-sum benchmarking protocol and the statistical significance and robustness of the benchmark.

Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The Arxiv version is preferred

arXiv:2404.05191 [pdf, ps, other]

Graph-based Untrained Neural Network Detector for OTFS Systems

Authors: Hao Chang, Branka Vucetic, Wibowo Hardjawana

Abstract: Inter-carrier interference (ICI) caused by mobile reflectors significantly degrades the conventional orthogonal frequency division multiplexing (OFDM) performance in high-mobility environments. The orthogonal time frequency space (OTFS) modulation system effectively represents ICI in the delay-Doppler domain, thus significantly outperforming OFDM. Existing iterative and neural network (NN) based OTFS detectors suffer from high complex matrix operations and performance degradation in untrained environments, where the real wireless channel does not match the one used in the training, which often happens in real wireless networks. In this paper, we propose to embed the prior knowledge of interference extracted from the estimated channel state information (CSI) as a directed graph into a decoder untrained neural network (DUNN), namely graph-based DUNN (GDUNN). We then combine it with Bayesian parallel interference cancellation (BPIC) for OTFS symbol detection, resulting in GDUNN-BPIC. Simulation results show that the proposed GDUNN-BPIC outperforms state-of-the-art OTFS detectors under imperfect CSI.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2402.06959 [pdf, other]

SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data

Authors: Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-yi Lee, Hsin-Min Wang, David Harwath

Abstract: The recently proposed visually grounded speech model SpeechCLIP is an innovative framework that bridges speech and text through images via CLIP without relying on text transcription. On this basis, this paper introduces two extensions to SpeechCLIP. First, we apply the Continuous Integrate-and-Fire (CIF) module to replace a fixed number of CLS tokens in the cascaded architecture. Second, we propose a new hybrid architecture that merges the cascaded and parallel architectures of SpeechCLIP into a multi-task learning framework. Our experimental evaluation is performed on the Flickr8k and SpokenCOCO datasets. The results show that in the speech keyword extraction task, the CIF-based cascaded SpeechCLIP model outperforms the previous cascaded SpeechCLIP model using a fixed number of CLS tokens. Furthermore, through our hybrid architecture, cascaded task learning boosts the performance of the parallel branch in image-speech retrieval tasks.

Submitted 10 February, 2024; originally announced February 2024.

Comments: Accepted to ICASSP 2024, Self-supervision in Audio, Speech, and Beyond (SASB) workshop

arXiv:2312.01042 [pdf, ps, other]

Covert Communications in STAR-RIS-Aided Rate-Splitting Multiple Access Systems

Authors: Heng Chang, Hai Yang, Shuobo Xu, Xiyu Pang, Hongwu Liu

Abstract: In this paper, we investigate covert communications in a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)-aided rate-splitting multiple access (RSMA) system. Under the RSMA principles, the messages for the covert user (Bob) and public user (Grace) are converted to the common and private streams at the legitimate transmitter (Alice) to realize downlink transmissions, while the STAR-RIS is deployed not only to aid the public transmissions from Alice to Grace, but also to shield the covert transmissions from Alice to Bob against the warden (Willie). To characterize the covert performance of the considered STAR-RIS-aided RSMA (STAR-RIS-RSMA) system, we derive an analytical expression for the minimum average detection error probability of Willie, based on which a covert rate maximization problem is formulated. To maximize Bob's covert rate while confusing Willie's monitoring, the transmit power allocation, common rate allocation, and STAR-RIS reflection/transmission beamforming are jointly optimized subject to Grace's quality of service (QoS) requirements. The non-convex covert rate maximization problem, consisting of highly coupled system parameters, is decoupled into three sub-problems of transmit power allocation, common rate allocation, and STAR-RIS reflection/transmission beamforming, respectively. To obtain the rank-one constrained optimal solution for the sub-problem of optimizing the STAR-RIS reflection/transmission beamforming, a penalty-based successive convex approximation scheme is developed. Moreover, an alternative optimization (AO) algorithm is designed to determine the optimal solution for the sub-problem of optimizing the transmit power allocation, while the original problem is overall solved by a new AO algorithm.

Submitted 2 December, 2023; originally announced December 2023.

Comments: 17 pages, submitted to journal

arXiv:2311.09117 [pdf, other]

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces

Authors: Heng-Jui Chang, James Glass

Abstract: This paper introduces Robust Spin (R-Spin), a data-efficient domain-specific self-supervision method for speaker and noise-invariant speech representations by learning discrete acoustic units with speaker-invariant clustering (Spin). R-Spin resolves Spin's issues and enhances content representations by learning to predict acoustic pieces. R-Spin offers a 12X reduction in computational resources compared to previous state-of-the-art methods while outperforming them in severely distorted speech scenarios. This paper provides detailed analyses to show how discrete units contribute to speech encoder training and improving robustness in diverse acoustic environments.

Submitted 1 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: Accepted to NAACL 2024

arXiv:2311.08439 [pdf, other]

A Unified Approach for Comprehensive Analysis of Various Spectral and Tissue Doppler Echocardiography

Authors: Jaeik Jeon, Jiyeon Kim, Yeonggul Jang, Yeonyee E. Yoon, Dawun Jeong, Youngtaek Hong, Seung-Ah Lee, Hyuk-Jae Chang

Abstract: Doppler echocardiography offers critical insights into cardiac function and phases by quantifying blood flow velocities and evaluating myocardial motion. However, previous methods for automating Doppler analysis, ranging from initial signal processing techniques to advanced deep learning approaches, have been constrained by their reliance on electrocardiogram (ECG) data and their inability to process Doppler views collectively. We introduce a novel unified framework using a convolutional neural network for comprehensive analysis of spectral and tissue Doppler echocardiography images that combines automatic measurements and end-diastole (ED) detection into a singular method. The network automatically recognizes key features across various Doppler views, with novel Doppler shape embedding and anti-aliasing modules enhancing interpretation and ensuring consistent analysis. Empirical results indicate a consistent outperformance in performance metrics, including dice similarity coefficients (DSC) and intersection over union (IoU). The proposed framework demonstrates strong agreement with clinicians in Doppler automatic measurements and competitive performance in ED detection.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.00518 [pdf]

See SIFT in a Rain

Authors: Wei Wu, Hao Chang, Zhu Li

Abstract: Rain streaks bring complicated pixel intensity changes and additional gradients, greatly obstructing the extraction of image features from background. This causes serious performance degradation in feature-based applications. Thus, it is critical to remove rain streaks from a single rainy image to recover image features. Recently, many excellent image deraining methods have made remarkable progress. However, these human visual system-driven approaches mainly focus on improving image quality with pixel recovery as loss function, and neglect how to enhance image feature recovery ability. To address this issue, we propose a task-driven image deraining algorithm to strengthen image feature supply for subsequent feature-based applications. Due to the extensive use and strong practicability of Scale-Invariant Feature Transform (SIFT), we first propose two separate networks using distinct losses and modules to achieve two goals, respectively. One is difference of Gaussian (DoG) pyramid recovery network (DPRNet) for SIFT detection, and the other gradients of Gaussian images recovery network (GGIRNet) for SIFT description. Second, in the DPRNet we propose an alternative interest point loss that directly penalizes scale response extrema to recover the DoG pyramid. Third, we advance a gradient attention module in the GGIRNet to recover those gradients of Gaussian images. Finally, with the recovered DoG pyramid and gradients, we can regain SIFT key points. This divide-and-conquer scheme to set different objectives for SIFT detection and description leads to good robustness. Compared with state-of-the-art methods, experimental results demonstrate that our proposed algorithm achieves better performance in both the number of recovered SIFT key points and their accuracy.

Submitted 1 November, 2023; originally announced November 2023.

Comments: A direct DoG feature pyramid recovery from rainy pixels solution for SIFT detection, accepted by T-CSVT, 2023

Journal ref: IEEE Trans. on Circuits & System for Video Tech., 2023

arXiv:2310.14893 [pdf, other]

Data Drift Monitoring for Log Anomaly Detection Pipelines

Authors: Dipak Wani, Samuel Ackerman, Eitan Farchi, Xiaotong Liu, Hau-wen Chang, Sarasi Lalithsena

Abstract: Logs enable the monitoring of infrastructure status and the performance of associated applications. Logs are also invaluable for diagnosing the root causes of any problems that may arise. Log Anomaly Detection (LAD) pipelines automate the detection of anomalies in logs, providing assistance to site reliability engineers (SREs) in system diagnosis. Log patterns change over time, necessitating updates to the LAD model defining the `normal' log activity profile. In this paper, we introduce a Bayes Factor-based drift detection method that identifies when intervention, retraining, and updating of the LAD model are required with human involvement. We illustrate our method using sequences of log activity, both from unaltered data, and simulated activity with controlled levels of anomaly contamination, based on real collected log data.

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.12837 [pdf]

Deep Beamforming for Speech Enhancement and Speaker Localization with an Array Response-Aware Loss Function

Authors: Hsinyu Chang, Yicheng Hsu, Mingsian R. Bai

Abstract: Recent research advances in deep neural network (DNN)-based beamformers have shown great promise for speech enhancement under adverse acoustic conditions. Different network architectures and input features have been explored in estimating beamforming weights. In this paper, we propose a deep beamformer based on an efficient convolutional recurrent network (CRN) trained with a novel ARray RespOnse-aWare (ARROW) loss function. The ARROW loss exploits the array responses of the target and interferer by using the ground truth relative transfer functions (RTFs). The DNN-based beamforming system, trained with ARROW loss through supervised learning, is able to perform speech enhancement and speaker localization jointly. Experimental results have shown that the proposed deep beamformer, trained with the linearly weighted scale-invariant source-to-noise ratio (SI-SNR) and ARROW loss functions, achieves superior performance in speech enhancement and speaker localization compared to two baselines.

Submitted 22 October, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: 6 pages

arXiv:2310.08897 [pdf, other]

Self supervised convolutional kernel based handcrafted feature harmonization: Enhanced left ventricle hypertension disease phenotyping on echocardiography

Authors: **a Lee, Youngtaek Hong, Dawun Jeong, Yeonggul Jang, Jaeik Jeon, Sihyeon Jeong, Taekgeun Jung, Yeonyee E. Yoon, Inki Moon, Seung-Ah Lee, Hyuk-Jae Chang

Abstract: Radiomics, a medical imaging technique, extracts quantitative handcrafted features from images to predict diseases. Harmonization in those features ensures consistent feature extraction across various imaging devices and protocols. Methods for harmonization include standardized imaging protocols, statistical adjustments, and evaluating feature robustness. Myocardial diseases such as Left Ventricular Hypertrophy (LVH) and Hypertensive Heart Disease (HHD) are diagnosed via echocardiography, but variable imaging settings pose challenges. Harmonization techniques are crucial for applying handcrafted features in disease diagnosis in such scenarios. Self-supervised learning (SSL) enhances data understanding within limited datasets and adapts to diverse data settings. ConvNeXt-V2 integrates convolutional layers into SSL, displaying superior performance in various tasks. This study focuses on convolutional filters within SSL, using them as preprocessing to convert images into feature maps for handcrafted feature harmonization. Our proposed method excelled in harmonization evaluation and exhibited superior LVH classification performance compared to existing methods.

Submitted 22 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 11 pages, 7 figures

arXiv:2309.07707 [pdf, other]

CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders

Authors: Heng-Jui Chang, Ning Dong, Ruslan Mavlyutov, Sravya Popuri, Yu-An Chung

Abstract: Large-scale self-supervised pre-trained speech encoders outperform conventional approaches in speech recognition and translation tasks. Due to the high cost of developing these large models, building new encoders for new tasks and deploying them to on-device applications are infeasible. Prior studies propose model compression methods to address this issue, but those works focus on smaller models and less realistic tasks. Thus, we propose Contrastive Layer-to-layer Distillation (CoLLD), a novel knowledge distillation method to compress pre-trained speech encoders by leveraging masked prediction and contrastive learning to train student models to copy the behavior of a large teacher model. CoLLD outperforms prior methods and closes the gap between small and large models on multilingual speech-to-text translation and recognition benchmarks.

Submitted 27 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: Accepted to ICASSP 2024

arXiv:2308.16483 [pdf, other]

Improving Out-of-Distribution Detection in Echocardiographic View Classication through Enhancing Semantic Features

Authors: Jaeik Jeon, Seongmin Ha, Yeonggul Jang, Yeonyee E. Yoon, Jiyeon Kim, Hyunseok Jeong, Dawun Jeong, Youngtaek Hong, Seung-Ah Lee, Hyuk-Jae Chang

Abstract: In echocardiographic view classification, accurately detecting out-of-distribution (OOD) data is essential but challenging, especially given the subtle differences between in-distribution and OOD data. While conventional OOD detection methods, such as Mahalanobis distance (MD), are effective in far-OOD scenarios with clear distinctions between distributions, they struggle to discern the less obvious variations characteristic of echocardiographic data. In this study, we introduce a novel use of label smoothing to enhance semantic feature representation in echocardiographic images, demonstrating that these enriched semantic features are key for significantly improving near-OOD instance detection. By combining label smoothing with MD-based OOD detection, we establish a new benchmark for accuracy in echocardiographic OOD detection.

Submitted 23 November, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.04156 [pdf, other]

Towards Top-Down Stereo Image Quality Assessment via Stereo Attention

Authors: Huilin Zhang, Sumei Li, Haoxiang Chang, Peiming Lin

Abstract: Stereo image quality assessment (SIQA) plays a crucial role in evaluating and improving the visual experience of 3D content. Existing visual properties-based methods for SIQA have achieved promising performance. However, these approaches ignore the top-down philosophy, leading to a lack of a comprehensive grasp of the human visual system (HVS) and SIQA. This paper presents a novel Stereo AttenTion Network (SATNet), which employs a top-down perspective to guide the quality assessment process. Specifically, our generalized Stereo AttenTion (SAT) structure adapts components and input/output for stereo scenarios. It leverages the fusion-generated attention map as a higher-level binocular modulator to influence two lower-level monocular features, allowing progressive recalibration of both throughout the pipeline. Additionally, we introduce an Energy Coefficient (EC) to flexibly tune the magnitude of binocular response, accounting for the fact that binocular responses in the primate primary visual cortex are less than the sum of monocular responses. To extract the most discriminative quality information from the summation and subtraction of the two branches of monocular features, we utilize a dual-pooling strategy that applies min-pooling and max-pooling operations to the respective branches. Experimental results highlight the superiority of our top-down method in advancing the state-of-the-art in the SIQA field. The code is available at https://github.com/Fanning-Zhang/SATNet.

Submitted 14 November, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: 12 pages, 5 figures

arXiv:2307.12126 [pdf, other]

doi 10.1109/TWC.2024.3376332

Optimal preprocessing of WiFi CSI for sensing applications

Authors: Vishnu V. Ratnam, Hao Chen, Hao Hsuan Chang, Abhishek Sehgal, Jianzhong Zhang

Abstract: Due to its ubiquitous and contact-free nature, the use of WiFi infrastructure for performing sensing tasks has tremendous potential. However, the channel state information (CSI) measured by a WiFi receiver suffers from errors in both its gain and phase, which can significantly hinder sensing tasks. By analyzing these errors from different WiFi receivers, a mathematical model for these gain and phase errors is developed in this work. Based on these models, several theoretically justified preprocessing algorithms for correcting such errors at a receiver and, thus, obtaining clean CSI are presented. Simulation results show that at typical system parameters, the developed algorithms for cleaning CSI can reduce noise by $40$% and $200$%, respectively, compared to baseline methods for gain correction and phase correction, without significantly impacting computational cost. The superiority of the proposed methods is also validated in a real-world test bed for respiration rate monitoring (an example sensing task), where they improve the estimation signal-to-noise ratio by $20$% compared to baseline methods.

Submitted 21 May, 2024; v1 submitted 22 July, 2023; originally announced July 2023.

Comments: Paper is accepted to IEEE Transactions on Wireless Communications

Journal ref: IEEE Transactions on Wireless Communications (2024)

arXiv:2305.17896 [pdf, other]

Continuous and Noninvasive Measurement of Arterial Pulse Pressure and Pressure Waveform using an Image-free Ultrasound System

Authors: Lirui Xu, Pang Wu, Pan Xia, Fanglin Geng, Peng Wang, Xianxiang Chen, Zhenfeng Li, Lidong Du, Shu** Liu, Li Li, Hongbo Chang, Zhen Fang

Abstract: The beat-to-beat local pulse pressure (PP) and blood pressure waveform of arteries, especially central arteries, are important indicators of the course of cardiovascular diseases (CVDs). Nevertheless, noninvasive measurement of them remains a challenge in the clinic. This work presents a three-element image-free ultrasound system with a low-computational method for real-time measurement of local pulse wave velocity (PWV) and diameter waveforms, enabling real-time, noninvasive, and continuous measurement of PP and blood pressure waveforms without calibration. The developed system has been well-validated in vitro and in vivo. In in vitro cardiovascular phantom experiments, the results demonstrated high accuracy in the measurement of PP (error < 3 mmHg) and blood pressure waveform (root-mean-square error (RMSE) < 2 mmHg, correlation coefficient (r) > 0.99). In subsequent human carotid experiments, the system was compared with an arterial tonometer, which showed excellent PP accuracy (mean absolute error (MAE) = 3.7 ± 3.4 mmHg) and pressure waveform similarity (RMSE = 3.7 ± 1.6 mmHg, r = 0.98 ± 0.01). Furthermore, comparative experiments with the volume clamp device demonstrated the system's ability to accurately trace blood pressure changes (induced by deep breathing) over a period of one minute, with the MAE of DBP, MAP, and SBP within 5 ± 8 mmHg. The present results demonstrate the accuracy and reliability of the developed system for continuous and noninvasive measurement of arterial PP and blood pressure waveforms, with potential applications in the diagnosis and prevention of CVDs.

Submitted 29 May, 2023; originally announced May 2023.

Comments: 13 pages, 12 figures

arXiv:2305.11237 [pdf, other]

DRL meets DSA Networks: Convergence Analysis and Its Application to System Design

Authors: Ramin Safavinejad, Hao-Hsuan Chang, Lingjia Liu

Abstract: In dynamic spectrum access (DSA) networks, secondary users (SUs) need to opportunistically access primary users' (PUs) radio spectrum without causing significant interference. Since the interaction between the SU and the PU systems is limited, deep reinforcement learning (DRL) has been introduced to help SUs conduct spectrum access. Specifically, deep recurrent Q network (DRQN) has been utilized in DSA networks for SUs to aggregate the information from recent experiences to make spectrum access decisions. DRQN is notorious for its poor sample efficiency, in the sense that it needs a rather large number of training data samples to tune its parameters, which is a computationally demanding task. In our recent work, deep echo state network (DEQN) has been introduced to DSA networks to address the sample efficiency issue of DRQN. In this paper, we analytically show that DEQN requires fewer training samples than DRQN to converge to the best policy. Furthermore, we introduce a method to determine the right hyperparameters for the DEQN, providing system design guidance for DEQN-based DSA networks. Extensive performance evaluation confirms that the DEQN-based DSA strategy is the superior choice with regard to computational power while outperforming DRQN-based DSA strategies.

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.11072 [pdf, other]

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

Authors: Heng-Jui Chang, Alexander H. Liu, James Glass

Abstract: Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging. We propose speaker-invariant clustering (Spin), a novel self-supervised learning method that clusters speech representations and performs swapped prediction between the original and speaker-perturbed utterances. Spin disentangles speaker information and preserves content representations with just 45 minutes of fine-tuning on a single GPU. Spin improves pre-trained networks and outperforms prior methods in speech recognition and acoustic unit discovery.

Submitted 18 May, 2023; originally announced May 2023.

Comments: Accepted to Interspeech 2023

arXiv:2305.04414 [pdf, ps, other]

Untrained Neural Network based Bayesian Detector for OTFS Modulation Systems

Authors: Hao Chang, Alva Kosasih, Wibowo Hardjawana, Xinwei Qu, Branka Vucetic

Abstract: The orthogonal time frequency space (OTFS) symbol detector design for high mobility communication scenarios has received considerable attention lately. Current state-of-the-art OTFS detectors can mainly be divided into two categories: iterative and training-based deep neural network (DNN) detectors. Many practical iterative detectors rely on a minimum-mean-square-error (MMSE) denoiser to get the initial symbol estimates. However, their computational complexity increases exponentially with the number of detected symbols. Training-based DNN detectors typically suffer from dependency on the availability of large computation resources and the fidelity of synthetic datasets for the training phase, which are both costly. In this paper, we propose an untrained DNN based on the deep image prior (DIP) and decoder architecture, referred to as D-DIP, that replaces the MMSE denoiser in the iterative detector. DIP is a type of DNN that requires no training, which makes it beneficial in OTFS detector design. Then we propose to combine the D-DIP denoiser with the Bayesian parallel interference cancellation (BPIC) detector to perform iterative symbol detection, referred to as D-DIP-BPIC. Our simulation results show that the symbol error rate (SER) performance of the proposed D-DIP-BPIC detector outperforms practical state-of-the-art detectors by 0.5 dB and retains low computational complexity.

Submitted 7 May, 2023; originally announced May 2023.

arXiv:2303.09828 [pdf, other]

Model Reference Gaussian Process Regression: Data-Driven State Feedback Controller

Authors: Hyuntae Kim, Hamin Chang, Hyungbo Shim

Abstract: This paper proposes a data-driven state feedback controller that enables reference tracking for nonlinear discrete-time systems. The controller is designed based on the identified inverse model of the system and a given reference model, assuming that the identification of the inverse model is carried out using only the system's state/input measurements. When its results are provided, we present conditions that guarantee a certain level of reference tracking performance, regardless of the identification method employed for the inverse model. Specifically, when Gaussian process regression (GPR) is used as the identification method, we propose sufficient conditions for the required data by applying some lemmas related to identification errors to the aforementioned conditions, ensuring that the Model reference-GPR (MR-GPR) controller can guarantee a certain level of reference tracking performance. Finally, an example is provided to demonstrate the effectiveness of the MR-GPR controller.

Submitted 17 March, 2023; originally announced March 2023.

Comments: 6 pages, 3 figures, Submitted to LCSS/CDC 2023

arXiv:2302.05811 [pdf, other]

Hierarchical control and learning of a foraging CyberOctopus

Authors: Chia-Hsien Shih, Noel Naughton, Udit Halder, Heng-Sheng Chang, Seung Hyun Kim, Rhanor Gillette, Prashant G. Mehta, Mattia Gazzola

Abstract: Inspired by the unique neurophysiology of the octopus, we propose a hierarchical framework that simplifies the coordination of multiple soft arms by decomposing control into high-level decision making, low-level motor activation, and local reflexive behaviors via sensory feedback. When evaluated in the illustrative problem of a model octopus foraging for food, this hierarchical decomposition results in significant improvements relative to end-to-end methods. Performance is achieved through a mixed-modes approach, whereby qualitatively different tasks are addressed via complementary control schemes. Here, model-free reinforcement learning is employed for high-level decision-making, while model-based energy shaping takes care of arm-level motor execution. To render the pairing computationally tenable, a novel neural-network energy shaping (NN-ES) controller is developed, achieving accurate motions with time-to-solutions 200 times faster than previous attempts. Our hierarchical framework is then successfully deployed in increasingly challenging foraging scenarios, including an arena littered with obstacles in 3D space, demonstrating the viability of our approach.

Submitted 11 February, 2023; originally announced February 2023.

Comments: 16 pages, 7 figures

arXiv:2301.05351 [pdf, other]

Data-driven Moving Horizon Estimation for Angular Velocity of Space Noncooperative Target in Eddy Current De-tumbling Mission

Authors: Xiyao Liu, Haitao Chang, Zhenyu Lu, Panfeng Huang

Abstract: Angular velocity estimation is critical for eddy current de-tumbling of noncooperative space targets. However, the unknown model of the noncooperative target and the scarcity of observation data make model-based estimation methods difficult to apply. In this paper, a Data-driven Moving Horizon Estimation method is proposed to estimate the angular velocity of the noncooperative target with de-tumbling torque. In this method, model-free state estimation of the angular velocity can be achieved using only one historical trajectory that satisfies the rank condition. With local linear approximation, the Willems fundamental lemma is extended to nonlinear autonomous systems, and the rank condition for the historical trajectory data is deduced. Then, a data-driven moving horizon estimation algorithm based on the M step Lyapunov function is designed, and the time-discount robust stability of the algorithm is given. In order to illustrate the effectiveness of the proposed algorithm, experiments and simulations are performed to estimate the angular velocity in eddy current de-tumbling with only de-tumbling torque measurement.

Submitted 12 January, 2023; originally announced January 2023.

arXiv:2211.06619 [pdf, other]

Fast Iterative Algorithms for Blind Phase Retrieval: A survey

Authors: Huibin Chang, Li Yang, Stefano Marchesini

Abstract: In nanoscale imaging techniques and ultrafast lasers, the reconstruction procedure is normally formulated as a blind phase retrieval (BPR) problem, where one has to recover both the sample and the probe (pupil) jointly from phaseless data. This survey first presents the mathematical formula of BPR and related nonlinear optimization problems, and then gives a brief review of the recent iterative algorithms. It mainly covers three types of algorithms: operator-splitting based first-order optimization methods, second-order algorithms with Hessian, and subspace methods. The future research directions for experimental issues and theoretical analysis are further discussed.

Submitted 12 November, 2022; originally announced November 2022.

arXiv:2211.01180 [pdf, other]

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

Authors: Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-yi Lee, David Harwath

Abstract: This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval. For non-English image-speech retrieval, we outperform the current state-of-the-art performance by a wide margin both when training separate models for each language, and with a single model which processes speech in all three languages. We identify key differences in model behavior and performance between English and non-English settings, attributable to the English-only pre-training of CLIP and HuBERT, and investigate how fine-tuning the pre-trained models impacts these differences. Finally, we show that our models can be used for mono- and cross-lingual speech-text retrieval and cross-lingual speech-speech retrieval, despite never having seen any parallel speech-text or speech-speech data during training.

Submitted 10 April, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted to ICASSP 2023

arXiv:2210.02494 [pdf, other]

Model Reference Gaussian Process Regression: Data-Driven Output Feedback Controller

Authors: Hyuntae Kim, Hamin Chang, Hyungbo Shim

Abstract: Data-driven controls using Gaussian process regression have recently gained much attention. In such approaches, system identification by Gaussian process regression is mostly followed by model-based controller designs. However, the outcomes of Gaussian process regression are often too complicated for conventional control designs to be applied, so numerical designs such as model predictive control are employed in many cases. To overcome this restriction, our idea is to perform Gaussian process regression on the inverse of the plant with the same input/output data used for the conventional regression. With the inverse, one can design a model reference controller without resorting to numerical control methods. This paper considers single-input single-output (SISO) discrete-time nonlinear systems of minimum phase with relative degree one. It is highlighted that the model reference Gaussian process regression controller is designed directly from pre-collected input/output data without system identification.

Submitted 5 October, 2022; originally announced October 2022.

Comments: 6 pages, 5 figures, submitted to American Control Conference 2023

arXiv:2210.00705 [pdf, other]

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

Authors: Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-yi Lee, David Harwath

Abstract: Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly. Therefore, we propose SpeechCLIP, a novel framework bridging speech and text through images to enhance speech models without transcriptions. We leverage state-of-the-art pre-trained HuBERT and CLIP, aligning them via paired images and spoken captions with minimal fine-tuning. SpeechCLIP outperforms prior state-of-the-art on image-speech retrieval and performs zero-shot speech-text retrieval without direct supervision from transcriptions. Moreover, SpeechCLIP can directly retrieve semantically related keywords from speech.

Submitted 25 October, 2022; v1 submitted 3 October, 2022; originally announced October 2022.

Comments: Accepted to IEEE SLT 2022

arXiv:2209.08630 [pdf, other]

RVSL: Robust Vehicle Similarity Learning in Real Hazy Scenes Based on Semi-supervised Learning

Authors: Wei-Ting Chen, I-Hsiang Chen, Chih-Yuan Yeh, Hao-Hsiang Yang, Hua-En Chang, Jian-Jiun Ding, Sy-Yen Kuo

Abstract: Recently, vehicle similarity learning, also called re-identification (ReID), has attracted significant attention in computer vision. Several algorithms have been developed and obtained considerable success. However, most existing methods perform poorly in hazy scenarios due to poor visibility. Though some strategies can address this problem, they still have room for improvement due to their limited performance in real-world scenarios and the lack of real-world clear ground truth. Thus, to resolve this problem, inspired by CycleGAN, we construct a training paradigm called RVSL which integrates ReID and domain transformation techniques. The network is trained in a semi-supervised fashion and does not require ID labels or the corresponding clear ground truths to learn the hazy vehicle ReID task in real-world haze scenes. To further constrain the unsupervised learning process effectively, several losses are developed. Experimental results on synthetic and real-world datasets indicate that the proposed method can achieve state-of-the-art performance on hazy vehicle ReID problems. It is worth mentioning that although the proposed method is trained without real-world label information, it can achieve competitive performance compared to existing supervised methods trained on complete label information.

Submitted 18 September, 2022; originally announced September 2022.

Comments: Accepted by ECCV 2022

arXiv:2209.04089 [pdf, other]

doi 10.1098/rspa.2022.0593

Energy Shaping Control of a Muscular Octopus Arm Moving in Three Dimensions

Authors: Heng-Sheng Chang, Udit Halder, Chia-Hsien Shih, Noel Naughton, Mattia Gazzola, Prashant G. Mehta

Abstract: Flexible octopus arms exhibit an exceptional ability to coordinate large numbers of degrees of freedom and perform complex manipulation tasks. As a consequence, these systems continue to attract the attention of biologists and roboticists alike. In this paper, we develop a three-dimensional model of a soft octopus arm, equipped with biomechanically realistic muscle actuation. Internal forces and couples exerted by all major muscle groups are considered. An energy shaping control method is described to coordinate muscle activity so as to grasp and reach in 3D space. Key contributions of this paper are: (i) modeling of major muscle groups to elicit three-dimensional movements; (ii) a mathematical formulation for muscle activations based on a stored energy function; and (iii) a computationally efficient procedure to design task-specific equilibrium configurations, obtained by solving an optimization problem in the Special Euclidean group SE(3). Muscle controls are then iteratively computed based on the co-state variable arising from the solution of the optimization problem. The approach is numerically demonstrated in the physically accurate software environment Elastica. Results of numerical experiments mimicking observed octopus behaviors are reported.

Submitted 8 September, 2022; originally announced September 2022.

arXiv:2204.10836 [pdf, other]

doi 10.1038/s41467-022-33407-5

Federated Learning Enables Big Data for Rare Cancer Boundary Detection

Authors: Sarthak Pati, Ujjwal Baid, Brandon Edwards, Micah Sheller, Shih-Han Wang, G Anthony Reina, Patrick Foley, Alexey Gruzdev, Deepthi Karkada, Christos Davatzikos, Chiharu Sako, Satyam Ghodasara, Michel Bilello, Suyash Mohan, Philipp Vollmuth, Gianluca Brugnara, Chandrakanth J Preetha, Felix Sahm, Klaus Maier-Hein, Maximilian Zenk, Martin Bendszus, Wolfgang Wick, Evan Calabrese, Jeffrey Rudie, Javier Villanueva-Meyer, et al. (254 additional authors not shown)

Abstract: Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to-date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.

Submitted 25 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: federated learning, deep learning, convolutional neural network, segmentation, brain tumor, glioma, glioblastoma, FeTS, BraTS

arXiv:2204.08987 [pdf]

Deep learning based closed-loop optimization of geothermal reservoir production

Authors: Nanzhe Wang, Haibin Chang, Xiangzhao Kong, Martin O. Saar, Dongxiao Zhang

Abstract: To maximize the economic benefits of geothermal energy production, it is essential to optimize geothermal reservoir management strategies, in which geologic uncertainty should be considered. In this work, we propose a closed-loop optimization framework, based on deep learning surrogates, for the well control optimization of geothermal reservoirs. In this framework, we construct a hybrid convolution-recurrent neural network surrogate, which combines the convolution neural network (CNN) and long short-term memory (LSTM) recurrent network. The convolution structure can extract spatial information of geologic parameter fields and the recurrent structure can approximate sequence-to-sequence mapping. The trained model can predict time-varying production responses (rate, temperature, etc.) for cases with different permeability fields and well control sequences. In the closed-loop optimization framework, production optimization based on the differential evolution (DE) algorithm, and data assimilation based on the iterative ensemble smoother (IES), are performed alternately to achieve real-time well control optimization and geologic parameter estimation as the production proceeds. In addition, the averaged objective function over the ensemble of geologic parameter estimations is adopted to consider geologic uncertainty in the optimization process. Several geothermal reservoir development cases are designed to test the performance of the proposed production optimization framework. The results show that the proposed framework can achieve efficient and effective real-time optimization and data assimilation in the geothermal reservoir production process.

Submitted 15 April, 2022; originally announced April 2022.

Comments: 37 pages, 24 figures

arXiv:2203.06849 [pdf, other]

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

Authors: Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

Abstract: Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology is limiting towards a holistic understanding of the efficacy of such models. SUPERB was a step towards introducing a common benchmark to evaluate pre-trained models across various speech tasks. In this paper, we introduce SUPERB-SG, a new benchmark focused on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB. We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain and quality across different types of tasks. It entails freezing pre-trained model parameters, only using simple task-specific trainable heads. The goal is to be inclusive of all researchers, and encourage efficient use of computational resources. We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.

Submitted 14 March, 2022; originally announced March 2022.

Comments: ACL 2022 main conference

arXiv:2203.05043 [pdf, other]

In-Place Rotation for Enhancing Snake-like Robot Mobility

Authors: Alexander H. Chang, Patricio A. Vela

Abstract: Gaits engineered for snake-like robots to rotate in-place instrumentally fill a gap in the set of locomotive gaits that have traditionally prioritized translation. This paper designs a Turn-in-Place gait and demonstrates the ability of a shape-centric modeling framework to capture the gait's locomotive properties. Shape modeling for turning involves a time-varying continuous body curve described by a standing wave. Presumed viscous robot-ground frictional interactions lead to body dynamics conditioned on the time-varying shape model. The dynamic equations describing the Turn-in-Place gait are validated by an articulated snake-like robot using a physics-based simulator and a physical robot. The results affirm the shape-centric modeling framework's capacity to model a variety of snake-like robot gaits with fundamentally different body-ground contact patterns. As an applied demonstration, example locomotion scenarios partner the shape-centric Turn-in-Place gait with a Rectilinear gait for maneuvering through constrained environments based on a multi-modal locomotive planning strategy. Unified shape-centric modeling facilitates trajectory planning and tracking for a snake-like robot to successfully negotiate non-trivial obstacle configurations.

Submitted 9 March, 2022; originally announced March 2022.

Comments: 8 pages, 5 figures. Submitted to RA-L (IEEE Robotics and Automation Letters) with IROS 2022 Option

arXiv:2202.13627 [pdf, ps, other]

Changeable Rate and Novel Quantization for CSI Feedback Based on Deep Learning

Authors: Xin Liang, Haoran Chang, Haozhen Li, Xinyu Gu, Lin Zhang

Abstract: Deep learning (DL)-based channel state information (CSI) feedback improves the capacity and energy efficiency of massive multiple-input multiple-output (MIMO) systems in frequency division duplexing mode. However, multiple neural networks with different lengths of feedback overhead are required by time-varying bandwidth resources. The storage space required at the user equipment (UE) and the base station (BS) for these models increases linearly with the number of models. In this paper, we propose a DL-based changeable-rate framework with novel quantization scheme to improve the efficiency and feasibility of CSI feedback systems. This framework can reutilize all the network layers to achieve overhead-changeable CSI feedback to optimize the storage efficiency at the UE and the BS sides. Designed quantizer in this framework can avoid the normalization and gradient problems faced by traditional quantization schemes. Specifically, we propose two DL-based changeable-rate CSI feedback networks CH-CsiNetPro and CH-DualNetSph by introducing a feedback overhead control unit. Then, a pluggable quantization block (PQB) is developed to further improve the encoding efficiency of CSI feedback in an end-to-end way. Compared with existing CSI feedback methods, the proposed framework saves the storage space by about 50% with changeable-rate scheme and improves the encoding efficiency with the quantization module.

Submitted 28 February, 2022; originally announced February 2022.

arXiv:2202.01946 [pdf, ps, other]

Unsupervised Learning Based Hybrid Beamforming with Low-Resolution Phase Shifters for MU-MIMO Systems

Authors: Chia-Ho Kuo, Hsin-Yuan Chang, Ronald Y. Chang, Wei-Ho Chung

Abstract: Millimeter wave (mmWave) is a key technology for fifth-generation (5G) and beyond communications. Hybrid beamforming has been proposed for large-scale antenna systems in mmWave communications. Existing hybrid beamforming designs based on infinite-resolution phase shifters (PSs) are impractical due to hardware cost and power consumption. In this paper, we propose an unsupervised-learning-based scheme to jointly design the analog precoder and combiner with low-resolution PSs for multiuser multiple-input multiple-output (MU-MIMO) systems. We transform the analog precoder and combiner design problem into a phase classification problem and propose a generic neural network architecture, termed the phase classification network (PCNet), capable of producing solutions of various PS resolutions. Simulation results demonstrate the superior sum-rate and complexity performance of the proposed scheme, as compared to state-of-the-art hybrid beamforming designs for the most commonly used low-resolution PS configurations.

Submitted 3 February, 2022; originally announced February 2022.

Comments: IEEE International Conference on Communications (ICC) 2022

arXiv:2110.09924 [pdf, ps, other]

Speech Enhancement Based on Cyclegan with Noise-informed Training

Authors: Wen-Yuan Ting, Syu-Siang Wang, Hsin-Li Chang, Borching Su, Yu Tsao

Abstract: Cycle-consistent generative adversarial networks (CycleGAN) were successfully applied to speech enhancement (SE) tasks with unpaired noisy-clean training data. The CycleGAN SE system adopted two generators and two discriminators trained with losses from noisy-to-clean and clean-to-noisy conversions. CycleGAN showed promising results for numerous SE tasks. Herein, we investigate a potential limitation of the clean-to-noisy conversion part and propose a novel noise-informed training (NIT) approach to improve the performance of the original CycleGAN SE system. The main idea of the NIT approach is to incorporate target domain information for clean-to-noisy conversion to facilitate a better training procedure. The experimental results confirmed that the proposed NIT approach improved the generalization capability of the original CycleGAN SE system with a notable margin.

Submitted 6 December, 2022; v1 submitted 19 October, 2021; originally announced October 2021.

Journal ref: ISCSLP 2022

arXiv:2110.06142 [pdf, ps, other]

CSI Sensing and Feedback: A Semi-Supervised Learning Approach

Authors: Haozhen Li, Boyuan Zhang, Xin Liang, Haoran Chang, Xinyu Gu, Lin Zhang

Abstract: Deep learning-based (DL-based) channel state information (CSI) feedback for a Massive multiple-input multiple-output (MIMO) system has proved to be a creative and efficient application. However, the existing systems ignored the wireless channel environment variation sensing, e.g., indoor and outdoor scenarios. Moreover, systems training requires excess pre-labeled CSI data, which is often unavailable. In this letter, to address these issues, we first exploit the rationality of introducing semi-supervised learning on CSI feedback, then one semi-supervised CSI sensing and feedback Network ($S^2$CsiNet) with three classifiers comparisons is proposed. Experiment shows that $S^2$CsiNet primarily improves the feasibility of the DL-based CSI feedback system by indoor and outdoor environment sensing and at most 96.2% labeled dataset decreasing and secondarily boost the system performance by data distillation and latent information mining.

Submitted 26 September, 2021; originally announced October 2021.

arXiv:2110.03504 [pdf, other]

Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models

Authors: Liang-Hsuan Tseng, Yu-Kuan Fu, Heng-Jui Chang, Hung-yi Lee

Abstract: Code-switching (CS) is common in daily conversations where more than one language is used within a sentence. The difficulties of CS speech recognition lie in alternating languages and the lack of transcribed data. Therefore, this paper uses the recently successful self-supervised learning (SSL) methods to leverage many unlabeled speech data without CS. We show that hidden representations of SSL models offer frame-level language identity even if the models are trained with English speech only. Jointly training CTC and language identification modules with self-supervised speech representations improves CS speech recognition performance. Furthermore, using multilingual speech data for pre-training obtains the best CS speech recognition.

Submitted 7 October, 2021; originally announced October 2021.

Comments: Submitted to ICASSP 2022

arXiv:2110.01900 [pdf, other]

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

Authors: Heng-Jui Chang, Shu-wen Yang, Hung-yi Lee

Abstract: Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks. Despite the success of these methods, they require large memory and high pre-training costs, making them inaccessible for researchers in academia and small companies. Therefore, this paper introduces DistilHuBERT, a novel multi-task learning framework to distill hidden representations from a HuBERT model directly. This method reduces HuBERT's size by 75% and 73% faster while retaining most performance in ten different tasks. Moreover, DistilHuBERT required little training time and data, opening the possibilities of pre-training personal and on-device SSL models for speech.

Submitted 27 April, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: Accepted to ICASSP 2022

arXiv:2108.09499 [pdf]

MITI Minimum Information guidelines for highly multiplexed tissue images

Authors: Denis Schapiro, Clarence Yapp, Artem Sokolov, Sheila M. Reynolds, Yu-An Chen, Damir Sudar, Yubin Xie, Jeremy L. Muhlich, Raquel Arias-Camison, Sarah Arena, Adam J. Taylor, Milen Nikolov, Madison Tyler, Jia-Ren Lin, Erik A. Burlingame, Human Tumor Atlas Network, Young H. Chang, Samouil L Farhi, Vésteinn Thorsson, Nithya Venkatamohan, Julia L. Drewes, Dana Pe'er, David A. Gutman, Markus D. Herrmann, Nils Gehlenborg , et al. (14 additional authors not shown)

Abstract: The imminent release of tissue atlases combining multi-channel microscopy with single cell sequencing and other omics data from normal and diseased specimens creates an urgent need for data and metadata standards that guide data deposition, curation and release. We describe a Minimum Information about highly multiplexed Tissue Imaging (MITI) standard that applies best practices developed for genomics and other microscopy data to highly multiplexed tissue images and traditional histology.

Submitted 23 February, 2022; v1 submitted 21 August, 2021; originally announced August 2021.

arXiv:2107.04589 [pdf, other]

ViTGAN: Training GANs with Vision Transformers

Authors: Kwonjoon Lee, Huiwen Chang, Lu Jiang, Han Zhang, Zhuowen Tu, Ce Liu

Abstract: Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases. In this paper, we investigate if such performance can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). For ViT discriminators, we observe that existing regularization methods for GANs interact poorly with self-attention, causing serious instability during training. To resolve this issue, we introduce several novel regularization techniques for training GANs with ViTs. For ViT generators, we examine architectural choices for latent and pixel mapping layers to facilitate convergence. Empirically, our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets: CIFAR-10, CelebA, and LSUN bedroom.

Submitted 29 May, 2024; v1 submitted 9 July, 2021; originally announced July 2021.

Comments: Accepted to ICLR 2022 (Spotlight)

arXiv:2106.14976 [pdf, other]

Federated Dynamic Spectrum Access

Authors: Yifei Song, Hao-Hsuan Chang, Zhou Zhou, Shashank Jere, Lingjia Liu

Abstract: Due to the growing volume of data traffic produced by the surge of Internet of Things (IoT) devices, the demand for radio spectrum resources is approaching their limitation defined by Federal Communications Commission (FCC). To this end, Dynamic Spectrum Access (DSA) is considered as a promising technology to handle this spectrum scarcity. However, standard DSA techniques often rely on analytical modeling wireless networks, making its application intractable in under-measured network environments. Therefore, utilizing neural networks to approximate the network dynamics is an alternative approach. In this article, we introduce a Federated Learning (FL) based framework for the task of DSA, where FL is a distributive machine learning framework that can reserve the privacy of network terminals under heterogeneous data distributions. We discuss the opportunities, challenges, and opening problems of this framework. To evaluate its feasibility, we implement a Multi-Agent Reinforcement Learning (MARL)-based FL as a realization associated with its initial evaluation results.

Submitted 28 June, 2021; originally announced June 2021.

arXiv:2105.12227 [pdf, other]

Learning a Model-Driven Variational Network for Deformable Image Registration

Authors: Xi Jia, Alexander Thorley, Wei Chen, Huaqi Qiu, Linlin Shen, Iain B Styles, Hyung Jin Chang, Ales Leonardis, Antonio de Marvao, Declan P. O'Regan, Daniel Rueckert, Jinming Duan

Abstract: Data-driven deep learning approaches to image registration can be less accurate than conventional iterative approaches, especially when training data is limited. To address this whilst retaining the fast inference speed of deep learning, we propose VR-Net, a novel cascaded variational network for unsupervised deformable image registration. Using the variable splitting optimization scheme, we first convert the image registration problem, established in a generic variational framework, into two sub-problems, one with a point-wise, closed-form solution while the other one is a denoising problem. We then propose two neural layers (i.e. warping layer and intensity consistency layer) to model the analytical solution and a residual U-Net to formulate the denoising problem (i.e. generalized denoising layer). Finally, we cascade the warping layer, intensity consistency layer, and generalized denoising layer to form the VR-Net. Extensive experiments on three (two 2D and one 3D) cardiac magnetic resonance imaging datasets show that VR-Net outperforms state-of-the-art deep learning methods on registration accuracy, while maintains the fast inference speed of deep learning and the data-efficiency of variational model.

Submitted 25 May, 2021; originally announced May 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2104.13450 [pdf, other]

Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings

Authors: Innfarn Yoo, Huiwen Chang, Xiyang Luo, Ondrej Stava, Ce Liu, Peyman Milanfar, Feng Yang

Abstract: Digital watermarking is widely used for copyright protection. Traditional 3D watermarking approaches or commercial software are typically designed to embed messages into 3D meshes, and later retrieve the messages directly from distorted/undistorted watermarked 3D meshes. However, in many cases, users only have access to rendered 2D images instead of 3D meshes. Unfortunately, retrieving messages from 2D renderings of 3D meshes is still challenging and underexplored. We introduce a novel end-to-end learning framework to solve this problem through: 1) an encoder to covertly embed messages in both mesh geometry and textures; 2) a differentiable renderer to render watermarked 3D objects from different camera angles and under varied lighting conditions; 3) a decoder to recover the messages from 2D rendered images. From our experiments, we show that our model can learn to embed information visually imperceptible to humans, and to retrieve the embedded information from 2D renderings that undergo 3D distortions. In addition, we demonstrate that our method can also work with other renderers, such as ray tracers and real-time renderers with and without fine-tuning.

Submitted 29 March, 2022; v1 submitted 27 April, 2021; originally announced April 2021.

Comments: Accepted by CVPR 2022

arXiv:2104.01616 [pdf, other]

Towards Lifelong Learning of End-to-end ASR

Authors: Heng-Jui Chang, Hung-yi Lee, Lin-shan Lee

Abstract: Automatic speech recognition (ASR) technologies today are primarily optimized for given datasets; thus, any changes in the application environment (e.g., acoustic conditions or topic domains) may inevitably degrade the performance. We can collect new data describing the new environment and fine-tune the system, but this naturally leads to higher error rates for the earlier datasets, referred to as catastrophic forgetting. The concept of lifelong learning (LLL) aiming to enable a machine to sequentially learn new tasks from new datasets describing the changing real world without forgetting the previously learned knowledge is thus brought to attention. This paper reports, to our knowledge, the first effort to extensively consider and analyze the use of various approaches of LLL in end-to-end (E2E) ASR, including proposing novel methods in saving data for past domains to mitigate the catastrophic forgetting problem. An overall relative reduction of 28.7% in WER was achieved compared to the fine-tuning baseline when sequentially learning on three very different benchmark corpora. This can be the first step toward the highly desired ASR technologies capable of synchronizing with the continuously changing real world.

Submitted 2 July, 2021; v1 submitted 4 April, 2021; originally announced April 2021.

Comments: Interspeech 2021. We acknowledge the support of Salesforce Research Deep Learning Grant

arXiv:2010.03368 [pdf, other]

doi 10.1109/CDC45484.2021.9683318

Controlling a CyberOctopus Soft Arm with Muscle-like Actuation

Authors: Heng-Sheng Chang, Udit Halder, Ekaterina Gribkova, Arman Tekinalp, Noel Naughton, Mattia Gazzola, Prashant G. Mehta

Abstract: This paper presents an application of the energy shaping methodology to control a flexible, elastic Cosserat rod model of a single octopus arm. The novel contributions of this work are two-fold: (i) a control-oriented modeling of the anatomically realistic internal muscular architecture of an octopus arm; and (ii) the integration of these muscle models into the energy shaping control methodology. The control-oriented modeling takes inspiration in equal parts from theories of nonlinear elasticity and energy shaping control. By introducing a stored energy function for muscles, the difficulties associated with explicitly solving the matching conditions of the energy shaping methodology are avoided. The overall control design problem is posed as a bilevel optimization problem. Its solution is obtained through iterative algorithms. The methodology is numerically implemented and demonstrated in a full-scale dynamic simulation environment Elastica. Two bio-inspired numerical experiments involving the control of octopus arms are reported.

Submitted 1 April, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

arXiv:2010.01226 [pdf, other]

doi 10.23919/ACC50511.2021.9483284

Optimal Control of a Soft CyberOctopus Arm

Authors: Tixian Wang, Udit Halder, Heng-Sheng Chang, Mattia Gazzola, Prashant G. Mehta

Abstract: In this paper, we use the optimal control methodology to control a flexible, elastic Cosserat rod. An inspiration comes from stereotypical movement patterns in octopus arms, which are observed in a variety of manipulation tasks, such as reaching or fetching. To help uncover the mechanisms underlying these observed morphologies, we outline an optimal control-based framework. A single octopus arm is modeled as a Hamiltonian control system, where the continuum mechanics of the arm is modeled after the Cosserat rod theory, and internal, distributed muscle forces and couples are considered as controls. First order necessary optimality conditions are derived for an optimal control problem formulated for this infinite dimensional system. Solutions to this problem are obtained numerically by an iterative forward-backward algorithm. The state and adjoint equations are solved in a dynamic simulation environment, setting the stage for studying a broader class of optimal control problems. Trajectories that minimize control effort are demonstrated and qualitatively compared with observed behaviors.

Submitted 1 April, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

arXiv:2007.15580 [pdf]

doi 10.1029/2020JB020549

Deep-Learning based Inverse Modeling Approaches: A Subsurface Flow Example

Authors: Nanzhe Wang, Haibin Chang, Dongxiao Zhang

Abstract: Deep-learning has achieved good performance and shown great potential for solving forward and inverse problems. In this work, two categories of innovative deep-learning based inverse modeling methods are proposed and compared. The first category is deep-learning surrogate-based inversion methods, in which the Theory-guided Neural Network (TgNN) is constructed as a deep-learning surrogate for problems with uncertain model parameters. By incorporating physical laws and other constraints, the TgNN surrogate can be constructed with limited simulation runs and accelerate the inversion process significantly. Three TgNN surrogate-based inversion methods are proposed, including the gradient method, the iterative ensemble smoother (IES), and the training method. The second category is direct-deep-learning-inversion methods, in which TgNN constrained with geostatistical information, named TgNN-geo, is proposed for direct inverse modeling. In TgNN-geo, two neural networks are introduced to approximate the respective random model parameters and the solution. Since the prior geostatistical information can be incorporated, the direct-inversion method based on TgNN-geo works well, even in cases with sparse spatial measurements or imprecise prior statistics. Although the proposed deep-learning based inverse modeling methods are general in nature, and thus applicable to a wide variety of problems, they are tested with several subsurface flow problems. It is found that satisfactory results are obtained with a high efficiency. Moreover, both the advantages and disadvantages are further analyzed for the proposed two categories of deep-learning based inversion methods.

Submitted 28 July, 2020; originally announced July 2020.

Comments: 53 pages, 22 figures, 7 tables

Journal ref: Journal of Geophysical Research: Solid Earth, e2020JB020549, 2020

arXiv:2007.13973 [pdf, other]

Multi-Frequency Multi-Scenario Millimeter Wave MIMO Channel Measurements and Modeling for B5G Wireless Communication Systems

Authors: Jie Huang, Cheng-Xiang Wang, Hengtai Chang, Jian Sun, Xiqi Gao

Abstract: Millimeter wave (mmWave) bands have been utilized for the fifth generation (5G) communication systems and will no doubt continue to be deployed for beyond 5G (B5G). However, the underlying channels are not fully investigated at multifrequency bands and in multi-scenarios by using the same channel sounder, especially for the outdoor, multiple-input multiple-output (MIMO), and vehicle-to-vehicle (V2V) conditions. In this paper, we conduct multi-frequency multi-scenario mmWave MIMO channel measurements with 4*4 antennas at 28, 32, and 39 GHz bands for three cases, i.e., the human body and vehicle blockage measurements, outdoor path loss measurements, and V2V measurements. The channel characteristics, including blockage effect, path loss and coverage range, and non-stationarity and spatial consistency, are thoroughly studied. The blockage model, path loss model, and time-varying channel model are proposed for mmWave MIMO channels. The channel measurement and modeling results will be of great importance for further mmWave communication system deployments in indoor hotspot, outdoor, and vehicular network scenarios for B5G.

Submitted 27 July, 2020; originally announced July 2020.

Showing 1–50 of 71 results for author: Chang, H