-
Design of Interacting Particle Systems for Fast and Efficient Reinforcement Learning
Authors:
Anant A Joshi,
Heng-Sheng Chang,
Amirhossein Taghvaei,
Prashant G Mehta,
Sean P. Meyn
Abstract:
This paper is concerned with the design of algorithms based on systems of interacting particles to represent, approximate, and learn the optimal control law for reinforcement learning (RL). The primary contribution of the present paper is to show that convergence rates can be accelerated dramatically through careful design of interactions between particles. Theory focuses on the linear quadratic stochastic optimal control problem for which a complete and novel theory is presented. Apart from the new algorithm, sample complexity bounds are obtained, and it is shown that the mean square error scales as $1/N$ where $N$ is the number of particles. The theoretical results and algorithms are illustrated with numerical experiments and comparisons with other recent approaches, where the faster convergence of the proposed algorithm is numerically demonstrated.
Submitted 16 June, 2024;
originally announced June 2024.
-
Predicting the risk of early-stage breast cancer recurrence using H&E-stained tissue images
Authors:
Geongyu Lee,
Joonho Lee,
Tae-Yeong Kwak,
Sun Woo Kim,
Youngmee Kwon,
Chungyeul Kim,
Hyeyoon Chang
Abstract:
Accurate prediction of the likelihood of recurrence is important in the selection of postoperative treatment for patients with early-stage breast cancer. In this study, we investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology. A total of 125 hematoxylin and eosin stained breast cancer whole slide images labeled with the risk prediction via genomics assays were used, and we obtained sensitivity of 0.857, 0.746, and 0.529 for predicting low, intermediate, and high risk, and specificity of 0.816, 0.803, and 0.972. When compared to the expert pathologist's regional histology grade information, a Pearson's correlation coefficient of 0.61 was obtained. Inspecting the trained model with class activation maps, we found that it considered tubule formation and mitotic rate when predicting different risk groups.
Submitted 10 June, 2024;
originally announced June 2024.
-
Reach-Avoid Control Synthesis for a Quadrotor UAV with Formal Safety Guarantees
Authors:
Mohamed Serry,
Haocheng Chang,
Jun Liu
Abstract:
Reach-avoid specifications are one of the most common tasks in autonomous aerial vehicle (UAV) applications. Despite the intensive research and development associated with control of aerial vehicles, generating feasible trajectories through complex environments and tracking them with formal safety guarantees remain challenging. In this paper, we propose a control framework for a quadrotor UAV that enables accomplishing reach-avoid tasks with formal safety guarantees. In this proposed framework, we integrate geometric control theory for tracking and polynomial trajectory generation using Bezier curves, where tracking errors are accounted for in the trajectory synthesis process. To estimate the tracking errors, we revisit the stability analysis of the closed-loop quadrotor system, when geometric control is implemented. We show that the tracking error dynamics exhibit local exponential stability when geometric control is implemented with any positive control gains, and we derive tight uniform bounds of the tracking error. We also introduce sufficient conditions to be imposed on the desired trajectory utilizing the derived uniform bounds to ensure the well-definedness of the closed-loop system. For the trajectory synthesis, we present an efficient algorithm that enables constructing a safe tube by means of sampling-based planning and safe hyper-rectangular set computations. Then, we compute the trajectory, given as a piecewise continuous Bezier curve, through the safe tube, where an efficient heuristic approach that utilizes iterative linear programming is employed. We present extensive numerical simulations with a cluttered environment to illustrate the effectiveness of the proposed framework in reach-avoid planning scenarios.
Submitted 30 May, 2024;
originally announced May 2024.
-
A Large-Scale Evaluation of Speech Foundation Models
Authors:
Shu-wen Yang,
Heng-Jui Chang,
Zili Huang,
Andy T. Liu,
Cheng-I Lai,
Haibin Wu,
Jiatong Shi,
Xuankai Chang,
Hsiang-Sheng Tsai,
Wen-Chin Huang,
Tzu-hsun Feng,
Po-Han Chi,
Yist Y. Lin,
Yung-Sung Chuang,
Tzu-Hsien Huang,
Wei-Cheng Tseng,
Kushal Lakhotia,
Shang-Wen Li,
Abdelrahman Mohamed,
Shinji Watanabe,
Hung-yi Lee
Abstract:
The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads. Combining our results with community submissions, we verify that the foundation model paradigm is promising for speech, and our multi-tasking framework is simple yet effective, as the best-performing foundation model shows competitive generalizability across most SUPERB tasks. For reproducibility and extensibility, we have developed a long-term maintained platform that enables deterministic benchmarking, allows for result sharing via an online leaderboard, and promotes collaboration through a community-driven benchmark database to support new development cycles. Finally, we conduct a series of analyses to offer an in-depth understanding of SUPERB and speech foundation models, including information flows across tasks inside the models, the correctness of the weighted-sum benchmarking protocol and the statistical significance and robustness of the benchmark.
Submitted 29 May, 2024; v1 submitted 14 April, 2024;
originally announced April 2024.
-
Graph-based Untrained Neural Network Detector for OTFS Systems
Authors:
Hao Chang,
Branka Vucetic,
Wibowo Hardjawana
Abstract:
Inter-carrier interference (ICI) caused by mobile reflectors significantly degrades the conventional orthogonal frequency division multiplexing (OFDM) performance in high-mobility environments. The orthogonal time frequency space (OTFS) modulation system effectively represents ICI in the delay-Doppler domain, thus significantly outperforming OFDM. Existing iterative and neural network (NN) based OTFS detectors suffer from high-complexity matrix operations and performance degradation in untrained environments, where the real wireless channel does not match the one used in the training, which often happens in real wireless networks. In this paper, we propose to embed the prior knowledge of interference extracted from the estimated channel state information (CSI) as a directed graph into a decoder untrained neural network (DUNN), namely graph-based DUNN (GDUNN). We then combine it with Bayesian parallel interference cancellation (BPIC) for OTFS symbol detection, resulting in GDUNN-BPIC. Simulation results show that the proposed GDUNN-BPIC outperforms state-of-the-art OTFS detectors under imperfect CSI.
Submitted 8 April, 2024;
originally announced April 2024.
-
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data
Authors:
Hsuan-Fu Wang,
Yi-Jen Shih,
Heng-Jui Chang,
Layne Berry,
Puyuan Peng,
Hung-yi Lee,
Hsin-Min Wang,
David Harwath
Abstract:
The recently proposed visually grounded speech model SpeechCLIP is an innovative framework that bridges speech and text through images via CLIP without relying on text transcription. On this basis, this paper introduces two extensions to SpeechCLIP. First, we apply the Continuous Integrate-and-Fire (CIF) module to replace a fixed number of CLS tokens in the cascaded architecture. Second, we propose a new hybrid architecture that merges the cascaded and parallel architectures of SpeechCLIP into a multi-task learning framework. Our experimental evaluation is performed on the Flickr8k and SpokenCOCO datasets. The results show that in the speech keyword extraction task, the CIF-based cascaded SpeechCLIP model outperforms the previous cascaded SpeechCLIP model using a fixed number of CLS tokens. Furthermore, through our hybrid architecture, cascaded task learning boosts the performance of the parallel branch in image-speech retrieval tasks.
Submitted 10 February, 2024;
originally announced February 2024.
-
Covert Communications in STAR-RIS-Aided Rate-Splitting Multiple Access Systems
Authors:
Heng Chang,
Hai Yang,
Shuobo Xu,
Xiyu Pang,
Hongwu Liu
Abstract:
In this paper, we investigate covert communications in a simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)-aided rate-splitting multiple access (RSMA) system. Under the RSMA principles, the messages for the covert user (Bob) and public user (Grace) are converted to the common and private streams at the legitimate transmitter (Alice) to realize downlink transmissions, while the STAR-RIS is deployed not only to aid the public transmissions from Alice to Grace, but also to shield the covert transmissions from Alice to Bob against the warden (Willie). To characterize the covert performance of the considered STAR-RIS-aided RSMA (STAR-RIS-RSMA) system, we derive an analytical expression for the minimum average detection error probability of Willie, based on which a covert rate maximization problem is formulated. To maximize Bob's covert rate while confusing Willie's monitoring, the transmit power allocation, common rate allocation, and STAR-RIS reflection/transmission beamforming are jointly optimized subject to Grace's quality of service (QoS) requirements. The non-convex covert rate maximization problem, which involves highly coupled system parameters, is decoupled into three sub-problems of transmit power allocation, common rate allocation, and STAR-RIS reflection/transmission beamforming, respectively. To obtain the rank-one constrained optimal solution for the sub-problem of optimizing the STAR-RIS reflection/transmission beamforming, a penalty-based successive convex approximation scheme is developed. Moreover, an alternative optimization (AO) algorithm is designed to determine the optimal solution for the sub-problem of optimizing the transmit power allocation, while the original problem is overall solved by a new AO algorithm.
Submitted 2 December, 2023;
originally announced December 2023.
-
R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces
Authors:
Heng-Jui Chang,
James Glass
Abstract:
This paper introduces Robust Spin (R-Spin), a data-efficient domain-specific self-supervision method for speaker and noise-invariant speech representations by learning discrete acoustic units with speaker-invariant clustering (Spin). R-Spin resolves Spin's issues and enhances content representations by learning to predict acoustic pieces. R-Spin offers a 12X reduction in computational resources compared to previous state-of-the-art methods while outperforming them in severely distorted speech scenarios. This paper provides detailed analyses to show how discrete units contribute to speech encoder training and improving robustness in diverse acoustic environments.
Submitted 1 April, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
A Unified Approach for Comprehensive Analysis of Various Spectral and Tissue Doppler Echocardiography
Authors:
Jaeik Jeon,
Jiyeon Kim,
Yeonggul Jang,
Yeonyee E. Yoon,
Dawun Jeong,
Youngtaek Hong,
Seung-Ah Lee,
Hyuk-Jae Chang
Abstract:
Doppler echocardiography offers critical insights into cardiac function and phases by quantifying blood flow velocities and evaluating myocardial motion. However, previous methods for automating Doppler analysis, ranging from initial signal processing techniques to advanced deep learning approaches, have been constrained by their reliance on electrocardiogram (ECG) data and their inability to process Doppler views collectively. We introduce a novel unified framework using a convolutional neural network for comprehensive analysis of spectral and tissue Doppler echocardiography images that combines automatic measurements and end-diastole (ED) detection into a single method. The network automatically recognizes key features across various Doppler views, with novel Doppler shape embedding and anti-aliasing modules enhancing interpretation and ensuring consistent analysis. Empirical results indicate consistently better performance on metrics including the Dice similarity coefficient (DSC) and intersection over union (IoU). The proposed framework demonstrates strong agreement with clinicians in Doppler automatic measurements and competitive performance in ED detection.
Submitted 14 November, 2023;
originally announced November 2023.
-
See SIFT in a Rain
Authors:
Wei Wu,
Hao Chang,
Zhu Li
Abstract:
Rain streaks bring complicated pixel intensity changes and additional gradients, greatly obstructing the extraction of image features from background. This causes serious performance degradation in feature-based applications. Thus, it is critical to remove rain streaks from a single rainy image to recover image features. Recently, many excellent image deraining methods have made remarkable progress. However, these human visual system-driven approaches mainly focus on improving image quality with pixel recovery as loss function, and neglect how to enhance image feature recovery ability. To address this issue, we propose a task-driven image deraining algorithm to strengthen image feature supply for subsequent feature-based applications. Due to the extensive use and strong practicability of Scale-Invariant Feature Transform (SIFT), we first propose two separate networks using distinct losses and modules to achieve two goals, respectively. One is difference of Gaussian (DoG) pyramid recovery network (DPRNet) for SIFT detection, and the other gradients of Gaussian images recovery network (GGIRNet) for SIFT description. Second, in the DPRNet we propose an alternative interest point loss that directly penalizes scale response extrema to recover the DoG pyramid. Third, we advance a gradient attention module in the GGIRNet to recover those gradients of Gaussian images. Finally, with the recovered DoG pyramid and gradients, we can regain SIFT key points. This divide-and-conquer scheme to set different objectives for SIFT detection and description leads to good robustness. Compared with state-of-the-art methods, experimental results demonstrate that our proposed algorithm achieves better performance in both the number of recovered SIFT key points and their accuracy.
Submitted 1 November, 2023;
originally announced November 2023.
-
Data Drift Monitoring for Log Anomaly Detection Pipelines
Authors:
Dipak Wani,
Samuel Ackerman,
Eitan Farchi,
Xiaotong Liu,
Hau-wen Chang,
Sarasi Lalithsena
Abstract:
Logs enable the monitoring of infrastructure status and the performance of associated applications. Logs are also invaluable for diagnosing the root causes of any problems that may arise. Log Anomaly Detection (LAD) pipelines automate the detection of anomalies in logs, providing assistance to site reliability engineers (SREs) in system diagnosis. Log patterns change over time, necessitating updates to the LAD model defining the `normal' log activity profile. In this paper, we introduce a Bayes Factor-based drift detection method that identifies when intervention, retraining, and updating of the LAD model are required with human involvement. We illustrate our method using sequences of log activity, both from unaltered data, and simulated activity with controlled levels of anomaly contamination, based on real collected log data.
Submitted 17 October, 2023;
originally announced October 2023.
-
Deep Beamforming for Speech Enhancement and Speaker Localization with an Array Response-Aware Loss Function
Authors:
Hsinyu Chang,
Yicheng Hsu,
Mingsian R. Bai
Abstract:
Recent research advances in deep neural network (DNN)-based beamformers have shown great promise for speech enhancement under adverse acoustic conditions. Different network architectures and input features have been explored in estimating beamforming weights. In this paper, we propose a deep beamformer based on an efficient convolutional recurrent network (CRN) trained with a novel ARray RespOnse-aWare (ARROW) loss function. The ARROW loss exploits the array responses of the target and interferer by using the ground truth relative transfer functions (RTFs). The DNN-based beamforming system, trained with ARROW loss through supervised learning, is able to perform speech enhancement and speaker localization jointly. Experimental results have shown that the proposed deep beamformer, trained with the linearly weighted scale-invariant source-to-noise ratio (SI-SNR) and ARROW loss functions, achieves superior performance in speech enhancement and speaker localization compared to two baselines.
Submitted 22 October, 2023; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Self supervised convolutional kernel based handcrafted feature harmonization: Enhanced left ventricle hypertension disease phenotyping on echocardiography
Authors:
**a Lee,
Youngtaek Hong,
Dawun Jeong,
Yeonggul Jang,
Jaeik Jeon,
Sihyeon Jeong,
Taekgeun Jung,
Yeonyee E. Yoon,
Inki Moon,
Seung-Ah Lee,
Hyuk-Jae Chang
Abstract:
Radiomics, a medical imaging technique, extracts quantitative handcrafted features from images to predict diseases. Harmonization in those features ensures consistent feature extraction across various imaging devices and protocols. Methods for harmonization include standardized imaging protocols, statistical adjustments, and evaluating feature robustness. Myocardial diseases such as Left Ventricular Hypertrophy (LVH) and Hypertensive Heart Disease (HHD) are diagnosed via echocardiography, but variable imaging settings pose challenges. Harmonization techniques are crucial for applying handcrafted features in disease diagnosis in such scenarios. Self-supervised learning (SSL) enhances data understanding within limited datasets and adapts to diverse data settings. ConvNeXt-V2 integrates convolutional layers into SSL, displaying superior performance in various tasks. This study focuses on convolutional filters within SSL, using them as preprocessing to convert images into feature maps for handcrafted feature harmonization. Our proposed method excelled in harmonization evaluation and exhibited superior LVH classification performance compared to existing methods.
Submitted 22 November, 2023; v1 submitted 13 October, 2023;
originally announced October 2023.
-
CoLLD: Contrastive Layer-to-layer Distillation for Compressing Multilingual Pre-trained Speech Encoders
Authors:
Heng-Jui Chang,
Ning Dong,
Ruslan Mavlyutov,
Sravya Popuri,
Yu-An Chung
Abstract:
Large-scale self-supervised pre-trained speech encoders outperform conventional approaches in speech recognition and translation tasks. Due to the high cost of developing these large models, building new encoders for new tasks and deploying them to on-device applications are infeasible. Prior studies propose model compression methods to address this issue, but those works focus on smaller models and less realistic tasks. Thus, we propose Contrastive Layer-to-layer Distillation (CoLLD), a novel knowledge distillation method to compress pre-trained speech encoders by leveraging masked prediction and contrastive learning to train student models to copy the behavior of a large teacher model. CoLLD outperforms prior methods and closes the gap between small and large models on multilingual speech-to-text translation and recognition benchmarks.
Submitted 27 December, 2023; v1 submitted 14 September, 2023;
originally announced September 2023.
-
Improving Out-of-Distribution Detection in Echocardiographic View Classification through Enhancing Semantic Features
Authors:
Jaeik Jeon,
Seongmin Ha,
Yeonggul Jang,
Yeonyee E. Yoon,
Jiyeon Kim,
Hyunseok Jeong,
Dawun Jeong,
Youngtaek Hong,
Seung-Ah Lee,
Hyuk-Jae Chang
Abstract:
In echocardiographic view classification, accurately detecting out-of-distribution (OOD) data is essential but challenging, especially given the subtle differences between in-distribution and OOD data. While conventional OOD detection methods, such as Mahalanobis distance (MD), are effective in far-OOD scenarios with clear distinctions between distributions, they struggle to discern the less obvious variations characteristic of echocardiographic data. In this study, we introduce a novel use of label smoothing to enhance semantic feature representation in echocardiographic images, demonstrating that these enriched semantic features are key for significantly improving near-OOD instance detection. By combining label smoothing with MD-based OOD detection, we establish a new benchmark for accuracy in echocardiographic OOD detection.
Submitted 23 November, 2023; v1 submitted 31 August, 2023;
originally announced August 2023.
-
Towards Top-Down Stereo Image Quality Assessment via Stereo Attention
Authors:
Huilin Zhang,
Sumei Li,
Haoxiang Chang,
Peiming Lin
Abstract:
Stereo image quality assessment (SIQA) plays a crucial role in evaluating and improving the visual experience of 3D content. Existing visual properties-based methods for SIQA have achieved promising performance. However, these approaches ignore the top-down philosophy, leading to a lack of a comprehensive grasp of the human visual system (HVS) and SIQA. This paper presents a novel Stereo AttenTion Network (SATNet), which employs a top-down perspective to guide the quality assessment process. Specifically, our generalized Stereo AttenTion (SAT) structure adapts components and input/output for stereo scenarios. It leverages the fusion-generated attention map as a higher-level binocular modulator to influence two lower-level monocular features, allowing progressive recalibration of both throughout the pipeline. Additionally, we introduce an Energy Coefficient (EC) to flexibly tune the magnitude of binocular response, accounting for the fact that binocular responses in the primate primary visual cortex are less than the sum of monocular responses. To extract the most discriminative quality information from the summation and subtraction of the two branches of monocular features, we utilize a dual-pooling strategy that applies min-pooling and max-pooling operations to the respective branches. Experimental results highlight the superiority of our top-down method in advancing the state-of-the-art in the SIQA field. The code is available at https://github.com/Fanning-Zhang/SATNet.
Submitted 14 November, 2023; v1 submitted 8 August, 2023;
originally announced August 2023.
-
Optimal preprocessing of WiFi CSI for sensing applications
Authors:
Vishnu V. Ratnam,
Hao Chen,
Hao Hsuan Chang,
Abhishek Sehgal,
Jianzhong Zhang
Abstract:
Due to its ubiquitous and contact-free nature, the use of WiFi infrastructure for performing sensing tasks has tremendous potential. However, the channel state information (CSI) measured by a WiFi receiver suffers from errors in both its gain and phase, which can significantly hinder sensing tasks. By analyzing these errors from different WiFi receivers, a mathematical model for these gain and phase errors is developed in this work. Based on these models, several theoretically justified preprocessing algorithms for correcting such errors at a receiver and, thus, obtaining clean CSI are presented. Simulation results show that at typical system parameters, the developed algorithms for cleaning CSI can reduce noise by $40$% and $200$%, respectively, compared to baseline methods for gain correction and phase correction, without significantly impacting computational cost. The superiority of the proposed methods is also validated in a real-world test bed for respiration rate monitoring (an example sensing task), where they improve the estimation signal-to-noise ratio by $20$% compared to baseline methods.
Submitted 21 May, 2024; v1 submitted 22 July, 2023;
originally announced July 2023.
-
Continuous and Noninvasive Measurement of Arterial Pulse Pressure and Pressure Waveform using an Image-free Ultrasound System
Authors:
Lirui Xu,
Pang Wu,
Pan Xia,
Fanglin Geng,
Peng Wang,
Xianxiang Chen,
Zhenfeng Li,
Lidong Du,
Shu** Liu,
Li Li,
Hongbo Chang,
Zhen Fang
Abstract:
The local beat-to-beat pulse pressure (PP) and blood pressure waveform of arteries, especially central arteries, are important indicators of the course of cardiovascular diseases (CVDs). Nevertheless, their noninvasive measurement remains a challenge in the clinic. This work presents a three-element image-free ultrasound system with a low-computational method for real-time measurement of local pulse wave velocity (PWV) and diameter waveforms, enabling real-time, noninvasive, continuous measurement of PP and blood pressure waveforms without calibration. The developed system has been well validated in vitro and in vivo. In in vitro cardiovascular phantom experiments, the results demonstrated high accuracy in the measurement of PP (error < 3 mmHg) and blood pressure waveform (root-mean-square error (RMSE) < 2 mmHg, correlation coefficient (r) > 0.99). In subsequent human carotid experiments, the system was compared with an arterial tonometer and showed excellent PP accuracy (mean absolute error (MAE) = 3.7 ± 3.4 mmHg) and pressure waveform similarity (RMSE = 3.7 ± 1.6 mmHg, r = 0.98 ± 0.01). Furthermore, comparative experiments with the volume clamp device demonstrated the system's ability to accurately trace blood pressure changes (induced by deep breathing) over a period of one minute, with the MAE of DBP, MAP, and SBP within 5 ± 8 mmHg. The present results demonstrate the accuracy and reliability of the developed system for continuous and noninvasive measurement of arterial PP and blood pressure waveforms, with potential applications in the diagnosis and prevention of CVDs.
Submitted 29 May, 2023;
originally announced May 2023.
-
DRL meets DSA Networks: Convergence Analysis and Its Application to System Design
Authors:
Ramin Safavinejad,
Hao-Hsuan Chang,
Lingjia Liu
Abstract:
In dynamic spectrum access (DSA) networks, secondary users (SUs) need to opportunistically access primary users' (PUs) radio spectrum without causing significant interference. Since the interaction between the SU and the PU systems is limited, deep reinforcement learning (DRL) has been introduced to help SUs conduct spectrum access. Specifically, the deep recurrent Q network (DRQN) has been utilized in DSA networks for SUs to aggregate information from recent experiences to make spectrum access decisions. DRQN is notorious for its poor sample efficiency, in the sense that it needs a rather large number of training data samples to tune its parameters, which is a computationally demanding task. In our recent work, the deep echo state network (DEQN) has been introduced to DSA networks to address the sample efficiency issue of DRQN. In this paper, we analytically show that DEQN requires fewer training samples than DRQN to converge to the best policy. Furthermore, we introduce a method to determine the right hyperparameters for the DEQN, providing system design guidance for DEQN-based DSA networks. Extensive performance evaluation confirms that the DEQN-based DSA strategy is the superior choice with regard to computational power while outperforming DRQN-based DSA strategies.
Submitted 18 May, 2023;
originally announced May 2023.
-
Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering
Authors:
Heng-Jui Chang,
Alexander H. Liu,
James Glass
Abstract:
Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging. We propose speaker-invariant clustering (Spin), a novel self-supervised learning method that clusters speech representations and performs swapped prediction between the original and speaker-perturbed utterances. Spin disentangles speaker information and preserves content representations with just 45 minutes of fine-tuning on a single GPU. Spin improves pre-trained networks and outperforms prior methods in speech recognition and acoustic unit discovery.
Submitted 18 May, 2023;
originally announced May 2023.
-
Untrained Neural Network based Bayesian Detector for OTFS Modulation Systems
Authors:
Hao Chang,
Alva Kosasih,
Wibowo Hardjawana,
Xinwei Qu,
Branka Vucetic
Abstract:
The orthogonal time frequency space (OTFS) symbol detector design for high mobility communication scenarios has received considerable attention lately. Current state-of-the-art OTFS detectors can be divided mainly into two categories: iterative and training-based deep neural network (DNN) detectors. Many practical iterative detectors rely on a minimum-mean-square-error (MMSE) denoiser to get the initial symbol estimates. However, their computational complexity increases exponentially with the number of detected symbols. Training-based DNN detectors typically suffer from dependency on the availability of large computation resources and the fidelity of synthetic datasets for the training phase, which are both costly. In this paper, we propose an untrained DNN based on the deep image prior (DIP) and decoder architecture, referred to as D-DIP, that replaces the MMSE denoiser in the iterative detector. DIP is a type of DNN that requires no training, which makes it beneficial in OTFS detector design. We then propose to combine the D-DIP denoiser with the Bayesian parallel interference cancellation (BPIC) detector to perform iterative symbol detection, referred to as D-DIP-BPIC. Our simulation results show that the proposed D-DIP-BPIC detector outperforms practical state-of-the-art detectors in symbol error rate (SER) performance by 0.5 dB while retaining low computational complexity.
Submitted 7 May, 2023;
originally announced May 2023.
-
Model Reference Gaussian Process Regression: Data-Driven State Feedback Controller
Authors:
Hyuntae Kim,
Hamin Chang,
Hyungbo Shim
Abstract:
This paper proposes a data-driven state feedback controller that enables reference tracking for nonlinear discrete-time systems. The controller is designed based on the identified inverse model of the system and a given reference model, assuming that the identification of the inverse model is carried out using only the system's state/input measurements. When its results are provided, we present conditions that guarantee a certain level of reference tracking performance, regardless of the identification method employed for the inverse model. Specifically, when Gaussian process regression (GPR) is used as the identification method, we propose sufficient conditions for the required data by applying some lemmas related to identification errors to the aforementioned conditions, ensuring that the Model reference-GPR (MR-GPR) controller can guarantee a certain level of reference tracking performance. Finally, an example is provided to demonstrate the effectiveness of the MR-GPR controller.
Submitted 17 March, 2023;
originally announced March 2023.
-
Hierarchical control and learning of a foraging CyberOctopus
Authors:
Chia-Hsien Shih,
Noel Naughton,
Udit Halder,
Heng-Sheng Chang,
Seung Hyun Kim,
Rhanor Gillette,
Prashant G. Mehta,
Mattia Gazzola
Abstract:
Inspired by the unique neurophysiology of the octopus, we propose a hierarchical framework that simplifies the coordination of multiple soft arms by decomposing control into high-level decision making, low-level motor activation, and local reflexive behaviors via sensory feedback. When evaluated in the illustrative problem of a model octopus foraging for food, this hierarchical decomposition results in significant improvements relative to end-to-end methods. Performance is achieved through a mixed-modes approach, whereby qualitatively different tasks are addressed via complementary control schemes. Here, model-free reinforcement learning is employed for high-level decision-making, while model-based energy shaping takes care of arm-level motor execution. To render the pairing computationally tenable, a novel neural-network energy shaping (NN-ES) controller is developed, achieving accurate motions with time-to-solutions 200 times faster than previous attempts. Our hierarchical framework is then successfully deployed in increasingly challenging foraging scenarios, including an arena littered with obstacles in 3D space, demonstrating the viability of our approach.
Submitted 11 February, 2023;
originally announced February 2023.
-
Data-driven Moving Horizon Estimation for Angular Velocity of Space Noncooperative Target in Eddy Current De-tumbling Mission
Authors:
Xiyao Liu,
Haitao Chang,
Zhenyu Lu,
Panfeng Huang
Abstract:
Angular velocity estimation is critical for eddy current de-tumbling of noncooperative space targets. However, the unknown model of the noncooperative target and the scarcity of observation data make model-based estimation methods difficult to apply. In this paper, a Data-driven Moving Horizon Estimation method is proposed to estimate the angular velocity of the noncooperative target with de-tumbling torque. In this method, model-free state estimation of the angular velocity can be achieved using only one historical trajectory that satisfies the rank condition. With local linear approximation, Willems' fundamental lemma is extended to nonlinear autonomous systems, and the rank condition for the historical trajectory data is deduced. Then, a data-driven moving horizon estimation algorithm based on the M-step Lyapunov function is designed, and the time-discount robust stability of the algorithm is given. To illustrate the effectiveness of the proposed algorithm, experiments and simulations are performed to estimate the angular velocity in eddy current de-tumbling with only de-tumbling torque measurement.
Submitted 12 January, 2023;
originally announced January 2023.
-
Fast Iterative Algorithms for Blind Phase Retrieval: A survey
Authors:
Huibin Chang,
Li Yang,
Stefano Marchesini
Abstract:
In nanoscale imaging techniques and ultrafast lasers, the reconstruction procedure is normally formulated as a blind phase retrieval (BPR) problem, where one has to recover both the sample and the probe (pupil) jointly from phaseless data. This survey first presents the mathematical formulation of BPR and related nonlinear optimization problems, and then gives a brief review of recent iterative algorithms. The review mainly covers three types of algorithms: operator-splitting based first-order optimization methods, second-order algorithms with Hessian, and subspace methods. Future research directions for experimental issues and theoretical analysis are further discussed.
Submitted 12 November, 2022;
originally announced November 2022.
-
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval
Authors:
Layne Berry,
Yi-Jen Shih,
Hsuan-Fu Wang,
Heng-Jui Chang,
Hung-yi Lee,
David Harwath
Abstract:
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval. For non-English image-speech retrieval, we outperform the current state-of-the-art performance by a wide margin both when training separate models for each language, and with a single model which processes speech in all three languages. We identify key differences in model behavior and performance between English and non-English settings, attributable to the English-only pre-training of CLIP and HuBERT, and investigate how fine-tuning the pre-trained models impacts these differences. Finally, we show that our models can be used for mono- and cross-lingual speech-text retrieval and cross-lingual speech-speech retrieval, despite never having seen any parallel speech-text or speech-speech data during training.
Submitted 10 April, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
Model Reference Gaussian Process Regression: Data-Driven Output Feedback Controller
Authors:
Hyuntae Kim,
Hamin Chang,
Hyungbo Shim
Abstract:
Data-driven controls using Gaussian process regression have recently gained much attention. In such approaches, system identification by Gaussian process regression is mostly followed by model-based controller designs. However, the outcomes of Gaussian process regression are often too complicated for conventional control designs to be applied, so numerical designs such as model predictive control are employed in many cases. To overcome this restriction, our idea is to perform Gaussian process regression on the inverse of the plant, using the same input/output data as for the conventional regression. With the inverse, one can design a model reference controller without resorting to numerical control methods. This paper considers single-input single-output (SISO) discrete-time nonlinear systems of minimum phase with relative degree one. It is highlighted that the model reference Gaussian process regression controller is designed directly from pre-collected input/output data without system identification.
Submitted 5 October, 2022;
originally announced October 2022.
-
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Authors:
Yi-Jen Shih,
Hsuan-Fu Wang,
Heng-Jui Chang,
Layne Berry,
Hung-yi Lee,
David Harwath
Abstract:
Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly. Therefore, we propose SpeechCLIP, a novel framework bridging speech and text through images to enhance speech models without transcriptions. We leverage state-of-the-art pre-trained HuBERT and CLIP, aligning them via paired images and spoken captions with minimal fine-tuning. SpeechCLIP outperforms prior state-of-the-art on image-speech retrieval and performs zero-shot speech-text retrieval without direct supervision from transcriptions. Moreover, SpeechCLIP can directly retrieve semantically related keywords from speech.
Submitted 25 October, 2022; v1 submitted 3 October, 2022;
originally announced October 2022.
-
RVSL: Robust Vehicle Similarity Learning in Real Hazy Scenes Based on Semi-supervised Learning
Authors:
Wei-Ting Chen,
I-Hsiang Chen,
Chih-Yuan Yeh,
Hao-Hsiang Yang,
Hua-En Chang,
Jian-Jiun Ding,
Sy-Yen Kuo
Abstract:
Recently, vehicle similarity learning, also called re-identification (ReID), has attracted significant attention in computer vision. Several algorithms have been developed and have achieved considerable success. However, most existing methods perform poorly in hazy scenarios due to poor visibility. Although some strategies can address this problem, they still have room for improvement due to their limited performance in real-world scenarios and the lack of real-world clear ground truth. Thus, to resolve this problem, inspired by CycleGAN, we construct a training paradigm called RVSL which integrates ReID and domain transformation techniques. The network is trained in a semi-supervised fashion and requires neither the ID labels nor the corresponding clear ground truths to learn the hazy vehicle ReID task in real-world haze scenes. To further constrain the unsupervised learning process effectively, several losses are developed. Experimental results on synthetic and real-world datasets indicate that the proposed method can achieve state-of-the-art performance on hazy vehicle ReID problems. It is worth mentioning that although the proposed method is trained without real-world label information, it can achieve competitive performance compared to existing supervised methods trained on complete label information.
Submitted 18 September, 2022;
originally announced September 2022.
-
Energy Shaping Control of a Muscular Octopus Arm Moving in Three Dimensions
Authors:
Heng-Sheng Chang,
Udit Halder,
Chia-Hsien Shih,
Noel Naughton,
Mattia Gazzola,
Prashant G. Mehta
Abstract:
Flexible octopus arms exhibit an exceptional ability to coordinate large numbers of degrees of freedom and perform complex manipulation tasks. As a consequence, these systems continue to attract the attention of biologists and roboticists alike. In this paper, we develop a three-dimensional model of a soft octopus arm, equipped with biomechanically realistic muscle actuation. Internal forces and couples exerted by all major muscle groups are considered. An energy shaping control method is described to coordinate muscle activity so as to grasp and reach in 3D space. Key contributions of this paper are: (i) modeling of major muscle groups to elicit three-dimensional movements; (ii) a mathematical formulation for muscle activations based on a stored energy function; and (iii) a computationally efficient procedure to design task-specific equilibrium configurations, obtained by solving an optimization problem in the Special Euclidean group SE(3). Muscle controls are then iteratively computed based on the co-state variable arising from the solution of the optimization problem. The approach is numerically demonstrated in the physically accurate software environment Elastica. Results of numerical experiments mimicking observed octopus behaviors are reported.
Submitted 8 September, 2022;
originally announced September 2022.
-
Federated Learning Enables Big Data for Rare Cancer Boundary Detection
Authors:
Sarthak Pati,
Ujjwal Baid,
Brandon Edwards,
Micah Sheller,
Shih-Han Wang,
G Anthony Reina,
Patrick Foley,
Alexey Gruzdev,
Deepthi Karkada,
Christos Davatzikos,
Chiharu Sako,
Satyam Ghodasara,
Michel Bilello,
Suyash Mohan,
Philipp Vollmuth,
Gianluca Brugnara,
Chandrakanth J Preetha,
Felix Sahm,
Klaus Maier-Hein,
Maximilian Zenk,
Martin Bendszus,
Wolfgang Wick,
Evan Calabrese,
Jeffrey Rudie,
Javier Villanueva-Meyer
, et al. (254 additional authors not shown)
Abstract:
Although machine learning (ML) has shown promise in numerous domains, there are concerns about generalizability to out-of-sample data. This is currently addressed by centrally sharing ample, and importantly diverse, data from multiple sites. However, such centralization is challenging to scale (or even not feasible) due to various limitations. Federated ML (FL) provides an alternative to train accurate and generalizable ML models, by only sharing numerical model updates. Here we present findings from the largest FL study to date, involving data from 71 healthcare institutions across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, utilizing the largest dataset of such patients ever used in the literature (25,256 MRI scans from 6,314 patients). We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and a 23% improvement over the tumor's entire extent. We anticipate our study to: 1) enable more studies in healthcare informed by large and diverse data, ensuring meaningful results for rare diseases and underrepresented populations, 2) facilitate further quantitative analyses for glioblastoma via performance optimization of our consensus model for eventual public release, and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
Submitted 25 April, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
Deep learning based closed-loop optimization of geothermal reservoir production
Authors:
Nanzhe Wang,
Haibin Chang,
Xiangzhao Kong,
Martin O. Saar,
Dongxiao Zhang
Abstract:
To maximize the economic benefits of geothermal energy production, it is essential to optimize geothermal reservoir management strategies, in which geologic uncertainty should be considered. In this work, we propose a closed-loop optimization framework, based on deep learning surrogates, for the well control optimization of geothermal reservoirs. In this framework, we construct a hybrid convolution-recurrent neural network surrogate, which combines the convolution neural network (CNN) and long short-term memory (LSTM) recurrent network. The convolution structure can extract spatial information of geologic parameter fields and the recurrent structure can approximate sequence-to-sequence mapping. The trained model can predict time-varying production responses (rate, temperature, etc.) for cases with different permeability fields and well control sequences. In the closed-loop optimization framework, production optimization based on the differential evolution (DE) algorithm, and data assimilation based on the iterative ensemble smoother (IES), are performed alternately to achieve real-time well control optimization and geologic parameter estimation as the production proceeds. In addition, the averaged objective function over the ensemble of geologic parameter estimations is adopted to consider geologic uncertainty in the optimization process. Several geothermal reservoir development cases are designed to test the performance of the proposed production optimization framework. The results show that the proposed framework can achieve efficient and effective real-time optimization and data assimilation in the geothermal reservoir production process.
Submitted 15 April, 2022;
originally announced April 2022.
-
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Authors:
Hsiang-Sheng Tsai,
Heng-Jui Chang,
Wen-Chin Huang,
Zili Huang,
Kushal Lakhotia,
Shu-wen Yang,
Shuyan Dong,
Andy T. Liu,
Cheng-I Jeff Lai,
Jiatong Shi,
Xuankai Chang,
Phil Hall,
Hsuan-Jui Chen,
Shang-Wen Li,
Shinji Watanabe,
Abdelrahman Mohamed,
Hung-yi Lee
Abstract:
Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology is limiting towards a holistic understanding of the efficacy of such models. SUPERB was a step towards introducing a common benchmark to evaluate pre-trained models across various speech tasks. In this paper, we introduce SUPERB-SG, a new benchmark focused on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB. We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain and quality across different types of tasks. It entails freezing pre-trained model parameters, only using simple task-specific trainable heads. The goal is to be inclusive of all researchers, and encourage efficient use of computational resources. We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
Submitted 14 March, 2022;
originally announced March 2022.
-
In-Place Rotation for Enhancing Snake-like Robot Mobility
Authors:
Alexander H. Chang,
Patricio A. Vela
Abstract:
Gaits engineered for snake-like robots to rotate in-place instrumentally fill a gap in the set of locomotive gaits that have traditionally prioritized translation. This paper designs a Turn-in-Place gait and demonstrates the ability of a shape-centric modeling framework to capture the gait's locomotive properties. Shape modeling for turning involves a time-varying continuous body curve described by a standing wave. Presumed viscous robot-ground frictional interactions lead to body dynamics conditioned on the time-varying shape model. The dynamic equations describing the Turn-in-Place gait are validated by an articulated snake-like robot using a physics-based simulator and a physical robot. The results affirm the shape-centric modeling framework's capacity to model a variety of snake-like robot gaits with fundamentally different body-ground contact patterns. As an applied demonstration, example locomotion scenarios partner the shape-centric Turn-in-Place gait with a Rectilinear gait for maneuvering through constrained environments based on a multi-modal locomotive planning strategy. Unified shape-centric modeling facilitates trajectory planning and tracking for a snake-like robot to successfully negotiate non-trivial obstacle configurations.
Submitted 9 March, 2022;
originally announced March 2022.
-
Changeable Rate and Novel Quantization for CSI Feedback Based on Deep Learning
Authors:
Xin Liang,
Haoran Chang,
Haozhen Li,
Xinyu Gu,
Lin Zhang
Abstract:
Deep learning (DL)-based channel state information (CSI) feedback improves the capacity and energy efficiency of massive multiple-input multiple-output (MIMO) systems in frequency division duplexing mode. However, multiple neural networks with different lengths of feedback overhead are required by time-varying bandwidth resources. The storage space required at the user equipment (UE) and the base station (BS) for these models increases linearly with the number of models. In this paper, we propose a DL-based changeable-rate framework with a novel quantization scheme to improve the efficiency and feasibility of CSI feedback systems. This framework can reutilize all the network layers to achieve overhead-changeable CSI feedback and thereby optimize the storage efficiency at the UE and the BS sides. The quantizer designed in this framework can avoid the normalization and gradient problems faced by traditional quantization schemes. Specifically, we propose two DL-based changeable-rate CSI feedback networks, CH-CsiNetPro and CH-DualNetSph, by introducing a feedback overhead control unit. Then, a pluggable quantization block (PQB) is developed to further improve the encoding efficiency of CSI feedback in an end-to-end way. Compared with existing CSI feedback methods, the proposed framework reduces the storage space by about 50% with the changeable-rate scheme and improves the encoding efficiency with the quantization module.
Submitted 28 February, 2022;
originally announced February 2022.
-
Unsupervised Learning Based Hybrid Beamforming with Low-Resolution Phase Shifters for MU-MIMO Systems
Authors:
Chia-Ho Kuo,
Hsin-Yuan Chang,
Ronald Y. Chang,
Wei-Ho Chung
Abstract:
Millimeter wave (mmWave) is a key technology for fifth-generation (5G) and beyond communications. Hybrid beamforming has been proposed for large-scale antenna systems in mmWave communications. Existing hybrid beamforming designs based on infinite-resolution phase shifters (PSs) are impractical due to hardware cost and power consumption. In this paper, we propose an unsupervised-learning-based scheme to jointly design the analog precoder and combiner with low-resolution PSs for multiuser multiple-input multiple-output (MU-MIMO) systems. We transform the analog precoder and combiner design problem into a phase classification problem and propose a generic neural network architecture, termed the phase classification network (PCNet), capable of producing solutions of various PS resolutions. Simulation results demonstrate the superior sum-rate and complexity performance of the proposed scheme, as compared to state-of-the-art hybrid beamforming designs for the most commonly used low-resolution PS configurations.
Submitted 3 February, 2022;
originally announced February 2022.
-
Speech Enhancement Based on Cyclegan with Noise-informed Training
Authors:
Wen-Yuan Ting,
Syu-Siang Wang,
Hsin-Li Chang,
Borching Su,
Yu Tsao
Abstract:
Cycle-consistent generative adversarial networks (CycleGAN) were successfully applied to speech enhancement (SE) tasks with unpaired noisy-clean training data. The CycleGAN SE system adopted two generators and two discriminators trained with losses from noisy-to-clean and clean-to-noisy conversions. CycleGAN showed promising results for numerous SE tasks. Herein, we investigate a potential limitation of the clean-to-noisy conversion part and propose a novel noise-informed training (NIT) approach to improve the performance of the original CycleGAN SE system. The main idea of the NIT approach is to incorporate target domain information for clean-to-noisy conversion to facilitate a better training procedure. The experimental results confirmed that the proposed NIT approach improved the generalization capability of the original CycleGAN SE system with a notable margin.
Submitted 6 December, 2022; v1 submitted 19 October, 2021;
originally announced October 2021.
-
CSI Sensing and Feedback: A Semi-Supervised Learning Approach
Authors:
Haozhen Li,
Boyuan Zhang,
Xin Liang,
Haoran Chang,
Xinyu Gu,
Lin Zhang
Abstract:
Deep learning-based (DL-based) channel state information (CSI) feedback for massive multiple-input multiple-output (MIMO) systems has proved to be a creative and efficient application. However, existing systems ignore sensing of variations in the wireless channel environment, e.g., indoor versus outdoor scenarios. Moreover, training these systems requires an excessive amount of pre-labeled CSI data, which is often unavailable. In this letter, to address these issues, we first examine the rationale for introducing semi-supervised learning to CSI feedback, and then propose a semi-supervised CSI sensing and feedback network ($S^2$CsiNet), comparing three classifiers. Experiments show that $S^2$CsiNet primarily improves the feasibility of the DL-based CSI feedback system through indoor and outdoor environment sensing and a reduction of up to 96.2% in the labeled dataset, and secondarily boosts system performance through data distillation and latent information mining.
Submitted 26 September, 2021;
originally announced October 2021.
-
Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models
Authors:
Liang-Hsuan Tseng,
Yu-Kuan Fu,
Heng-Jui Chang,
Hung-yi Lee
Abstract:
Code-switching (CS) is common in daily conversations where more than one language is used within a sentence. The difficulties of CS speech recognition lie in alternating languages and the lack of transcribed data. Therefore, this paper uses recently successful self-supervised learning (SSL) methods to leverage large amounts of unlabeled speech data without CS. We show that hidden representations of SSL models offer frame-level language identity even if the models are trained with English speech only. Jointly training CTC and language identification modules with self-supervised speech representations improves CS speech recognition performance. Furthermore, using multilingual speech data for pre-training yields the best CS speech recognition performance.
Submitted 7 October, 2021;
originally announced October 2021.
-
DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT
Authors:
Heng-Jui Chang,
Shu-wen Yang,
Hung-yi Lee
Abstract:
Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks. Despite the success of these methods, they require large memory and high pre-training costs, making them inaccessible for researchers in academia and small companies. Therefore, this paper introduces DistilHuBERT, a novel multi-task learning framework to distill hidden representations from a HuBERT model directly. This method reduces HuBERT's size by 75% and makes it 73% faster while retaining most performance in ten different tasks. Moreover, DistilHuBERT requires little training time and data, opening the possibilities of pre-training personal and on-device SSL models for speech.
Submitted 27 April, 2022; v1 submitted 5 October, 2021;
originally announced October 2021.
-
MITI Minimum Information guidelines for highly multiplexed tissue images
Authors:
Denis Schapiro,
Clarence Yapp,
Artem Sokolov,
Sheila M. Reynolds,
Yu-An Chen,
Damir Sudar,
Yubin Xie,
Jeremy L. Muhlich,
Raquel Arias-Camison,
Sarah Arena,
Adam J. Taylor,
Milen Nikolov,
Madison Tyler,
Jia-Ren Lin,
Erik A. Burlingame,
Human Tumor Atlas Network,
Young H. Chang,
Samouil L Farhi,
Vésteinn Thorsson,
Nithya Venkatamohan,
Julia L. Drewes,
Dana Pe'er,
David A. Gutman,
Markus D. Herrmann,
Nils Gehlenborg
, et al. (14 additional authors not shown)
Abstract:
The imminent release of tissue atlases combining multi-channel microscopy with single cell sequencing and other omics data from normal and diseased specimens creates an urgent need for data and metadata standards that guide data deposition, curation and release. We describe a Minimum Information about highly multiplexed Tissue Imaging (MITI) standard that applies best practices developed for genomics and other microscopy data to highly multiplexed tissue images and traditional histology.
Submitted 23 February, 2022; v1 submitted 21 August, 2021;
originally announced August 2021.
-
ViTGAN: Training GANs with Vision Transformers
Authors:
Kwonjoon Lee,
Huiwen Chang,
Lu Jiang,
Han Zhang,
Zhuowen Tu,
Ce Liu
Abstract:
Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring fewer vision-specific inductive biases. In this paper, we investigate if such performance can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). For ViT discriminators, we observe that existing regularization methods for GANs interact poorly with self-attention, causing serious instability during training. To resolve this issue, we introduce several novel regularization techniques for training GANs with ViTs. For ViT generators, we examine architectural choices for latent and pixel mapping layers to facilitate convergence. Empirically, our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets: CIFAR-10, CelebA, and LSUN bedroom.
Submitted 29 May, 2024; v1 submitted 9 July, 2021;
originally announced July 2021.
-
Federated Dynamic Spectrum Access
Authors:
Yifei Song,
Hao-Hsuan Chang,
Zhou Zhou,
Shashank Jere,
Lingjia Liu
Abstract:
Due to the growing volume of data traffic produced by the surge of Internet of Things (IoT) devices, the demand for radio spectrum resources is approaching the limit defined by the Federal Communications Commission (FCC). To this end, Dynamic Spectrum Access (DSA) is considered a promising technology to handle this spectrum scarcity. However, standard DSA techniques often rely on analytical modeling of wireless networks, making their application intractable in under-measured network environments. Therefore, utilizing neural networks to approximate the network dynamics is an alternative approach. In this article, we introduce a Federated Learning (FL) based framework for the task of DSA, where FL is a distributed machine learning framework that can preserve the privacy of network terminals under heterogeneous data distributions. We discuss the opportunities, challenges, and open problems of this framework. To evaluate its feasibility, we implement a Multi-Agent Reinforcement Learning (MARL)-based FL as a realization, together with its initial evaluation results.
Submitted 28 June, 2021;
originally announced June 2021.
-
Learning a Model-Driven Variational Network for Deformable Image Registration
Authors:
Xi Jia,
Alexander Thorley,
Wei Chen,
Huaqi Qiu,
Linlin Shen,
Iain B Styles,
Hyung Jin Chang,
Ales Leonardis,
Antonio de Marvao,
Declan P. O'Regan,
Daniel Rueckert,
Jinming Duan
Abstract:
Data-driven deep learning approaches to image registration can be less accurate than conventional iterative approaches, especially when training data is limited. To address this whilst retaining the fast inference speed of deep learning, we propose VR-Net, a novel cascaded variational network for unsupervised deformable image registration. Using the variable splitting optimization scheme, we first convert the image registration problem, established in a generic variational framework, into two sub-problems, one with a point-wise, closed-form solution while the other one is a denoising problem. We then propose two neural layers (i.e. warping layer and intensity consistency layer) to model the analytical solution and a residual U-Net to formulate the denoising problem (i.e. generalized denoising layer). Finally, we cascade the warping layer, intensity consistency layer, and generalized denoising layer to form the VR-Net. Extensive experiments on three (two 2D and one 3D) cardiac magnetic resonance imaging datasets show that VR-Net outperforms state-of-the-art deep learning methods on registration accuracy, while maintaining the fast inference speed of deep learning and the data-efficiency of the variational model.
Submitted 25 May, 2021;
originally announced May 2021.
-
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
Authors:
Innfarn Yoo,
Huiwen Chang,
Xiyang Luo,
Ondrej Stava,
Ce Liu,
Peyman Milanfar,
Feng Yang
Abstract:
Digital watermarking is widely used for copyright protection. Traditional 3D watermarking approaches or commercial software are typically designed to embed messages into 3D meshes, and later retrieve the messages directly from distorted/undistorted watermarked 3D meshes. However, in many cases, users only have access to rendered 2D images instead of 3D meshes. Unfortunately, retrieving messages from 2D renderings of 3D meshes is still challenging and underexplored. We introduce a novel end-to-end learning framework to solve this problem through: 1) an encoder to covertly embed messages in both mesh geometry and textures; 2) a differentiable renderer to render watermarked 3D objects from different camera angles and under varied lighting conditions; 3) a decoder to recover the messages from 2D rendered images. From our experiments, we show that our model can learn to embed information visually imperceptible to humans, and to retrieve the embedded information from 2D renderings that undergo 3D distortions. In addition, we demonstrate that our method can also work with other renderers, such as ray tracers and real-time renderers with and without fine-tuning.
Submitted 29 March, 2022; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Towards Lifelong Learning of End-to-end ASR
Authors:
Heng-Jui Chang,
Hung-yi Lee,
Lin-shan Lee
Abstract:
Automatic speech recognition (ASR) technologies today are primarily optimized for given datasets; thus, any changes in the application environment (e.g., acoustic conditions or topic domains) may inevitably degrade the performance. We can collect new data describing the new environment and fine-tune the system, but this naturally leads to higher error rates for the earlier datasets, referred to as catastrophic forgetting. The concept of lifelong learning (LLL) aiming to enable a machine to sequentially learn new tasks from new datasets describing the changing real world without forgetting the previously learned knowledge is thus brought to attention. This paper reports, to our knowledge, the first effort to extensively consider and analyze the use of various approaches of LLL in end-to-end (E2E) ASR, including proposing novel methods in saving data for past domains to mitigate the catastrophic forgetting problem. An overall relative reduction of 28.7% in WER was achieved compared to the fine-tuning baseline when sequentially learning on three very different benchmark corpora. This can be the first step toward the highly desired ASR technologies capable of synchronizing with the continuously changing real world.
Submitted 2 July, 2021; v1 submitted 4 April, 2021;
originally announced April 2021.
-
Controlling a CyberOctopus Soft Arm with Muscle-like Actuation
Authors:
Heng-Sheng Chang,
Udit Halder,
Ekaterina Gribkova,
Arman Tekinalp,
Noel Naughton,
Mattia Gazzola,
Prashant G. Mehta
Abstract:
This paper presents an application of the energy shaping methodology to control a flexible, elastic Cosserat rod model of a single octopus arm. The novel contributions of this work are two-fold: (i) a control-oriented modeling of the anatomically realistic internal muscular architecture of an octopus arm; and (ii) the integration of these muscle models into the energy shaping control methodology. The control-oriented modeling takes inspiration in equal parts from theories of nonlinear elasticity and energy shaping control. By introducing a stored energy function for muscles, the difficulties associated with explicitly solving the matching conditions of the energy shaping methodology are avoided. The overall control design problem is posed as a bilevel optimization problem. Its solution is obtained through iterative algorithms. The methodology is numerically implemented and demonstrated in a full-scale dynamic simulation environment Elastica. Two bio-inspired numerical experiments involving the control of octopus arms are reported.
Submitted 1 April, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Optimal Control of a Soft CyberOctopus Arm
Authors:
Tixian Wang,
Udit Halder,
Heng-Sheng Chang,
Mattia Gazzola,
Prashant G. Mehta
Abstract:
In this paper, we use the optimal control methodology to control a flexible, elastic Cosserat rod. An inspiration comes from stereotypical movement patterns in octopus arms, which are observed in a variety of manipulation tasks, such as reaching or fetching. To help uncover the mechanisms underlying these observed morphologies, we outline an optimal control-based framework. A single octopus arm is modeled as a Hamiltonian control system, where the continuum mechanics of the arm is modeled after the Cosserat rod theory, and internal, distributed muscle forces and couples are considered as controls. First order necessary optimality conditions are derived for an optimal control problem formulated for this infinite dimensional system. Solutions to this problem are obtained numerically by an iterative forward-backward algorithm. The state and adjoint equations are solved in a dynamic simulation environment, setting the stage for studying a broader class of optimal control problems. Trajectories that minimize control effort are demonstrated and qualitatively compared with observed behaviors.
Submitted 1 April, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.
-
Deep-Learning based Inverse Modeling Approaches: A Subsurface Flow Example
Authors:
Nanzhe Wang,
Haibin Chang,
Dongxiao Zhang
Abstract:
Deep-learning has achieved good performance and shown great potential for solving forward and inverse problems. In this work, two categories of innovative deep-learning based inverse modeling methods are proposed and compared. The first category is deep-learning surrogate-based inversion methods, in which the Theory-guided Neural Network (TgNN) is constructed as a deep-learning surrogate for problems with uncertain model parameters. By incorporating physical laws and other constraints, the TgNN surrogate can be constructed with limited simulation runs and accelerate the inversion process significantly. Three TgNN surrogate-based inversion methods are proposed, including the gradient method, the iterative ensemble smoother (IES), and the training method. The second category is direct-deep-learning-inversion methods, in which TgNN constrained with geostatistical information, named TgNN-geo, is proposed for direct inverse modeling. In TgNN-geo, two neural networks are introduced to approximate the respective random model parameters and the solution. Since the prior geostatistical information can be incorporated, the direct-inversion method based on TgNN-geo works well, even in cases with sparse spatial measurements or imprecise prior statistics. Although the proposed deep-learning based inverse modeling methods are general in nature, and thus applicable to a wide variety of problems, they are tested with several subsurface flow problems. It is found that satisfactory results are obtained with a high efficiency. Moreover, both the advantages and disadvantages are further analyzed for the proposed two categories of deep-learning based inversion methods.
Submitted 28 July, 2020;
originally announced July 2020.
-
Multi-Frequency Multi-Scenario Millimeter Wave MIMO Channel Measurements and Modeling for B5G Wireless Communication Systems
Authors:
Jie Huang,
Cheng-Xiang Wang,
Hengtai Chang,
Jian Sun,
Xiqi Gao
Abstract:
Millimeter wave (mmWave) bands have been utilized for the fifth generation (5G) communication systems and will no doubt continue to be deployed for beyond 5G (B5G). However, the underlying channels are not fully investigated at multifrequency bands and in multi-scenarios by using the same channel sounder, especially for the outdoor, multiple-input multiple-output (MIMO), and vehicle-to-vehicle (V2V) conditions. In this paper, we conduct multi-frequency multi-scenario mmWave MIMO channel measurements with 4*4 antennas at 28, 32, and 39 GHz bands for three cases, i.e., the human body and vehicle blockage measurements, outdoor path loss measurements, and V2V measurements. The channel characteristics, including blockage effect, path loss and coverage range, and non-stationarity and spatial consistency, are thoroughly studied. The blockage model, path loss model, and time-varying channel model are proposed for mmWave MIMO channels. The channel measurement and modeling results will be of great importance for further mmWave communication system deployments in indoor hotspot, outdoor, and vehicular network scenarios for B5G.
Submitted 27 July, 2020;
originally announced July 2020.