Search | arXiv e-print repository

arXiv:2406.18536 [pdf, other]

Reliable Interval Prediction of Minimum Operating Voltage Based on On-chip Monitors via Conformalized Quantile Regression

Authors: Yuxuan Yin, Xiaoxiao Wang, Rebecca Chen, Chen He, Peng Li

Abstract: Predicting the minimum operating voltage ($V_{min}$) of chips is one of the important techniques for improving the manufacturing testing flow, as well as ensuring the long-term reliability and safety of in-field systems. Current $V_{min}$ prediction methods often provide only point estimates, necessitating additional techniques for constructing prediction confidence intervals to cover uncertainties caused by different sources of variations. While some existing techniques offer region predictions, they rely on certain distributional assumptions and/or provide no coverage guarantees. In response to these limitations, we propose a novel distribution-free $V_{min}$ interval estimation methodology possessing a theoretical guarantee of coverage. Our approach leverages conformalized quantile regression and on-chip monitors to generate reliable prediction intervals. We demonstrate the effectiveness of the proposed method on an industrial 5nm automotive chip dataset. Moreover, we show that the use of on-chip monitors can reduce the interval length significantly for $V_{min}$ prediction.

Submitted 3 May, 2024; originally announced June 2024.

Comments: Accepted by DATE 2024. Camera-ready version

arXiv:2405.11895 [pdf, other]

Sparse Attention-driven Quality Prediction for Production Process Optimization in Digital Twins

Authors: Yanlei Yin, Lihua Wang, Wenbo Wang, Dinh Thai Hoang

Abstract: In the process industry, optimizing production lines for long-term efficiency requires real-time monitoring and analysis of operation states to fine-tune production line parameters. However, the complexity in operational logic and the intricate coupling of production process parameters make it difficult to develop an accurate mathematical model for the entire process, thus hindering the deployment of efficient optimization mechanisms. In view of these difficulties, we propose to deploy a digital twin of the production line by digitally abstracting its physical layout and operational logic. By iteratively mapping the real-world data reflecting equipment operation status and product quality inspection in the digital twin, we adopt a quality prediction model for production process based on self-attention-enabled temporal convolutional neural networks. This model enables the data-driven state evolution of the digital twin. The digital twin takes a role of aggregating the information of actual operating conditions and the results of quality-sensitive analysis, which facilitates the optimization of process production quality with virtual-reality evolution under multi-dimensional constraints. Leveraging the digital twin model as an information-flow carrier, we extract temporal features from key process indicators and establish a production process quality prediction model based on the proposed composite neural network.
Our operation experiments on a specific tobacco shredding line demonstrate that the proposed digital twin-based production process optimization method fosters seamless integration between virtual and real production lines. This integration achieves an average operating status prediction accuracy of over 98\% and near-optimal production process control.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2403.19470 [pdf, other]

Deep decomposition method for the limited aperture inverse obstacle scattering problem

Authors: Yunwen Yin, Liang Yan

Abstract: In this paper, we consider a deep learning approach to the limited aperture inverse obstacle scattering problem. It is well known that traditional deep learning relies solely on data, which may limit its performance for the inverse problem when only indirect observation data and a physical model are available. A fundamental question arises in light of these limitations: is it possible to enable deep learning to work on inverse problems without labeled data and to be aware of what it is learning? This work proposes a deep decomposition method (DDM) for such purposes, which does not require ground truth labels. It accomplishes this by providing physical operators associated with the scattering model to the neural network architecture. Additionally, a deep learning based data completion scheme is implemented in DDM to prevent distorting the solution of the inverse problem for limited aperture data. Furthermore, apart from addressing the ill-posedness imposed by the inverse problem itself, DDM is a physics-aware machine learning technique that can have interpretability property. The convergence result of DDM is theoretically proven. Numerical experiments are presented to demonstrate the validity of the proposed DDM even when the incident and observation apertures are extremely limited.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2310.06930 [pdf, other]

Prosody Analysis of Audiobooks

Authors: Charuta Pethe, Yunting Yin, Steven Skiena

Abstract: Recent advances in text-to-speech have made it possible to generate natural-sounding audio from text. However, audiobook narrations involve dramatic vocalizations and intonations by the reader, with greater reliance on emotions, dialogues, and descriptions in the narrative. Using our dataset of 93 aligned book-audiobook pairs, we present improved models for prosody prediction properties (pitch, volume, and rate of speech) from narrative text using language modeling. Our predicted prosody attributes correlate much better with human audiobook readings than results from a state-of-the-art commercial TTS system: our predicted pitch shows a higher correlation with human reading for 22 out of the 24 books, while our predicted volume attribute proves more similar to human reading for 23 out of the 24 books. Finally, we present a human evaluation study to quantify the extent that people prefer prosody-enhanced audiobook readings over commercial text-to-speech systems.

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.02418 [pdf, other]

Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition

Authors: Minh Tran, Yufeng Yin, Mohammad Soleymani

Abstract: There are individual differences in expressive behaviors driven by cultural norms and personality. This between-person variation can result in reduced emotion recognition performance. Therefore, personalization is an important step in improving the generalization and robustness of speech emotion recognition. In this paper, to achieve unsupervised personalized emotion recognition, we first pre-train an encoder with learnable speaker embeddings in a self-supervised manner to learn robust speech representations conditioned on speakers. Second, we propose an unsupervised method to compensate for the label distribution shifts by finding similar speakers and leveraging their label distributions from the training set. Extensive experimental results on the MSP-Podcast corpus indicate that our method consistently outperforms strong personalization baselines and achieves state-of-the-art performance for valence estimation.

Submitted 5 September, 2023; originally announced September 2023.

Comments: Accepted by INTERSPEECH 2023

arXiv:2308.05975 [pdf]

A Self-supervised SAR Image Despeckling Strategy Based on Parameter-sharing Convolutional Neural Networks

Authors: Liang Chen, Yifei Yin, Hao Shi, Qingqing Sheng, Wei Li

Abstract: Speckle noise is generated due to the SAR imaging mechanism, which brings difficulties in SAR image interpretation. Hence, despeckling is a helpful step in SAR pre-processing. Nowadays, deep learning has been proved to be a progressive method for SAR image despeckling. Most deep learning methods for despeckling are based on supervised learning, which needs original SAR images and speckle-free SAR images to train the network. However, the speckle-free SAR images are generally not available. So, this issue was tackled by adding multiplicative noise to optical images synthetically for simulating speckled image. Therefore, there are following challenges in SAR image despeckling: (1) lack of speckle-free SAR image; (2) difficulty in keeping details such as edges and textures in heterogeneous areas. To address these issues, we propose a self-supervised SAR despeckling strategy that can be trained without speckle-free images. Firstly, the feasibility of SAR image despeckling without speckle-free images is proved theoretically. Then, the sub-sampler based on the adjacent-syntropy criteria is proposed. The training image pairs are generated by the sub-sampler from real-world SAR image to estimate the noise distribution. Furthermore, to make full use of training pairs, the parameter sharing convolutional neural networks are adopted. Finally, according to the characteristics of SAR images, a multi-feature loss function is proposed.
The proposed loss function is composed of despeckling term, regular term and perception term, to constrain the gap between the generated paired images. The ability of edge and texture feature preserving is improved simultaneously. Finally, qualitative and quantitative experiments are validated on real-world SAR images, showing better performances than several advanced SAR image despeckling methods.

Submitted 11 August, 2023; originally announced August 2023.

arXiv:2306.15530 [pdf, other]

Fast and Automatic 3D Modeling of Antenna Structure Using CNN-LSTM Network for Efficient Data Generation

Authors: Zhaohui Wei, Zhao Zhou, Peng Wang, Jian Ren, Yingzeng Yin, Gert Frølund Pedersen, Ming Shen

Abstract: Deep learning-assisted antenna design methods such as surrogate models have gained significant popularity in recent years due to their potential to greatly increase design efficiencies by replacing the time-consuming full-wave electromagnetic (EM) simulations. However, a large number of training data with sufficiently diverse and representative samples (antenna structure parameters, scattering properties, etc.) is mandatory for these methods to ensure good performance. Traditional antenna modeling methods relying on manual model construction and modification are time-consuming and cannot meet the requirement of efficient training data acquisition. In this study, we proposed a deep learning-assisted and image-based intelligent modeling approach for accelerating the data acquisition of antenna samples with different physical structures. Specifically, our method only needs an image of the antenna structure, usually available in scientific publications, as the input while the corresponding modeling codes (VBA language) are generated automatically. The proposed model mainly consists of two parts: Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) networks. The former is used for capturing features of antenna structure images and the latter is employed to generate the modeling codes. Through training, the proposed model can achieve fast and automatic data acquisition of antenna physical structures based on antenna images.
Experiment results show that the proposed method achieves a significant speed enhancement over the manual modeling approach. This approach lays the foundation for efficient data acquisition needed to build robust surrogate models in the future.

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.04360 [pdf, other]

doi 10.1109/TAP.2022.3179898

Robust and Efficient Fault Diagnosis of mm-Wave Active Phased Arrays using Baseband Signal

Authors: Martin H. Nielsen, Yufeng Zhang, Changbin Xue, Jian Ren, Yingzeng Yin, Ming Shen, Gert F. Pedersen

Abstract: One key communication block in 5G and 6G radios is the active phased array (APA). To ensure reliable operation, efficient and timely fault diagnosis of APAs on-site is crucial. To date, fault diagnosis has relied on measurement of frequency domain radiation patterns using costly equipment and multiple strictly controlled measurement probes, which are time-consuming, complex, and therefore infeasible for on-site deployment. This paper proposes a novel method exploiting a Deep Neural Network (DNN) tailored to extract the features hidden in the baseband in-phase and quadrature signals for classifying the different faults. It requires only a single probe in one measurement point for fast and accurate diagnosis of the faulty elements and components in APAs. Validation of the proposed method is done using a commercial 28 GHz APA. Accuracies of 99% and 80% have been demonstrated for single- and multi-element failure detection, respectively. Three different test scenarios are investigated: on-off antenna elements, phase variations, and magnitude attenuation variations. In a low signal to noise ratio of 4 dB, stable fault detection accuracy above 90% is maintained. This is all achieved with a detection time of milliseconds (e.g., 6 ms), showing a high potential for on-site deployment.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 10 pages

Journal ref: in IEEE Transactions on Antennas and Propagation, vol. 70, no. 7, pp. 5044-5053, July 2022

arXiv:2302.03033 [pdf, other]

doi 10.1109/ISCC53001.2021.9631485

Exemplars and Counterexemplars Explanations for Image Classifiers, Targeting Skin Lesion Labeling

Authors: Carlo Metta, Riccardo Guidotti, Yuan Yin, Patrick Gallinari, Salvatore Rinzivillo

Abstract: Explainable AI consists in developing mechanisms allowing for an interaction between decision systems and humans by making the decisions of the former understandable. This is particularly important in sensitive contexts like in the medical domain. We propose a use case study, for skin lesion diagnosis, illustrating how it is possible to provide the practitioner with explanations on the decisions of a state of the art deep neural network classifier trained to characterize skin lesions from examples. Our framework consists of a trained classifier onto which an explanation module operates. The latter is able to offer the practitioner exemplars and counterexemplars for the classification diagnosis thus allowing the physician to interact with the automatic diagnosis system. The exemplars are generated via an adversarial autoencoder. We illustrate the behavior of the system on representative examples.

Submitted 18 January, 2023; originally announced February 2023.

Comments: arXiv admin note: text overlap with arXiv:2111.11863

Journal ref: 2021 IEEE Symposium on Computers and Communications (ISCC)

arXiv:2210.08868 [pdf, other]

Cerebrovascular Segmentation via Vessel Oriented Filtering Network

Authors: Zhanqiang Guo, Yao Luan, Jianjiang Feng, Wangsheng Lu, Yin Yin, Guangming Yang, Jie Zhou

Abstract: Accurate cerebrovascular segmentation from Magnetic Resonance Angiography (MRA) and Computed Tomography Angiography (CTA) is of great significance in diagnosis and treatment of cerebrovascular pathology. Due to the complexity and topology variability of blood vessels, complete and accurate segmentation of vascular network is still a challenge. In this paper, we proposed a Vessel Oriented Filtering Network (VOF-Net) which embeds domain knowledge into the convolutional neural network. We design oriented filters for blood vessels according to vessel orientation field, which is obtained by orientation estimation network. Features extracted by oriented filtering are injected into segmentation network, so as to make use of the prior information that the blood vessels are slender and curved tubular structure. Experimental results on datasets of CTA and MRA show that the proposed method is effective for vessel segmentation, and embedding the specific vascular filter improves the segmentation performance.

Submitted 17 October, 2022; originally announced October 2022.

arXiv:2206.05687 [pdf, other]

DRNet: Decomposition and Reconstruction Network for Remote Physiological Measurement

Authors: Yuhang Dong, Gongping Yang, Yilong Yin

Abstract: Remote photoplethysmography (rPPG) based physiological measurement has great application values in affective computing, non-contact health monitoring, telehealth monitoring, etc, which has become increasingly important especially during the COVID-19 pandemic. Existing methods are generally divided into two groups. The first focuses on mining the subtle blood volume pulse (BVP) signals from face videos, but seldom explicitly models the noises that dominate face video content. They are susceptible to the noises and may suffer from poor generalization ability in unseen scenarios. The second focuses on modeling noisy data directly, resulting in suboptimal performance due to the lack of regularity of these severe random noises. In this paper, we propose a Decomposition and Reconstruction Network (DRNet) focusing on the modeling of physiological features rather than noisy data. A novel cycle loss is proposed to constrain the periodicity of physiological information. Besides, a plug-and-play Spatial Attention Block (SAB) is proposed to enhance features along with the spatial location information. Furthermore, an efficient Patch Cropping (PC) augmentation strategy is proposed to synthesize augmented samples with different noise and features. Extensive experiments on different public datasets as well as the cross-database testing demonstrate the effectiveness of our approach.

Submitted 20 June, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

arXiv:2206.03596 [pdf, other]

Neural Network Compression via Effective Filter Analysis and Hierarchical Pruning

Authors: Ziqi Zhou, Li Lian, Yilong Yin, Ze Wang

Abstract: Network compression is crucial to making deep networks more efficient, faster, and generalizable to low-end hardware. Current network compression methods have two open problems: first, there is no theoretical framework to estimate the maximum compression rate; second, some layers may get over-pruned, resulting in a significant network performance drop. To solve these two problems, this study proposes a gradient-matrix singularity analysis-based method to estimate the maximum network redundancy. Guided by that maximum rate, a novel and efficient hierarchical network pruning algorithm is developed to maximally condense the neuronal network structure without sacrificing network performance. Substantial experiments are performed to demonstrate the efficacy of the new method for pruning several advanced convolutional neural network (CNN) architectures. Compared to existing pruning methods, the proposed pruning algorithm achieved state-of-the-art performance. At the same or similar compression ratio, the new method provided the highest network prediction accuracy as compared to other methods.

Submitted 7 June, 2022; originally announced June 2022.

arXiv:2206.00901 [pdf]

Musical Instrument Recognition by XGBoost Combining Feature Fusion

Authors: Yijie Liu, Yanfang Yin, Qigang Zhu, Wenzhuo Cui

Abstract: Musical instrument classification is one of the focuses of Music Information Retrieval (MIR). In order to solve the problem of poor performance of current musical instrument classification models, we propose a musical instrument classification algorithm based on multi-channel feature fusion and XGBoost. Based on audio feature extraction and fusion of the dataset, the features are input into the XGBoost model for training; secondly, we verified the superior performance of the algorithm in the musical instrument classification task by comparing different feature combinations and several classical machine learning models such as Naive Bayes. The algorithm achieves an accuracy of 97.65% on the Medley-solos-DB dataset, outperforming existing models. The experiments provide a reference for feature selection in feature engineering for musical instrument classification.

Submitted 2 June, 2022; originally announced June 2022.

arXiv:2202.08724 [pdf, ps, other]

doi 10.1109/ITSC48978.2021.9565048

Real-Time Cross-Fleet Pareto-Improving Truck Platoon Coordination

Authors: Alexander Johansson, Jonas Mårtensson, Xiaotong Sun, Yafeng Yin

Abstract: This paper studies a multi-fleet platoon coordination system in transport networks that deploy hubs to form trucks into platoons. The trucks belong to different fleets that are interested in increasing their profits by platooning across fleets. The profit of each fleet incorporates platooning rewards and costs for waiting at hubs. Each truck has a fixed route and a waiting time budget to spend at the hubs along its route. To ensure that all fleets are willing to participate in the system, we develop a cross-fleet Pareto-improving coordination strategy that guarantees higher fleet profits than a coordination strategy without cross-fleet platoons. By leveraging multiple hubs for platoon formation, the coordination strategy can be implemented in a real-time and distributed fashion while largely reducing the amount of travel information to be shared for system-wide coordination. We evaluate the proposed strategy in a simulation study over the Swedish transportation network. The cross-fleet platooning strategy significantly improves fleets' profits compared with single-fleet platooning, especially the profits from smaller fleets. The cross-fleet platooning strategy also shows strong competitiveness in terms of the system-wide profit compared to the case when a system planner optimizes all fleets' total profit.

Submitted 17 February, 2022; originally announced February 2022.

Comments: ITSC 2021

arXiv:2112.07415 [pdf, ps, other]

Stochastic Planner-Actor-Critic for Unsupervised Deformable Image Registration

Authors: Ziwei Luo, Jing Hu, Xin Wang, Shu Hu, Bin Kong, Youbing Yin, Qi Song, Xi Wu, Siwei Lyu

Abstract: Large deformations of organs, caused by diverse shapes and nonlinear shape changes, pose a significant challenge for medical image registration. Traditional registration methods need to iteratively optimize an objective function via a specific deformation model along with meticulous parameter tuning, but which have limited capabilities in registering images with large deformations. While deep learning-based methods can learn the complex mapping from input images to their respective deformation field, it is regression-based and is prone to be stuck at local minima, particularly when large deformations are involved. To this end, we present Stochastic Planner-Actor-Critic (SPAC), a novel reinforcement learning-based framework that performs step-wise registration. The key notion is warping a moving image successively by each time step to finally align to a fixed image. Considering that it is challenging to handle high dimensional continuous action and state spaces in the conventional reinforcement learning (RL) framework, we introduce a new concept `Plan' to the standard Actor-Critic model, which is of low dimension and can facilitate the actor to generate a tractable high dimensional action. The entire framework is based on unsupervised training and operates in an end-to-end manner. We evaluate our method on several 2D and 3D medical image datasets, some of which contain large deformations. Our empirical results highlight that our work achieves consistent, significant gains and outperforms state-of-the-art methods.

Submitted 30 April, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

Comments: Accepted by AAAI 2022

arXiv:2108.13038 [pdf]

doi 10.1088/1742-6596/2234/1/012015

Integrated Decision and Control at Multi-Lane Intersections with Mixed Traffic Flow

Authors: Jianhua Jiang, Yangang Ren, Yang Guan, Shengbo Eben Li, Yuming Yin, ** **

Abstract: Autonomous driving at intersections is one of the most complicated and accident-prone traffic scenarios, especially with mixed traffic participants such as vehicles, bicycles and pedestrians. The driving policy should make safe decisions to handle the dynamic traffic conditions and meet the requirements of on-board computation. However, most current research focuses on simplified intersections considering only the surrounding vehicles and idealized traffic lights. This paper improves the integrated decision and control framework and develops a learning-based algorithm to deal with complex intersections with mixed traffic flows, which can not only take account of realistic characteristics of traffic lights, but also learn a safe policy under different safety constraints. We first consider different velocity models for green and red lights in the training process and use a finite state machine to handle different modes of light transformation. Then we design different types of distance constraints for vehicles, traffic lights, pedestrians, and bicycles respectively, and formulate the constrained optimal control problems (OCPs) to be optimized. Finally, reinforcement learning (RL) with value and policy networks is adopted to solve the series of OCPs. In order to verify the safety and efficiency of the proposed method, we design a multi-lane intersection with the existence of large-scale mixed traffic participants and set practical traffic light phases.
The simulation results indicate that the trained decision and control policy can well balance safety and tracking performance. Compared with model predictive control (MPC), the computational time is three orders of magnitude lower.

Submitted 30 August, 2021; originally announced August 2021.

Comments: 8 pages, 10 figures, 11 equations and 14 conferences

arXiv:2107.11517 [pdf, other]

Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions

Authors: Qian Yu, Lei Qi, Luping Zhou, Lei Wang, Yilong Yin, Yinghuan Shi, Wuzhang Wang, Yang Gao

Abstract: Accurate image segmentation plays a crucial role in medical image analysis, yet it faces great challenges of various shapes, diverse sizes, and blurry boundaries. To address these difficulties, square kernel-based encoder-decoder architecture has been proposed and widely used, but its performance remains unsatisfactory. To further cope with these challenges, we present a novel double-branch encoder architecture. Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels in the double-branch encoder, so features learned by the two branches can be expected to complement each other. 2) Considering that spatial attention can help models to better focus on the target region in a large-sized image, we develop an attention loss to further emphasize the segmentation on small-sized targets. Together, the above two schemes give rise to a novel double-branch encoder segmentation framework for medical image segmentation, namely Crosslink-Net. The experiments validate the effectiveness of our model on four datasets. The code is released at https://github.com/Qianyu1226/Crosslink-Net.

Submitted 23 July, 2021; originally announced July 2021.

Comments: 13 pages, 12 figures

MSC Class: 68T07 ACM Class: I.4.6

arXiv:2107.11318 [pdf, other]

Heuristics for Customer-focused Ride-pooling Assignment

Authors: Alexander Sundt, Qi Luo, John Vincent, Mehrdad Shahabi, Yafeng Yin

Abstract: Ride-pooling has become an important service option offered by ride-hailing platforms as it serves multiple trip requests in a single ride. By leveraging customer data, connected vehicles, and efficient assignment algorithms, ride-pooling can be a critical instrument to address driver shortages and mitigate the negative externalities of ride-hailing operations. Recent literature has focused on computationally intensive optimization-based methods that maximize system throughput or minimize vehicle miles. However, individual customers may experience substantial service quality degradation due to the consequent waiting and detour time. In contrast, this paper examines heuristic methods for real-time ride-pooling assignments that are highly scalable and easily computable. We propose a restricted subgraph method and compare it with other existing heuristic and optimization-based matching algorithms using a variety of metrics. By fusing multiple sources of trip and network data in New York City, we develop a flexible, agent-based simulation platform to test these strategies on different demand levels and examine how they affect both the customer experience and the ride-hailing platform. Our results find a trade-off among heuristics between throughput and customer matching time. We show that our proposed ride-pooling strategy maintains system performance while limiting trip delays and improving customer experience. This work provides insight for policymakers and ride-hailing operators about the performance of simpler heuristics and raises concerns about prioritizing only specific platform metrics without considering service quality.

Submitted 23 July, 2021; originally announced July 2021.

Comments: 13 pages, 8 figures, 4 tables

arXiv:2103.05505 [pdf]

Approximate Optimal Filter for Linear Gaussian Time-invariant Systems

Authors: Kaiming Tang, Shengbo Eben Li, Yuming Yin, Yang Guan, Jingliang Duan, Wenhan Cao, Jie Li

Abstract: State estimation is critical to control systems, especially when the states cannot be directly measured. This paper presents an approximate optimal filter, which enables the use of a policy iteration technique to obtain the steady-state gain in linear Gaussian time-invariant systems. This design transforms the optimal filtering problem with minimum mean square error into an optimal control problem, called the Approximate Optimal Filtering (AOF) problem. The equivalence holds given certain conditions about initial state distributions and policy formats, in which the system state is the estimation error, the control input is the filter gain, and the control objective function is the accumulated estimation error. We present a policy iteration algorithm to solve the AOF problem in steady-state. A classic vehicle state estimation problem finally evaluates the approximate filter. The results show that the policy converges to the steady-state Kalman gain, and its accuracy is within 2%.
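For readers unfamiliar with the reference value mentioned in this abstract, the steady-state Kalman gain can be illustrated with a minimal sketch. This is not the paper's policy iteration algorithm; it is the classic Riccati recursion for a scalar system, and the function name, the system parameters (a, c, q, r), and the starting variance are all assumptions chosen for illustration.

```python
def steady_state_kalman_gain(a, c, q, r, tol=1e-12, max_iter=10_000):
    """Iterate the scalar Riccati recursion until the gain stops changing.

    Assumed model (illustrative only): x[k+1] = a*x[k] + w, y[k] = c*x[k] + v,
    with process noise variance q and measurement noise variance r.
    """
    p = q  # assumed initial posterior error variance
    k_prev = None
    for _ in range(max_iter):
        p_pred = a * a * p + q                  # predicted error variance
        k = p_pred * c / (c * c * p_pred + r)   # Kalman gain for this step
        p = (1.0 - k * c) * p_pred              # posterior error variance
        if k_prev is not None and abs(k - k_prev) < tol:
            return k
        k_prev = k
    raise RuntimeError("Riccati iteration did not converge")


# Random-walk state observed directly, with unit noise variances.
gain = steady_state_kalman_gain(a=1.0, c=1.0, q=1.0, r=1.0)
print(gain)
```

For this particular choice of parameters the fixed point can be checked by hand: the predicted variance solves p^2 - p - 1 = 0, so the gain equals (sqrt(5) - 1) / 2.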

Submitted 9 March, 2021; originally announced March 2021.

arXiv:2103.00714 [pdf]

Diffusion-weighted MRI-guided needle biopsies permit quantitative tumor heterogeneity assessment and cell load estimation

Authors: Yi Yin, Kai Breuhahn, Hans-Ulrich Kauczor, Oliver Sedlaczek, Irene E. Vignon-Clementel, Dirk Drasdo

Abstract: Quantitative information on tumor heterogeneity and cell load could assist in designing effective and refined personalized treatment strategies. It was recently shown by us that such information can be inferred from the diffusion parameter D derived from the diffusion-weighted MRI (DWI) if a relation between D and cell density can be established. However, such relation cannot a priori be assumed to be constant for all patients and tumor types. Hence to assist in clinical decisions in palliative settings, the relation needs to be established without tumor resection. It is here demonstrated that biopsies may contain sufficient information for this purpose if the localization of biopsies is chosen as systematically elaborated in this paper. A superpixel-based method for automated optimal localization of biopsies from the DWI D-map is proposed. The performance of the DWI-guided procedure is evaluated by extensive simulations of biopsies. Needle biopsies yield sufficient histological information to establish a quantitative relationship between D-value and cell density, provided they are taken from regions with high, intermediate, and low D-value in DWI. The automated localization of the biopsy regions is demonstrated from a NSCLC patient tumor. In this case, even two or three biopsies give a reasonable estimate. Simulations of needle biopsies under different conditions indicate that the DWI-guidance highly improves the estimation results. Tumor cellularity and heterogeneity in solid tumors may be reliably investigated from DWI and a few needle biopsies that are sampled in regions of well-separated D-values, excluding adipose tissue. This procedure could provide a way of embedding in the clinical workflow assistance in cancer diagnosis and treatment based on personalized information.

Submitted 28 February, 2021; originally announced March 2021.

arXiv:2103.00430 [pdf, other]

Training Generative Adversarial Networks in One Stage

Authors: Chengchao Shen, Youtan Yin, Xinchao Wang, Xubin Li, Jie Song, Mingli Song

Abstract: Generative Adversarial Networks (GANs) have demonstrated unprecedented success in various image generation tasks. The encouraging results, however, come at the price of a cumbersome training process, during which the generator and discriminator are alternately updated in two stages. In this paper, we investigate a general training scheme that enables training GANs efficiently in only one stage. Based on the adversarial losses of the generator and discriminator, we categorize GANs into two classes, Symmetric GANs and Asymmetric GANs, and introduce a novel gradient decomposition method to unify the two, allowing us to train both classes in one stage and hence alleviate the training effort. We also computationally analyze the efficiency of the proposed method, and empirically demonstrate that the proposed method yields a solid $1.5\times$ acceleration across various datasets and network architectures. Furthermore, we show that the proposed method is readily applicable to other adversarial-training scenarios, such as data-free knowledge distillation. The code is available at https://github.com/zju-vipa/OSGAN.

Submitted 16 June, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

Comments: Accepted to CVPR 2021

arXiv:2102.11736 [pdf, other]

Recurrent Model Predictive Control

Authors: Zhengyu Liu, Jingliang Duan, Wenxuan Wang, Shengbo Eben Li, Yuming Yin, Ziyu Lin, Qi Sun, Bo Cheng

Abstract: This paper proposes an off-line algorithm, called Recurrent Model Predictive Control (RMPC), to solve general nonlinear finite-horizon optimal control problems. Unlike traditional Model Predictive Control (MPC) algorithms, it can make full use of the current computing resources and adaptively select the longest model prediction horizon. Our algorithm employs a recurrent function to approximate the optimal policy, which maps the system states and reference values directly to the control inputs. The number of prediction steps is equal to the number of recurrent cycles of the learned policy function. With an arbitrary initial policy function, the proposed RMPC algorithm can converge to the optimal policy by directly minimizing the designed loss function. We further prove the convergence and optimality of the RMPC algorithm through the Bellman optimality principle, and demonstrate its generality and efficiency using two numerical examples.

Submitted 23 February, 2021; originally announced February 2021.

Comments: arXiv admin note: substantial text overlap with arXiv:2102.10289

arXiv:2102.10289 [pdf, other]

doi 10.1109/TIE.2022.3153800

Recurrent Model Predictive Control: Learning an Explicit Recurrent Controller for Nonlinear Systems

Authors: Zhengyu Liu, Jingliang Duan, Wenxuan Wang, Shengbo Eben Li, Yuming Yin, Ziyu Lin, Bo Cheng

Abstract: This paper proposes an offline control algorithm, called Recurrent Model Predictive Control (RMPC), to solve large-scale nonlinear finite-horizon optimal control problems. It can be regarded as an explicit solver of traditional Model Predictive Control (MPC) algorithms, which can adaptively select an appropriate model prediction horizon according to current computing resources, so as to improve the policy performance. Our algorithm employs a recurrent function to approximate the optimal policy, which maps the system states and reference values directly to the control inputs. The output of the learned policy network after N recurrent cycles corresponds to the nearly optimal solution of N-step MPC. A policy optimization objective is designed by decomposing the MPC cost function according to Bellman's principle of optimality. The optimal recurrent policy can be obtained by directly minimizing the designed objective function, which is applicable to general nonlinear and non-input-affine systems. Both simulation-based and real-robot path-tracking tasks are utilized to demonstrate the effectiveness of the proposed method.

Submitted 8 April, 2022; v1 submitted 20 February, 2021; originally announced February 2021.

Journal ref: IEEE Transactions on Industrial Electronics, 2022

arXiv:2012.10716 [pdf, other]

Model-Based Actor-Critic with Chance Constraint for Stochastic System

Authors: Baiyu Peng, Yao Mu, Yang Guan, Shengbo Eben Li, Yuming Yin, Jianyu Chen

Abstract: Safety is essential for reinforcement learning (RL) applied in real-world situations. Chance constraints are suitable to represent the safety requirements in stochastic systems. Previous chance-constrained RL methods usually have a low convergence rate, or only learn a conservative policy. In this paper, we propose a model-based chance constrained actor-critic (CCAC) algorithm which can efficiently learn a safe and non-conservative policy. Different from existing methods that optimize a conservative lower bound, CCAC directly solves the original chance constrained problems, where the objective function and safe probability are simultaneously optimized with adaptive weights. In order to improve the convergence rate, CCAC utilizes the gradient of the dynamic model to accelerate policy optimization. The effectiveness of CCAC is demonstrated by a stochastic car-following task. Experiments indicate that compared with previous RL methods, CCAC improves the performance while guaranteeing safety, with a five times faster convergence rate. It also has 100 times higher online computation efficiency than traditional safety techniques such as stochastic model predictive control.

Submitted 16 March, 2021; v1 submitted 19 December, 2020; originally announced December 2020.

arXiv:2012.05509 [pdf]

doi 10.1016/j.patcog.2021.108499

COVID-MTL: Multitask Learning with Shift3D and Random-weighted Loss for Automated Diagnosis and Severity Assessment of COVID-19

Authors: Guoqing Bao, Huai Chen, Tongliang Liu, Guanzhong Gong, Yong Yin, Lisheng Wang, Xiuying Wang

Abstract: There is an urgent need for automated methods to assist accurate and effective assessment of COVID-19. Radiology and nucleic acid test (NAT) are complementary COVID-19 diagnosis methods. In this paper, we present an end-to-end multitask learning (MTL) framework (COVID-MTL) that is capable of automated and simultaneous detection (against both radiology and NAT) and severity assessment of COVID-19. COVID-MTL learns different COVID-19 tasks in parallel through our novel random-weighted loss function, which assigns learning weights under Dirichlet distribution to prevent task dominance; our new 3D real-time augmentation algorithm (Shift3D) introduces space variances for 3D CNN components by shifting low-level feature representations of volumetric inputs in three dimensions; thereby, the MTL framework is able to accelerate convergence and improve joint learning performance compared to single-task models. By only using chest CT scans, COVID-MTL was trained on 930 CT scans and tested on 399 separate cases. COVID-MTL achieved AUCs of 0.939 and 0.846, and accuracies of 90.23% and 79.20% for detection of COVID-19 against radiology and NAT, respectively, which outperformed the state-of-the-art models. Meanwhile, COVID-MTL yielded AUC of 0.800 $\pm$ 0.020 and 0.813 $\pm$ 0.021 (with transfer learning) for classifying control/suspected, mild/regular, and severe/critically-ill cases. To decipher the recognition mechanism, we also identified high-throughput lung features that were significantly related (P < 0.001) to the positivity and severity of COVID-19.

Submitted 31 December, 2020; v1 submitted 10 December, 2020; originally announced December 2020.

Comments: COVID-19 research; computer vision and pattern recognition; 13 pages, 10 figures and 5 tables

arXiv:2009.12812 [pdf, other]

TernaryBERT: Distillation-aware Ultra-low Bit BERT

Authors: Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu

Abstract: Transformer-based pre-training models like BERT have achieved remarkable performance in many natural language processing tasks. However, these models are both computation and memory expensive, hindering their deployment to resource-constrained devices. In this work, we propose TernaryBERT, which ternarizes the weights in a fine-tuned BERT model. Specifically, we use both approximation-based and loss-aware ternarization methods and empirically investigate the ternarization granularity of different parts of BERT. Moreover, to reduce the accuracy degradation caused by the lower capacity of low bits, we leverage the knowledge distillation technique in the training process. Experiments on the GLUE benchmark and SQuAD show that our proposed TernaryBERT outperforms the other BERT quantization methods, and even achieves performance comparable to the full-precision model while being 14.9x smaller.
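As a rough illustration of what "ternarizing" weights means, the sketch below applies a simple threshold-based approximation in the style of Ternary Weight Networks. The abstract does not state the exact rule used, so the 0.7 threshold ratio, the function name, and the sample weights are assumptions for illustration only, not the authors' implementation.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Approximate weights as alpha * t, with each t in {-1, 0, +1}.

    Assumed heuristic: the threshold is threshold_ratio times the mean
    absolute weight, and alpha is the mean absolute value of the weights
    that survive the threshold.
    """
    abs_w = [abs(w) for w in weights]
    delta = threshold_ratio * sum(abs_w) / len(abs_w)
    # Weights within [-delta, delta] are zeroed; the rest keep only their sign.
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    kept = [a for a, t in zip(abs_w, codes) if t != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return alpha, codes


# Hypothetical weight vector, not taken from any real model.
alpha, codes = ternarize([0.9, -0.05, 0.4, -1.1, 0.02, -0.6])
print(alpha, codes)
```

Each ternary code needs only two bits of storage plus one shared scale per group of weights, which is where most of the size reduction of such schemes comes from.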

Submitted 10 October, 2020; v1 submitted 27 September, 2020; originally announced September 2020.

Comments: Accepted by EMNLP 2020

arXiv:2008.02492 [pdf, other]

doi 10.1145/3394171.3413856

Zero-Shot Multi-View Indoor Localization via Graph Location Networks

Authors: Meng-Jiun Chiou, Zhenguang Liu, Yifang Yin, Anan Liu, Roger Zimmermann

Abstract: Indoor localization is a fundamental problem in location-based applications. Current approaches to this problem typically rely on Radio Frequency technology, which requires not only supporting infrastructures but human efforts to measure and calibrate the signal. Moreover, data collection for all locations is indispensable in existing methods, which in turn hinders their large-scale deployment. In this paper, we propose a novel neural network based architecture Graph Location Networks (GLN) to perform infrastructure-free, multi-view image based indoor localization. GLN makes location predictions based on robust location representations extracted from images through message-passing networks. Furthermore, we introduce a novel zero-shot indoor localization setting and tackle it by extending the proposed GLN to a dedicated zero-shot version, which exploits a novel mechanism Map2Vec to train location-aware embeddings and make predictions on novel unseen locations. Our extensive experiments show that the proposed approach outperforms state-of-the-art methods in the standard setting, and achieves promising accuracy even in the zero-shot setting where data for half of the locations are not available. The source code and datasets are publicly available at https://github.com/coldmanck/zero-shot-indoor-localization-release.

Submitted 6 August, 2020; originally announced August 2020.

Comments: Accepted at ACM MM 2020. 10 pages, 7 figures. Code and datasets available at https://github.com/coldmanck/zero-shot-indoor-localization-release

ACM Class: I.2.10

Journal ref: Proceedings of the 28th ACM International Conference on Multimedia, 2020

arXiv:2007.06810 [pdf]

Ternary Policy Iteration Algorithm for Nonlinear Robust Control

Authors: Jie Li, Shengbo Eben Li, Yang Guan, Jingliang Duan, Wenyu Li, Yuming Yin

Abstract: The uncertainties in plant dynamics remain a challenge for nonlinear control problems. This paper develops a ternary policy iteration (TPI) algorithm for solving nonlinear robust control problems with bounded uncertainties. The controller and uncertainty of the system are considered as game players, and the robust control problem is formulated as a two-player zero-sum differential game. In order to solve the differential game, the corresponding Hamilton-Jacobi-Isaacs (HJI) equation is then derived. Three loss functions and three update phases are designed to match the identity equation, minimization and maximization of the HJI equation, respectively. These loss functions are defined by the expectation of the approximate Hamiltonian in a generated state set to prevent operating all the states in the entire state set concurrently. The parameters of value function and policies are directly updated by diminishing the designed loss functions using the gradient descent method. Moreover, zero-initialization can be applied to the parameters of the control policy. The effectiveness of the proposed TPI algorithm is demonstrated through two simulation studies. The simulation results show that the TPI algorithm can converge to the optimal solution for the linear plant, and has high resistance to disturbances for the nonlinear plant.

Submitted 14 July, 2020; originally announced July 2020.

arXiv:2007.02070 [pdf, other]

Continuous-time finite-horizon ADP for automated vehicle controller design with high efficiency

Authors: Ziyu Lin, Jingliang Duan, Shengbo Eben Li, Haitong Ma, Yuming Yin

Abstract: The design of an automated vehicle controller can be generally formulated into an optimal control problem. This paper proposes a continuous-time finite-horizon approximate dynamic programming (ADP) method, which can synthesize an off-line near-optimal control policy with analytical vehicle dynamics. Relying on the general Policy Iteration framework, it employs value and policy neural networks to approximate the mappings from the system states to value function and control inputs, respectively. The proposed method can converge to the near-optimal solution of the finite-horizon Hamilton-Jacobi-Bellman (HJB) equation. We further applied our algorithm to the simulation of automated vehicle control for the path tracking maneuver. The results suggest that the proposed ADP method can obtain the near-optimal policy with 1% error and less calculation time. What is more, the proposed ADP algorithm is also suitable for nonlinear control systems, where ADP is almost 500 times faster than the nonlinear MPC ipopt solver.

Submitted 4 July, 2020; originally announced July 2020.

Comments: 7 pages, conference

arXiv:2006.08599 [pdf, other]

"Notic My Speech" -- Blending Speech Patterns With Multimedia

Authors: Dhruva Sahrawat, Yaman Kumar, Shashwat Aggarwal, Yifang Yin, Rajiv Ratn Shah, Roger Zimmermann

Abstract: Speech as a natural signal is composed of three parts - visemes (visual part of speech), phonemes (spoken part of speech), and language (the imposed structure). However, video as a medium for the delivery of speech and a multimedia construct has mostly ignored the cognitive aspects of speech delivery. For example, video applications like transcoding and compression have until now ignored how speech is delivered and heard. To close the gap between speech understanding and multimedia video applications, in this paper, we show the initial experiments by modelling the perception of visual speech and showing its use case on video compression. On the other hand, in the visual speech recognition domain, existing studies have mostly modeled it as a classification problem, while ignoring the correlations between views, phonemes, visemes, and speech perception. This results in solutions which are further away from how human perception works. To bridge this gap, we propose a view-temporal attention mechanism to model both the view dependence and the visemic importance in speech recognition and understanding. We conduct experiments on three public visual speech recognition datasets. The experimental results show that our proposed method outperformed the existing work by 4.99% in terms of the viseme error rate. Moreover, we show that there is a strong correlation between our model's understanding of multi-view speech and the human perception. This characteristic benefits downstream applications such as video compression and streaming where a significant number of less important frames can be compressed or eliminated while being able to maximally preserve human speech understanding with good user experience.

Submitted 12 June, 2020; originally announced June 2020.

Comments: Under Review

arXiv:2005.08497 [pdf, other]

Attention-based Transducer for Online Speech Recognition

Authors: Bin Wang, Yan Yin, Hui Lin

Abstract: Recent studies reveal the potential of recurrent neural network transducer (RNN-T) for end-to-end (E2E) speech recognition. Among some of the most popular E2E systems including RNN-T, Attention Encoder-Decoder (AED), and Connectionist Temporal Classification (CTC), RNN-T has some clear advantages given that it supports streaming recognition and does not have a frame-independency assumption. Although significant progress has been made in RNN-T research, it is still facing performance challenges in terms of training speed and accuracy. We propose an attention-based transducer with modifications over RNN-T in two aspects. First, we introduce chunk-wise attention in the joint network. Second, self-attention is introduced in the encoder. Our proposed model outperforms RNN-T for both training speed and accuracy. For training, we achieve over 1.7x speedup. With 500 hours of LAIX non-native English training data, the attention-based transducer yields ~10.6% WER reduction over baseline RNN-T. Trained with the full set of over 10K hours of data, our final system achieves ~5.5% WER reduction over that trained with the best Kaldi TDNN-f recipe. After 8-bit weight quantization without WER degradation, RTF and latency drop to 0.34~0.36 and 268~409 milliseconds respectively on a single CPU core of a production server.

Submitted 18 May, 2020; originally announced May 2020.

Comments: submitted to Interspeech 2020

arXiv:2004.13577 [pdf, other]

Unifying Neural Learning and Symbolic Reasoning for Spinal Medical Report Generation

Authors: Zhongyi Han, Benzheng Wei, Yilong Yin, Shuo Li

Abstract: Automated medical report generation in spine radiology, i.e., given spinal medical images, directly creating radiologist-level diagnosis reports to support clinical decision making, is a novel yet fundamental study in the domain of artificial intelligence in healthcare. However, it is incredibly challenging because it is an extremely complicated task that involves visual perception and high-level reasoning processes. In this paper, we propose the neural-symbolic learning (NSL) framework that performs human-like learning by unifying deep neural learning and symbolic logical reasoning for the spinal medical report generation. Generally speaking, the NSL framework firstly employs deep neural learning to imitate human visual perception for detecting abnormalities of target spinal structures. Concretely, we design an adversarial graph network that interpolates a symbolic graph reasoning module into a generative adversarial network through embedding prior domain knowledge, achieving semantic segmentation of spinal structures with high complexity and variability. NSL secondly conducts human-like symbolic logical reasoning that realizes unsupervised causal effect analysis of detected entities of abnormalities through meta-interpretive learning. NSL finally fills these discoveries of target diseases into a unified template, successfully achieving a comprehensive medical report generation. When employed on a real-world clinical dataset, a series of empirical studies demonstrate its capacity on spinal medical report generation as well as show that our algorithm remarkably exceeds existing methods in the detection of spinal structures. These indicate its potential as a clinical tool that contributes to computer-aided diagnosis.

Submitted 28 April, 2020; originally announced April 2020.

Comments: Under review

arXiv:2002.02909 [pdf, other]

Domain Embedded Multi-model Generative Adversarial Networks for Image-based Face Inpainting

Authors: Xian Zhang, Xin Wang, Bin Kong, Canghong Shi, Youbing Yin, Qi Song, Siwei Lyu, Jiancheng Lv, Canghong Shi, Xiaojie Li

Abstract: Prior knowledge of face shape and structure plays an important role in face inpainting. However, traditional face inpainting methods mainly focus on the generated image resolution of the missing portion without explicitly considering the particularities of the human face, and generally produce discordant facial parts. To solve this problem, we present a domain embedded multi-model generative adversarial model for inpainting of face images with large cropped regions. We first represent only face regions using the latent variable as the domain knowledge and combine it with the textures of the non-face parts to generate high-quality face images with plausible contents. Two adversarial discriminators are finally used to judge whether the generated distribution is close to the real distribution or not. The model can not only synthesize novel image structures but also explicitly utilize the embedded face domain knowledge to generate better predictions with consistency in structure and appearance. Experiments on both CelebA and CelebA-HQ face datasets demonstrate that our proposed approach achieves state-of-the-art performance and generates higher quality inpainting results than existing ones.

Submitted 20 June, 2020; v1 submitted 5 February, 2020; originally announced February 2020.

arXiv:1910.08375 [pdf, other]

Detecting intracranial aneurysm rupture from 3D surfaces using a novel GraphNet approach

Authors: Z. Ma, L. Song, X. Feng, G. Yang, W. Zhu, J. Liu, Y. Zhang, X. Yang, Y. Yin

Abstract: Intracranial aneurysm (IA) is a blood spot in the human brain that is life-threatening if it ruptures and causes cerebral hemorrhage. It is challenging to detect whether an IA has ruptured from medical images. In this paper, we propose a novel graph based neural network named GraphNet to detect IA rupture from 3D surface data. GraphNet is based on graph convolution network (GCN) and is designed for graph-level classification and node-level segmentation. The network uses GCN blocks to extract local surface features and pools them into global features. Data from 1250 patients, including 385 ruptured and 865 unruptured IAs, were collected from the clinic for experiments. Performance is reported on 234 randomly selected test patients. The proposed GraphNet achieved an accuracy of 0.82 and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.82 in the classification task, significantly outperforming the baseline approach without graph based networks. The segmentation output of the model achieved a mean graph-node-based Dice coefficient (DSC) score of 0.88.

Submitted 17 October, 2019; originally announced October 2019.

Comments: Submitted to ISBI 2020

arXiv:1907.01607 [pdf, other]

MIDI-Sandwich: Multi-model Multi-task Hierarchical Conditional VAE-GAN networks for Symbolic Single-track Music Generation

Authors: Xia Liang, Junmin Wu, Yan Yin

Abstract: Most existing neural network models for music generation explore how to generate music bars, then directly splice the music bars into a song. However, these methods do not explore the relationship between the bars, and the connected song as a whole has no musical form structure and sense of musical direction. To address this issue, we propose a Multi-model Multi-task Hierarchical Conditional VAE-GAN (Variational Autoencoder-Generative Adversarial Network) network, named MIDI-Sandwich, which combines musical knowledge, such as musical form, tonic, and melodic motion. The MIDI-Sandwich has two submodels: Hierarchical Conditional Variational Autoencoder (HCVAE) and Hierarchical Conditional Generative Adversarial Network (HCGAN). The HCVAE uses a hierarchical structure. The underlying layer of HCVAE uses a Local Conditional Variational Autoencoder (L-CVAE) to generate a music bar which is pre-specified by the First and Last Notes (FLN). The upper layer of HCVAE uses a Global Variational Autoencoder (G-VAE) to analyze the latent vector sequence generated by the L-CVAE encoder, to explore the musical relationship between the bars, and to produce the song pieced together from multiple music bars generated by the L-CVAE decoder, which gives the song both musical structure and a sense of direction. At the same time, the HCVAE shares a part of itself with the HCGAN to further improve the performance of the generated music.
The MIDI-Sandwich is validated on the Nottingham dataset and is able to generate a single-track melody sequence (17x8 beats), which is superior to the length of most of the generated models (8 to 32 beats). Meanwhile, the quality of the generated music is evaluated following the experimental methods of much of the classical literature. The above experiments prove the validity of the model.

Submitted 4 July, 2019; v1 submitted 2 July, 2019; originally announced July 2019.

Comments: cast KSEM2019 on May 3, 2019 (weak rejected)

arXiv:1907.01367 [pdf, other]

Lipper: Synthesizing Thy Speech using Multi-View Lipreading

Authors: Yaman Kumar, Rohit Jain, Khwaja Mohd. Salik, Rajiv Ratn Shah, Yifang Yin, Roger Zimmermann

Abstract: Lipreading has a lot of potential applications such as in the domain of surveillance and video conferencing. Despite this, most of the work in building lipreading systems has been limited to classifying silent videos into classes representing text phrases. However, there are multiple problems associated with making lipreading a text-based classification task like its dependence on a particular language and vocabulary mapping. Thus, in this paper we propose a multi-view lipreading to audio system, namely Lipper, which models it as a regression task. The model takes silent videos as input and produces speech as the output. With multi-view silent videos, we observe an improvement over single-view speech reconstruction results. We show this by presenting an exhaustive set of experiments for speaker-dependent, out-of-vocabulary and speaker-independent settings. Further, we compare the delay values of Lipper with other speechreading systems in order to show the real-time nature of audio produced. We also perform a user study for the audios produced in order to understand the level of comprehensibility of audios produced using Lipper.

Submitted 28 June, 2019; originally announced July 2019.

Comments: Accepted at AAAI 2019

arXiv:1810.13088 [pdf]

Attention-based sequence-to-sequence model for speech recognition: development of state-of-the-art system on LibriSpeech and its application to non-native English

Authors: Yan Yin, Ramon Prieto, Bin Wang, Jianwei Zhou, Yiwei Gu, Yang Liu, Hui Lin

Abstract: Recent research has shown that attention-based sequence-to-sequence models such as Listen, Attend, and Spell (LAS) yield comparable results to state-of-the-art ASR systems on various tasks. In this paper, we describe the development of such a system and demonstrate its performance on two tasks: first we achieve a new state-of-the-art word error rate of 3.43% on the test clean subset of LibriSpeech English data; second on non-native English speech, including both read speech and spontaneous speech, we obtain very competitive results compared to a conventional system built with the most updated Kaldi recipe.

Submitted 5 November, 2018; v1 submitted 30 October, 2018; originally announced October 2018.

arXiv:1804.06586 [pdf, other]

Composite Adaptive Control for Bilateral Teleoperation Systems without Persistency of Excitation

Authors: Yuling Li, Yixin Yin, Sen Zhang, Jie Dong, Rolf Johansson

Abstract: Composite adaptive control schemes, which use both the system tracking errors and the prediction error to drive the update laws, have become widespread in achieving an improvement of system performance. However, a strong persistent-excitation (PE) condition should be satisfied to guarantee the parameter convergence. This paper proposes a novel composite adaptive control to guarantee parameter convergence without PE condition for nonlinear teleoperation systems with dynamic uncertainties and time-varying communication delays. The stability criteria of the closed-loop teleoperation system are given in terms of linear matrix inequalities. New tracking performance measures are proposed to evaluate the position tracking between the master and the slave. Simulation studies are given to show the effectiveness of the proposed method.

Submitted 18 April, 2018; originally announced April 2018.

Comments: 21 pages, 9 figures, submitted to Journal of The Franklin Institute

arXiv:1804.04290 [pdf, other]

Bilateral Teleoperation of Multiple Robots under Scheduling Communication

Authors: Yuling Li, Kun Liu, Wei He, Yixin Yin, Rolf Johansson, Kai Zhang

Abstract: In this paper, bilateral teleoperation of multiple slaves coupled to a single master under scheduling communication is investigated. The sampled-data transmission between the master and the multiple slaves is fulfilled over a delayed communication network, and at each sampling instant, only one slave is allowed to transmit its current information to the master side according to some scheduling protocols. To achieve the master-slave synchronization, Round-Robin scheduling protocol and Try-Once-Discard scheduling protocol are employed, respectively. By designing a scheduling-communication-based controller, some sufficient stability criteria related to the controller gain matrices, sampling intervals, and communication delays are obtained for the closed-loop teleoperation system under Round-Robin and Try-Once-Discard scheduling protocols, respectively. Finally, simulation studies are given to validate the effectiveness of the proposed results.

Submitted 11 April, 2018; originally announced April 2018.

Comments: 13 pages, 12 figures, 4 tables, submitted to IEEE Transactions on Control Systems Technology

Showing 1–39 of 39 results for author: yin, Y