Search | arXiv e-print repository

Flexible Active Safety Motion Control for Robotic Obstacle Avoidance: A CBF-Guided MPC Approach

Authors: **hao Liu, Jun Yang, Jianliang Mao, Tianqi Zhu, Qihang Xie, Yimeng Li, Xiangyu Wang, Shihua Li

Abstract: A flexible active safety motion (FASM) control approach is proposed for the avoidance of dynamic obstacles and the reference tracking in robot manipulators. The distinctive feature of the proposed method lies in its utilization of control barrier functions (CBF) to design flexible CBF-guided safety criteria (CBFSC) with dynamically optimized decay rates, thereby offering flexibility and active safety for robot manipulators in dynamic environments. First, discrete-time CBFs are employed to formulate the novel flexible CBFSC with dynamic decay rates for robot manipulators. Following that, the model predictive control (MPC) philosophy is applied, integrating flexible CBFSC as safety constraints into the receding-horizon optimization problem. Significantly, the decay rates of the designed CBFSC are incorporated as decision variables in the optimization problem, facilitating the dynamic enhancement of flexibility during the obstacle avoidance process. In particular, a novel cost function that integrates a penalty term is designed to dynamically adjust the safety margins of the CBFSC. Finally, experiments are conducted in various scenarios using a Universal Robots 5 (UR5) manipulator to validate the effectiveness of the proposed approach.

Submitted 20 May, 2024; originally announced May 2024.

Comments: 11 pages, 11 figures

arXiv:2403.19971 [pdf, other]

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

Authors: Yafeng Chen, Siqi Zheng, Hui Wang, Luyao Cheng, Tinglong Zhu, Changhe Song, Rongjie Huang, Ziyang Ma, Qian Chen, Shiliang Zhang, Xihao Li

Abstract: This paper introduces 3D-Speaker-Toolkit, an open source toolkit for multi-modal speaker verification and diarization. It is designed for the needs of academic researchers and industrial practitioners. The 3D-Speaker-Toolkit adeptly leverages the combined strengths of acoustic, semantic, and visual data, seamlessly fusing these modalities to offer robust speaker recognition capabilities. The acoustic module extracts speaker embeddings from acoustic features, employing both fully-supervised and self-supervised learning approaches. The semantic module leverages advanced language models to apprehend the substance and context of spoken language, thereby augmenting the system's proficiency in distinguishing speakers through linguistic patterns. Finally, the visual module applies image processing technologies to scrutinize facial features, which bolsters the precision of speaker diarization in multi-speaker environments. Collectively, these modules empower the 3D-Speaker-Toolkit to attain elevated levels of accuracy and dependability in executing speaker-related tasks, establishing a new benchmark in multi-modal speaker analysis. The 3D-Speaker project also includes a handful of open-sourced state-of-the-art models and a large dataset containing over 10,000 speakers. The toolkit is publicly available at https://github.com/alibaba-damo-academy/3D-Speaker.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2402.14543 [pdf]

Low-frequency Resonances in Grid-Forming Converters: Causes and Damping Control

Authors: Fangzhou Zhao, Tianhua Zhu, Zejie Li, Xiongfei Wang

Abstract: Grid-forming voltage-source converter (GFM-VSC) may experience low-frequency resonances, such as synchronous resonance (SR) and sub-synchronous resonance (SSR), in the output power. This paper offers a comprehensive study on the root causes of low-frequency resonances with GFM-VSC systems and the damping control methods. The typical GFM control structures are introduced first, along with a mapping between the resonances and control loops. Then, the causes of SR and SSR are discussed, highlighting the impacts of control interactions on the resonances. Further, the recent advancements in stabilizing control methods for SR and SSR are critically reviewed with experimental tests of a GFM-VSC under different grid conditions.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2401.10345 [pdf, other]

Attack and Defense Analysis of Learned Image Compression

Authors: Tianyu Zhu, Heming Sun, Xiankui Xiong, Xuanpeng Zhu, Yong Gong, Minge Jing, Yibo Fan

Abstract: Learned image compression (LIC) is becoming more and more popular these years with its high efficiency and outstanding compression quality. Still, the practicality against modified inputs added with specific noise could not be ignored. White-box attacks such as FGSM and PGD use only gradient to compute adversarial images that mislead LIC models to output unexpected results. Our experiments compare the effects of different dimensions such as attack methods, models, qualities, and targets, concluding that in the worst case, there is a 61.55% decrease in PSNR or a 19.15 times increase in bpp under the PGD attack. To improve their robustness, we conduct adversarial training by adding adversarial images into the training datasets, which obtains a 95.52% decrease in the R-D cost of the most vulnerable LIC model. We further test the robustness of H.266, whose better performance on reconstruction quality extends its possibility to defend one-step or iterative adversarial attacks.

Submitted 27 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.00225 [pdf]

Enhancing dysarthria speech feature representation with empirical mode decomposition and Walsh-Hadamard transform

Authors: Ting Zhu, Shufei Duan, Camille Dingam, Huizhi Liang, Wei Zhang

Abstract: Dysarthria speech contains the pathological characteristics of vocal tract and vocal fold, but so far, they have not yet been included in traditional acoustic feature sets. Moreover, the nonlinearity and non-stationarity of speech have been ignored. In this paper, we propose a feature enhancement algorithm for dysarthria speech called WHFEMD. It combines empirical mode decomposition (EMD) and fast Walsh-Hadamard transform (FWHT) to enhance features. With the proposed algorithm, the fast Fourier transform of the dysarthria speech is first performed and then followed by EMD to get intrinsic mode functions (IMFs). After that, FWHT is used to output new coefficients and to extract statistical features based on IMFs, power spectral density, and enhanced gammatone frequency cepstral coefficients. To evaluate the proposed approach, we conducted experiments on two public pathological speech databases including UA Speech and TORGO. The results show that our algorithm performed better than traditional features in classification. We achieved improvements of 13.8% (UA Speech) and 3.84% (TORGO), respectively. Furthermore, the incorporation of an imbalanced classification algorithm to address data imbalance has resulted in a 12.18% increase in recognition accuracy. This algorithm effectively addresses the challenges of the imbalanced dataset and non-linearity in dysarthric speech and simultaneously provides a robust representation of the local pathological features of the vocal folds and tracts.

Submitted 30 December, 2023; originally announced January 2024.

arXiv:2312.15153 [pdf]

Design and Implementation Considerations for a Virtual File System Using an Inode Data Structure

Authors: Qin Sun, Grace McKenzie, Guanqun Song, Ting Zhu

Abstract: Virtual file systems are a tool to centralize and mobilize a file system that could otherwise be complex and consist of multiple hierarchies, hard disks, and more. In this paper, we discuss the design of Unix-based file systems and how this type of file system layout using inode data structures and a disk emulator can be implemented as a single-file virtual file system in Linux. We explore the ways that virtual file systems are vulnerable to security attacks and introduce straightforward solutions that can be implemented to help prevent or mitigate the consequences of such attacks.

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.08998 [pdf]

Design, construction and evaluation of emotional multimodal pathological speech database

Authors: Ting Zhu, Shufei Duan, Huizhi Liang, Wei Zhang

Abstract: The lack of an available emotion pathology database is one of the key obstacles in studying the emotion expression status of patients with dysarthria. The first Chinese multimodal emotional pathological speech database containing multi-perspective information is constructed in this paper. It includes 29 controls and 39 patients with different degrees of motor dysarthria, expressing happy, sad, angry and neutral emotions. All emotional speech was labeled for intelligibility, types and discrete dimensional emotions by developed WeChat mini-program. The subjective analysis justifies from emotion discrimination accuracy, speech intelligibility, valence-arousal spatial distribution, and correlation between SCL-90 and disease severity. The automatic recognition tested on speech and glottal data, with average accuracy of 78% for controls and 60% for patients in audio, while 51% for controls and 38% for patients in glottal data, indicating an influence of the disease on emotional expression.

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2312.01785 [pdf]

Closed-Form Solutions for Grid-Forming Converters: A Design-Oriented Study

Authors: Fangzhou Zhao, Tianhua Zhu, Lennart Harnefors, Bo Fan, Heng Wu, Zichao Zhou, Yin Sun, Xiongfei Wang

Abstract: This paper derives closed-form solutions for grid-forming converters with power synchronization control (PSC) by subtly simplifying and factorizing the complex closed-loop models. The solutions can offer clear analytical insights into control-loop interactions, enabling guidelines for robust controller design. It is proved that 1) the proportional gains of PSC and alternating voltage control (AVC) can introduce negative resistance, which aggravates synchronous resonance (SR) of power control, 2) the integral gain of AVC is the cause of sub-synchronous resonance (SSR) in stiff-grid interconnections, albeit the proportional gain of AVC can help dampen the SSR, and 3) surprisingly, the current controller that dampens SR actually exacerbates SSR. Controller design guidelines are given based on analytical insights. The findings are verified by simulations and experimental results.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2307.03898

StyleGAN3: Generative Networks for Improving the Equivariance of Translation and Rotation

Authors: Tianlei Zhu, Junqi Chen, Renzhe Zhu, Gaurav Gupta

Abstract: StyleGAN can use style to affect facial posture and identity features, and noise to affect hair, wrinkles, skin color and other details. Among these, the outcomes of the picture processing will vary slightly between different versions of styleGAN. As a result, the comparison of performance differences between styleGAN2 and the two modified versions of styleGAN3 will be the main focus of this study. We used the FFHQ dataset as the dataset and FID, EQ-T, and EQ-R were used to be the assessment of the model. In the end, we discovered that Stylegan3 version is a better generative network to improve the equivariance. Our findings have a positive impact on the creation of animation and videos.

Submitted 5 February, 2024; v1 submitted 8 July, 2023; originally announced July 2023.

Comments: But now we feel we haven't fully studied our work and have found some new great results. So after careful consideration, we're going to rework this manuscript and try to give a more accurate model

arXiv:2306.02913 [pdf, other]

Decentralized SGD and Average-direction SAM are Asymptotically Equivalent

Authors: Tongtian Zhu, Fengxiang He, Kaixuan Chen, Mingli Song, Dacheng Tao

Abstract: Decentralized stochastic gradient descent (D-SGD) allows collaborative learning on massive devices simultaneously without the control of a central server. However, existing theories claim that decentralization invariably undermines generalization. In this paper, we challenge the conventional belief and present a completely new perspective for understanding decentralized learning. We prove that D-SGD implicitly minimizes the loss function of an average-direction Sharpness-aware minimization (SAM) algorithm under general non-convex non-$β$-smooth settings. This surprising asymptotic equivalence reveals an intrinsic regularization-optimization trade-off and three advantages of decentralization: (1) there exists a free uncertainty evaluation mechanism in D-SGD to improve posterior estimation; (2) D-SGD exhibits a gradient smoothing effect; and (3) the sharpness regularization effect of D-SGD does not decrease as total batch size increases, which justifies the potential generalization benefit of D-SGD over centralized SGD (C-SGD) in large-batch scenarios. The code is available at https://github.com/Raiden-Zhu/ICML-2023-DSGD-and-SAM.

Submitted 9 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 40th International Conference on Machine Learning (ICML 2023)

arXiv:2212.14418 [pdf]

Heterogeneous Computing Systems

Authors: Dimple P. Khatri, Guanqun Song, Ting Zhu

Abstract: This survey of heterogeneous computing systems will help in analyzing the technological trends that will be at the basis of heterogeneous computing systems, highlighting the major opportunities and challenges such technologies will bring with them. This will help to understand the importance of heterogeneous computing systems, which are becoming common architectural elements of not only the modern data centers but also highly integrated devices (IoT). Identify problems related to it, such as the resource allocation problem, middleware, processing architectures, programming challenges, etc. from the perspective of heterogeneous resources.

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2212.08601 [pdf, other]

Source Tracing: Detecting Voice Spoofing

Authors: Tinglong Zhu, Xingming Wang, Xiaoyi Qin, Ming Li

Abstract: Recent anti-spoofing systems focus on spoofing detection, where the task is only to determine whether the test audio is fake. However, there are few studies putting attention to identifying the methods of generating fake speech. Common spoofing attack algorithms in the logical access (LA) scenario, such as voice conversion and speech synthesis, can be divided into several stages: input processing, conversion, waveform generation, etc. In this work, we propose a system for classifying different spoofing attributes, representing characteristics of different modules in the whole pipeline. Classifying attributes for the spoofing attack other than determining the whole spoofing pipeline can make the system more robust when encountering complex combinations of different modules at different stages. In addition, our system can also be used as an auxiliary system for anti-spoofing against unseen spoofing methods. The experiments are conducted on ASVspoof 2019 LA data set and the proposed method achieved a 20% relative improvement against conventional binary spoof detection methods.

Submitted 16 December, 2022; originally announced December 2022.

Comments: Accepted by APSIPA ASC

arXiv:2202.05397 [pdf, ps, other]

Neural Architecture Search for Energy Efficient Always-on Audio Models

Authors: Daniel T. Speckhard, Karolis Misiunas, Sagi Perel, Tenghui Zhu, Simon Carlile, Malcolm Slaney

Abstract: Mobile and edge computing devices for always-on classification tasks require energy-efficient neural network architectures. In this paper we present several changes to neural architecture searches (NAS) that improve the chance of success in practical situations. Our search simultaneously optimizes for network accuracy, energy efficiency and memory usage. We benchmark the performance of our search on real hardware, but since running thousands of tests with real hardware is difficult we use a random forest model to roughly predict the energy usage of a candidate network. We present a search strategy that uses both Bayesian and regularized evolutionary search with particle swarms, and employs early-stopping to reduce the computational burden. Our search, evaluated on a sound-event classification dataset based upon AudioSet, results in an order of magnitude less energy per inference and a much smaller memory footprint than our baseline MobileNetV1/V2 implementations while slightly improving task accuracy. We also demonstrate how combining a 2D spectrogram with a convolution with many filters causes a computational bottleneck for audio classification and that alternative approaches reduce the computational burden but sacrifice task accuracy.

Submitted 1 June, 2023; v1 submitted 9 February, 2022; originally announced February 2022.

arXiv:2109.10499 [pdf, other]

Joint Optical Neuroimaging Denoising with Semantic Tasks

Authors: Tianfang Zhu, Yue Guan, Anan Li

Abstract: Optical neuroimaging is a vital tool for understanding the brain structure and the connection between regions and nuclei. However, the image noise introduced in the sample preparation and the imaging system hinders the extraction of the possible knowledge from the dataset, thus denoising for the optical neuroimaging is usually necessary. The supervised denoising methods often outperform the unsupervised ones, but the training of the supervised denoising models needs the corresponding clean labels, which are not always available due to the high labeling cost. On the other hand, those semantic labels, such as the located soma positions, the reconstructed neuronal fibers, and the nuclei segmentation result, are generally available and accumulated from everyday neuroscience research. This work connects a supervised denoising and a semantic segmentation model together to form an end-to-end model, which can make use of the semantic labels while still providing a denoised image as an intermediate product. We use both the supervised and the self-supervised models for the denoising and introduce a new cost term for the joint denoising and the segmentation setup. We test the proposed approach on both the synthetic data and the real-world data, including the optical neuroimaging dataset and the electron microscope dataset. The result shows that the joint denoising result outperforms the one using the denoising method alone and the joint model benefits the segmentation and other downstream tasks as well.

Submitted 21 September, 2021; originally announced September 2021.

arXiv:2105.08876 [pdf, other]

doi 10.1016/j.engappai.2023.107180

A Lightweight Privacy-Preserving Scheme Using Label-based Pixel Block Mixing for Image Classification in Deep Learning

Authors: Yuexin Xiang, Tiantian Li, Wei Ren, Tianqing Zhu, Kim-Kwang Raymond Choo

Abstract: To ensure the privacy of sensitive data used in the training of deep learning models, a number of privacy-preserving methods have been designed by the research community. However, existing schemes are generally designed to work with textual data, or are not efficient when a large number of images is used for training. Hence, in this paper we propose a lightweight and efficient approach to preserve image privacy while maintaining the availability of the training set. Specifically, we design the pixel block mixing algorithm for image classification privacy preservation in deep learning. To evaluate its utility, we use the mixed training set to train the ResNet50, VGG16, InceptionV3 and DenseNet121 models on the WIKI dataset and the CNBC face dataset. Experimental findings on the testing set show that our scheme preserves image privacy while maintaining the availability of the training set in the deep learning models. Additionally, the experimental results demonstrate that we achieve good performance for the VGG16 model on the WIKI dataset and both ResNet50 and DenseNet121 on the CNBC dataset. The pixel block algorithm achieves fairly high efficiency in the mixing of the images, and it is computationally challenging for the attackers to restore the mixed training set to the original training set. Moreover, data augmentation can be applied to the mixed training set to improve the training's effectiveness.

Submitted 18 May, 2021; originally announced May 2021.

Comments: 11 pages, 16 figures

MSC Class: 68T07

ACM Class: I.2.6; I.2.9

Journal ref: Engineering Applications of Artificial Intelligence 126 (2023): 107180

arXiv:2104.02306 [pdf, other]

Binary Neural Network for Speaker Verification

Authors: Tinglong Zhu, Xiaoyi Qin, Ming Li

Abstract: Although deep neural networks are successful for many tasks in the speech domain, the high computational and memory costs of deep neural networks make it difficult to directly deploy high-performance neural network systems on low-resource embedded devices. There are several mechanisms to reduce the size of neural networks, e.g., parameter pruning and parameter quantization. This paper focuses on how to apply binary neural networks to the task of speaker verification. The proposed binarization of training parameters can largely maintain the performance while significantly reducing storage space requirements and computational costs. Experiment results show that, after binarizing the Convolutional Neural Network, the ResNet34-based network achieves an EER of around 5% on the Voxceleb1 testing dataset and even outperforms the traditional real number network on the text-dependent dataset Xiaole, while having a 32x memory saving.

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2104.02054 [pdf, ps, other]

DeepMI: Deep Multi-lead ECG Fusion for Identifying Myocardial Infarction and its Occurrence-time

Authors: Girmaw Abebe Tadesse, Hamza Javed, Yong Liu, ** Liu, Jiyan Chen, Komminist Weldemariam, Tingting Zhu

Abstract: Myocardial Infarction (MI) has the highest mortality of all cardiovascular diseases (CVDs). Detection of MI and information regarding its occurrence-time in particular, would enable timely interventions that may improve patient outcomes, thereby reducing the global rise in CVD deaths. Electrocardiogram (ECG) recordings are currently used to screen MI patients. However, manual inspection of ECGs is time-consuming and prone to subjective bias. Machine learning methods have been adopted for automated ECG diagnosis, but most approaches require extraction of ECG beats or consider leads independently of one another. We propose an end-to-end deep learning approach, DeepMI, to classify MI from normal cases as well as identifying the time-occurrence of MI (defined as acute, recent and old), using a collection of fusion strategies on 12 ECG leads at data-, feature-, and decision-level. In order to minimise computational overhead, we employ transfer learning using existing computer vision networks. Moreover, we use recurrent neural networks to encode the longitudinal information inherent in ECGs. We validated DeepMI on a dataset collected from 17,381 patients, in which over 323,000 samples were extracted per ECG lead. We were able to classify normal cases as well as acute, recent and old onset cases of MI, with AUROCs of 96.7%, 82.9%, 68.6% and 73.8%, respectively. We have demonstrated a multi-lead fusion approach to detect the presence and occurrence-time of MI. Our end-to-end framework provides flexibility for different levels of multi-lead ECG fusion and performs feature extraction via transfer learning.

Submitted 31 March, 2021; originally announced April 2021.

Comments: 10 pages

arXiv:2012.15387

Indoor Air Quality Improvement

Authors: Ajinkya Gawade, Aniket Sanap, Vishal Baviskar, Ryan Jahnige, Qingquan Zhang, Ting Zhu

Abstract: Poor indoor air quality can contribute to the development of various chronic respiratory diseases such as asthma, heart disease, and lung cancer. Since air quality is extremely difficult for humans to detect through sensory processing, there is a need for efficient ventilation systems that can provide a healthier environment. In this paper, we have designed an energy efficient ventilation system that predicts sensor occupancy patterns based on historical data to improve indoor air quality.

Submitted 1 January, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

Comments: Evaluation is incomplete. Difference in air quality improvement when outdoor air is used as opposed to circulating indoor air is not considered

arXiv:2012.04192 [pdf]

Benchmarking Resource Usage of Underlying Datatypes of Apache Spark

Authors: Brittany Nicholls, Mariama Adangwa, Rachel Estes, Hugues Nelson Iradukunda, Qingquan Zhang, Ting Zhu

Abstract: The purpose of this paper is to examine how resource usage of an analytic is affected by the different underlying datatypes of Spark analytics - Resilient Distributed Datasets (RDDs), Datasets, and DataFrames. The resource usage of an analytic is explored as a viable and preferred alternative of benchmarking big data analytics instead of the current common benchmarking performed using execution time. The run time of an analytic is shown to not be guaranteed to be a reproducible metric since many external factors to the job can affect the execution time. Instead, metrics readily available through Spark including peak execution memory are used to benchmark the resource usage of these different datatypes in common applications of Spark analytics, such as counting, caching, repartitioning, and KMeans.

Submitted 7 December, 2020; originally announced December 2020.

arXiv:2011.14230 [pdf, other]

CROCS: Clustering and Retrieval of Cardiac Signals Based on Patient Disease Class, Sex, and Age

Authors: Dani Kiyasseh, Tingting Zhu, David A. Clifton

Abstract: The process of manually searching for relevant instances in, and extracting information from, clinical databases underpins a multitude of clinical tasks. Such tasks include disease diagnosis, clinical trial recruitment, and continuing medical education. This manual search-and-extract process, however, has been hampered by the growth of large-scale clinical databases and the increased prevalence of unlabelled instances. To address this challenge, we propose a supervised contrastive learning framework, CROCS, where representations of cardiac signals associated with a set of patient-specific attributes (e.g., disease class, sex, age) are attracted to learnable embeddings entitled clinical prototypes. We exploit such prototypes for both the clustering and retrieval of unlabelled cardiac signals based on multiple patient attributes. We show that CROCS outperforms the state-of-the-art method, DTC, when clustering and also retrieves relevant cardiac signals from a large database. We also show that clinical prototypes adopt a semantically meaningful arrangement based on patient attributes and thus confer a high degree of interpretability.

Submitted 3 October, 2021; v1 submitted 28 November, 2020; originally announced November 2020.

Comments: Accepted at Advances in Neural Information Processing Systems (NeurIPS) 2021

arXiv:2011.14227 [pdf, other]

PCPs: Patient Cardiac Prototypes

Authors: Dani Kiyasseh, Tingting Zhu, David A. Clifton

Abstract: Many clinical deep learning algorithms are population-based and difficult to interpret. Such properties limit their clinical utility as population-based findings may not generalize to individual patients and physicians are reluctant to incorporate opaque models into their clinical workflow. To overcome these obstacles, we propose to learn patient-specific embeddings, entitled patient cardiac prototypes (PCPs), that efficiently summarize the cardiac state of the patient. To do so, we attract representations of multiple cardiac signals from the same patient to the corresponding PCP via supervised contrastive learning. We show that the utility of PCPs is multifold. First, they allow for the discovery of similar patients both within and across datasets. Second, such similarity can be leveraged in conjunction with a hypernetwork to generate patient-specific parameters, and in turn, patient-specific diagnoses. Third, we find that PCPs act as a compact substitute for the original dataset, allowing for dataset distillation.

Submitted 28 November, 2020; originally announced November 2020.

arXiv:2005.13249 [pdf, other]

CLOCS: Contrastive Learning of Cardiac Signals Across Space, Time, and Patients

Authors: Dani Kiyasseh, Tingting Zhu, David A. Clifton

Abstract: The healthcare industry generates troves of unlabelled physiological data. This data can be exploited via contrastive learning, a self-supervised pre-training method that encourages representations of instances to be similar to one another. We propose a family of contrastive learning methods, CLOCS, that encourages representations across space, time, and patients to be similar to one another. We show that CLOCS consistently outperforms the state-of-the-art methods, BYOL and SimCLR, when performing a linear evaluation of, and fine-tuning on, downstream tasks. We also show that CLOCS achieves strong generalization performance with only 25% of labelled training data. Furthermore, our training procedure naturally generates patient-specific representations that can be used to quantify patient-similarity.

Submitted 16 May, 2021; v1 submitted 27 May, 2020; originally announced May 2020.

Comments: Accepted to ICML 2021

arXiv:2005.09059 [pdf, other]

doi 10.1109/JBHI.2020.3014556

Basal Glucose Control in Type 1 Diabetes using Deep Reinforcement Learning: An In Silico Validation

Authors: Taiyu Zhu, Kezhi Li, Pau Herrero, Pantelis Georgiou

Abstract: People with Type 1 diabetes (T1D) require regular exogenous infusion of insulin to maintain their blood glucose concentration in a therapeutically adequate target range. Although the artificial pancreas and continuous glucose monitoring have been proven to be effective in achieving closed-loop control, significant challenges still remain due to the high complexity of glucose dynamics and limitations in the technology. In this work, we propose a novel deep reinforcement learning model for single-hormone (insulin) and dual-hormone (insulin and glucagon) delivery. In particular, the delivery strategies are developed by double Q-learning with dilated recurrent neural networks. For designing and testing purposes, the FDA-accepted UVA/Padova Type 1 simulator was employed. First, we performed long-term generalized training to obtain a population model. Then, this model was personalized with a small data-set of subject-specific data. In silico results show that the single and dual-hormone delivery strategies achieve good glucose control when compared to a standard basal-bolus therapy with low-glucose insulin suspension. Specifically, in the adult cohort (n=10), percentage time in target range [70, 180] mg/dL improved from 77.6% to 80.9% with single-hormone control, and to 85.6% with dual-hormone control. In the adolescent cohort (n=10), percentage time in target range improved from 55.5% to 65.9% with single-hormone control, and to 78.8% with dual-hormone control. In all scenarios, a significant decrease in hypoglycemia was observed. These results show that the use of deep reinforcement learning is a viable approach for closed-loop glucose control in T1D.

Submitted 18 May, 2020; originally announced May 2020.

Journal ref: IEEE journal of biomedical and health informatics 2020

arXiv:1912.07383 [pdf, other]

A Survey of Predictive Maintenance: Systems, Purposes and Approaches

Authors: Tianwen Zhu, Yongyi Ran, Xin Zhou, Yonggang Wen

Abstract: This paper highlights the importance of maintenance techniques in the coming industrial revolution, reviews the evolution of maintenance techniques, and presents a comprehensive literature review on the latest advancement of maintenance techniques, i.e., Predictive Maintenance (PdM), with emphasis on system architectures, optimization objectives, and optimization methods. In industry, any outages and unplanned downtime of machines or systems would degrade or interrupt a company's core business, potentially resulting in significant penalties and immeasurable reputation and economic loss. Existing traditional maintenance approaches, such as Reactive Maintenance (RM) and Preventive Maintenance (PM), suffer from high prevention and repair costs, inadequate or inaccurate mathematical degradation processes, and manual feature extraction. The incoming fourth industrial revolution also demands a new maintenance paradigm to reduce maintenance cost and downtime, and to increase system availability and reliability. Predictive Maintenance (PdM) is envisioned as the solution. In this survey, we first provide a high-level view of the PdM system architectures including PdM 4.0, Open System Architecture for Condition Based Monitoring (OSA-CBM), and cloud-enhanced PdM system. Then, we review the specific optimization objectives, which mainly comprise cost minimization, availability/reliability maximization, and multi-objective optimization. Furthermore, we present the optimization methods to achieve the aforementioned objectives, which include traditional Machine Learning (ML) based and Deep Learning (DL) based approaches. Finally, we highlight the future research directions that are critical to promote the application of DL techniques in the context of PdM.

Submitted 21 March, 2024; v1 submitted 12 December, 2019; originally announced December 2019.

Comments: 38 pages, 23 figures

arXiv:1912.05345 [pdf, other]

Severity Detection Tool for Patients with Infectious Disease

Authors: Girmaw Abebe Tadesse, Tingting Zhu, Nhan Le Nguyen Thanh, Nguyen Thanh Hung, Ha Thi Hai Duong, Truong Huu Khanh, Pham Van Quang, Duc Duong Tran, Lam Minh Yen, H Rogier Van Doorn, Nguyen Van Hao, John Prince, Hamza Javed, Dani Kiyasseh, Le Van Tan, Louise Thwaites, David A. Clifton

Abstract: Hand, foot and mouth disease (HFMD) and tetanus are serious infectious diseases in low and middle income countries. Tetanus in particular has a high mortality rate and its treatment is resource-demanding. Furthermore, HFMD often affects a large number of infants and young children. As a result, its treatment consumes enormous healthcare resources, especially when outbreaks occur. Autonomic nervous system dysfunction (ANSD) is the main cause of death for both HFMD and tetanus patients. However, early detection of ANSD is a difficult and challenging problem. In this paper, we aim to provide a proof-of-principle to detect the ANSD level automatically by applying machine learning techniques to physiological patient data, such as electrocardiogram (ECG) and photoplethysmogram (PPG) waveforms, which can be collected using low-cost wearable sensors. Efficient features are extracted that encode variations in the waveforms in the time and frequency domains. A support vector machine is employed to classify the ANSD levels. The proposed approach is validated on multiple datasets of HFMD and tetanus patients in Vietnam. Results show that encouraging performance is achieved in classifying ANSD levels. Moreover, the proposed features are simple, more generalisable and outperformed the standard heart rate variability (HRV) analysis. The proposed approach would facilitate both the diagnosis and treatment of infectious diseases in low and middle income countries, and thereby improve overall patient care.

Submitted 10 December, 2019; originally announced December 2019.

arXiv:1911.10468 [pdf]

Extending the dynamic strain sensing range of phase-OTDR with frequency modulation pulse and frequency interrogation

Authors: Jingdong Zhang, Haoting Wu, Jingsheng Huang, Hua Zheng, Danqi Feng, Guolu Yin, Tao Zhu

Abstract: We propose and experimentally demonstrate a technique to extend the dynamic sensing range of a phase-sensitive optical time domain reflectometry system based on frequency interrogation. Benefitting from the range-Doppler coupling feature, the frequency modulation pulse is capable of measuring the frequency shift induced by the dynamic strain, and thus large dynamic strain can be recovered. The performance of the proposed method is experimentally evaluated by comparing it with phase unwrapping. The strain sensing range can be increased by a factor of at least hundreds, and fast dynamic strain with a peak-to-peak amplitude of 130 microstrain and a vibration frequency of 20 kHz is measured.

Submitted 24 November, 2019; originally announced November 2019.

arXiv:1712.05587 [pdf, ps, other]

DOA and Polarization Estimation for Non-Circular Signals in 3-D Millimeter Wave Polarized Massive MIMO Systems

Authors: Liangtian Wan, Kaihui Liu, Ying-Chang Liang, Tong Zhu

Abstract: In this paper, an algorithm of multiple signal classification (MUSIC) is proposed for two-dimensional (2-D) direction-of-arrival (DOA) and polarization estimation of non-circular signal in three-dimensional (3-D) millimeter wave polarized large-scale/massive multiple-input-multiple-output (MIMO) systems. The traditional MUSIC-based algorithms can estimate either the DOA and polarization for circular signal or the DOA for non-circular signal by using spectrum search. By contrast, in the proposed algorithm only the DOA estimation needs spectrum search, and the polarization estimation has a closed-form expression. First, a novel dimension-reduced MUSIC (DR-MUSIC) is proposed for DOA and polarization estimation of circular signal with low computational complexity. Next, based on the quaternion theory, a novel algorithm named quaternion non-circular MUSIC (QNC-MUSIC) is proposed for parameter estimation of non-circular signal with high estimation accuracy. Then, based on the DOA estimation result using QNC-MUSIC, the polarization estimation of non-circular signal is acquired by using the closed-form expression of the polarization estimation in DR-MUSIC. In addition, the computational complexity analysis shows that compared with the conventional DOA and polarization estimation algorithms, our proposed QNC-MUSIC and DR-MUSIC have much lower computational complexity, especially when the source number is large. The stochastic Cramer-Rao Bound (CRB) for the estimation of the 2-D DOA and polarization parameters of the non-circular signals is derived as well. Finally, numerical examples are provided to demonstrate that the proposed algorithms can improve the parameter estimation performance when the large-scale/massive MIMO systems are employed.

Submitted 15 December, 2017; originally announced December 2017.

arXiv:1603.01055 [pdf, ps, other]

doi 10.1145/2835776.2835843

Feedback Control of Real-Time Display Advertising

Authors: Weinan Zhang, Yifei Rong, Jun Wang, Tianchi Zhu, Xiaofan Wang

Abstract: Real-Time Bidding (RTB) is revolutionising display advertising by facilitating per-impression auctions to buy ad impressions as they are being generated. Being able to use impression-level data, such as user cookies, encourages user behaviour targeting, and hence has significantly improved the effectiveness of ad campaigns. However, a fundamental drawback of RTB is its instability because the bid decision is made per impression and there are enormous fluctuations in campaigns' key performance indicators (KPIs). As such, advertisers face great difficulty in controlling their campaign performance against the associated costs. In this paper, we propose a feedback control mechanism for RTB which helps advertisers dynamically adjust the bids to effectively control the KPIs, e.g., the auction winning ratio and the effective cost per click. We further formulate an optimisation framework to show that the proposed feedback control mechanism also has the ability of optimising campaign performance. By settling the effective cost per click at an optimal reference value, the number of campaign's ad clicks can be maximised with the budget constraint. Our empirical study based on real-world data verifies the effectiveness and robustness of our RTB control system in various situations. The proposed feedback control mechanism has also been deployed on a commercial RTB platform and the online test has shown its success in generating controllable advertising performance.

Submitted 3 March, 2016; originally announced March 2016.

Comments: WSDM 2016

Showing 1–28 of 28 results for author: Zhu, T