Search | arXiv e-print repository

Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models

Authors: Vyas Raina, Rao Ma, Charles McGhee, Kate Knill, Mark Gales

Abstract: Recent developments in large speech foundation models like Whisper have led to their widespread use in many automatic speech recognition (ASR) applications. These systems incorporate 'special tokens' in their vocabulary, such as <endoftext>, to guide their language generation process. However, we demonstrate that these tokens can be exploited by adversarial attacks to manipulate the model's behavior. We propose a simple yet effective method to learn a universal acoustic realization of Whisper's <endoftext> token, which, when prepended to any speech signal, encourages the model to ignore the speech and only transcribe the special token, effectively 'muting' the model. Our experiments demonstrate that the same, universal 0.64-second adversarial audio segment can successfully mute a target Whisper ASR model for over 97% of speech samples. Moreover, we find that this universal adversarial audio segment often transfers to new datasets and tasks. Overall, this work demonstrates the vulnerability of Whisper models to 'muting' adversarial attacks, where such attacks can pose both risks and potential benefits in real-world settings: for example, the attack can be used to bypass speech moderation systems, or conversely, it can be used to protect private speech data.

Submitted 9 May, 2024; originally announced May 2024.
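To make the attack setting above concrete, the following Python sketch only shows how one fixed segment is reused unchanged across utterances; it does not learn the segment. It assumes 16 kHz mono input (the sampling rate Whisper expects), so 0.64 s corresponds to 10,240 samples. The names `segment_length` and `prepend_segment` are hypothetical, and the silent placeholder stands in for the learned adversarial audio.

```python
# Minimal sketch: build an attacked input by prepending one fixed segment.
# Assumption: audio is 16 kHz mono, the rate Whisper's front end expects.
SAMPLE_RATE = 16_000
SEGMENT_SECONDS = 0.64


def segment_length(sample_rate: int = SAMPLE_RATE,
                   seconds: float = SEGMENT_SECONDS) -> int:
    """Number of samples in the universal segment, rounded to a whole sample."""
    return round(sample_rate * seconds)


def prepend_segment(segment: list[float], speech: list[float]) -> list[float]:
    """Hypothetical helper: the same segment is reused for every utterance."""
    return list(segment) + list(speech)


# Placeholder: silence stands in for the learned adversarial segment.
universal = [0.0] * segment_length()
utterance = [0.1, -0.2, 0.05]
attacked = prepend_segment(universal, utterance)
print(len(universal), len(attacked))  # 10240 10243
```

Because the segment is universal, it is computed once and then concatenated in front of any speech signal; no per-utterance optimization is involved at attack time.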

arXiv:2403.15156

Infrastructure-Assisted Collaborative Perception in Automated Valet Parking: A Safety Perspective

Authors: Yukuan Jia, Jiawen Zhang, Shimeng Lu, Baokang Fan, Ruiqing Mao, Sheng Zhou, Zhisheng Niu

Abstract: Environmental perception in Automated Valet Parking (AVP) has been a challenging task due to severe occlusions in parking garages. Although Collaborative Perception (CP) can be applied to broaden the field of view of connected vehicles, the limited bandwidth of vehicular communications restricts its application. In this work, we propose a BEV feature-based CP network architecture for infrastructure-assisted AVP systems. The model takes the roadside camera and LiDAR as optional inputs and adaptively fuses them with onboard sensors in a unified BEV representation. An autoencoder and downsampling are applied for channel-wise and spatial-wise dimension reduction, while sparsification and quantization further compress the feature map with little loss in data precision. Combining these techniques, the size of a BEV feature map is effectively compressed to fit within the feasible data rate of the NR-V2X network. On the synthetic AVP dataset, we observe that CP can effectively increase perception performance, especially for pedestrians. Moreover, the advantage of infrastructure-assisted CP is demonstrated in two typical safety-critical scenarios in the AVP setting, increasing the maximum safe cruising speed by up to 3 m/s in both scenarios.

Submitted 22 March, 2024; originally announced March 2024.

Comments: 7 pages, 7 figures, 4 tables, accepted by IEEE VTC2024-Spring
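As a rough illustration of the sparsification and quantization steps mentioned in this abstract, here is a toy Python sketch that keeps the k largest-magnitude entries of a flattened feature map and maps them to 8-bit codes. The choice of k, the [-1, 1] value range, and the payload format (a 4-byte index plus a 1-byte code per kept entry) are all assumptions, not the paper's configuration.

```python
# Toy sketch of two compression steps: top-k sparsification and uniform
# 8-bit quantization. k, the value range and the index format are assumptions.


def sparsify_topk(values: list[float], k: int) -> list[tuple[int, float]]:
    """Keep the k largest-magnitude entries as (index, value) pairs, sorted by index."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    return sorted((i, values[i]) for i in order[:k])


def quantize_uint8(pairs, lo: float, hi: float) -> list[tuple[int, int]]:
    """Clip each kept value to [lo, hi] and map it to an integer code in 0..255."""
    scale = (hi - lo) / 255
    return [(i, round((min(max(v, lo), hi) - lo) / scale)) for i, v in pairs]


def dequantize_uint8(codes, lo: float, hi: float) -> list[tuple[int, float]]:
    """Invert the mapping; for values inside [lo, hi] the error is at most half a step."""
    scale = (hi - lo) / 255
    return [(i, lo + c * scale) for i, c in codes]


def payload_bytes(num_kept: int, index_bytes: int = 4) -> int:
    """Bytes to transmit: one index plus one 8-bit code per kept entry."""
    return num_kept * (index_bytes + 1)


feature = [0.0, 0.9, -0.1, 0.0, 0.5, 0.02, -0.7, 0.0]
kept = sparsify_topk(feature, k=3)  # [(1, 0.9), (4, 0.5), (6, -0.7)]
codes = quantize_uint8(kept, lo=-1.0, hi=1.0)
print(payload_bytes(len(codes)), "bytes vs", len(feature) * 4, "bytes as float32")
```

The trade-off shown is the one the abstract describes in general terms: fewer transmitted bytes in exchange for a bounded loss of precision in the values that are kept.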

arXiv:2311.05550

Towards End-to-End Spoken Grammatical Error Correction

Authors: Stefano Bannò, Rao Ma, Mengjie Qian, Kate M. Knill, Mark J. F. Gales

Abstract: Grammatical feedback is crucial for L2 learners, teachers, and testers. Spoken grammatical error correction (GEC) aims to supply feedback to L2 learners on their use of grammar when speaking. This process usually relies on a cascaded pipeline comprising an ASR system, disfluency removal, and GEC, with the associated concern of propagating errors between these individual modules. In this paper, we introduce an alternative "end-to-end" approach to spoken GEC, exploiting a speech recognition foundation model, Whisper. This foundation model can be used to replace the whole framework or part of it, e.g., ASR and disfluency removal. These end-to-end approaches are compared to more standard cascaded approaches on the data obtained from a free-speaking spoken language assessment test, Linguaskill. Results demonstrate that end-to-end spoken GEC is possible within this architecture, but the lack of available data limits current performance compared to a system using large quantities of text-based GEC data. Conversely, end-to-end disfluency detection and removal, which is easier for the attention-based Whisper to learn, does outperform cascaded approaches. Additionally, the paper discusses the challenges of providing feedback to candidates when using end-to-end systems for spoken GEC.

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.09603

B-Spine: Learning B-Spline Curve Representation for Robust and Interpretable Spinal Curvature Estimation

Authors: Hao Wang, Qiang Song, Ruofeng Yin, Rui Ma, Yizhou Yu, Yi Chang

Abstract: Spinal curvature estimation is important for the diagnosis and treatment of scoliosis. Existing methods face several issues, such as the need for expensive annotations of the vertebral landmarks and sensitivity to image quality. It is challenging to achieve robust estimation and obtain interpretable results, especially for low-quality images that are blurry and hazy. In this paper, we propose B-Spine, a novel deep learning pipeline that learns a B-spline curve representation of the spine and estimates the Cobb angles for spinal curvature estimation from low-quality X-ray images. Given a low-quality input, a novel SegRefine network employing unpaired image-to-image translation is proposed to generate a high-quality spine mask from the initial segmentation result. Next, a novel mask-based B-spline prediction model is proposed to predict the B-spline curve for the spine centerline. Finally, the Cobb angles are estimated by a hybrid approach that combines curve slope analysis and a curve-based regression model. We conduct quantitative and qualitative comparisons with representative and state-of-the-art (SOTA) learning-based methods on the public AASCE2019 dataset and our newly proposed CJUH-JLU dataset, which contains more challenging low-quality images. The superior performance on both datasets shows that our method achieves both robustness and interpretability for spinal curvature estimation.

Submitted 14 October, 2023; originally announced October 2023.
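The "curve slope analysis" half of the Cobb angle estimate can be illustrated with a simplified Python sketch: given points sampled along a spine centerline (for example, evaluated from a fitted B-spline), compute each segment's inclination and take the largest difference between inclinations. This is a slope-only approximation under that assumption, not the paper's hybrid method, and it omits the regression model entirely.

```python
import math

# Simplified sketch: estimate a Cobb-style angle from tangent slopes along
# points sampled on a spine centerline. Slope-only; no regression component.


def tangent_angles(xs: list[float], ys: list[float]) -> list[float]:
    """Inclination of each centerline segment from the vertical axis, in degrees."""
    return [math.degrees(math.atan2(xs[i + 1] - xs[i], ys[i + 1] - ys[i]))
            for i in range(len(xs) - 1)]


def cobb_angle(xs: list[float], ys: list[float]) -> float:
    """Largest difference between any two segment inclinations along the curve."""
    angles = tangent_angles(xs, ys)
    return max(angles) - min(angles)


# ys run down the image; xs are lateral offsets of the centerline.
ys = [0.0, 1.0, 2.0, 3.0]
xs = [0.0, 1.0, 1.0, 0.0]
print(round(cobb_angle(xs, ys), 1))  # 90.0
```

A straight, vertical centerline gives identical inclinations everywhere and therefore an angle of zero, which is the expected result for a spine with no lateral curvature.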

arXiv:2310.08292

Concealed Electronic Countermeasures of Radar Signal with Adversarial Examples

Authors: Ruinan Ma, Canjie Zhu, Mingfeng Lu, Yunjie Li, Yu-an Tan, Ruibin Zhang, Ran Tao

Abstract: Electronic countermeasures involving radar signals are an important aspect of modern warfare. Traditional electronic countermeasure techniques typically add large-scale interference signals to ensure interference effects, which can make attacks too obvious. In recent years, AI-based attack methods have emerged that can effectively solve this problem, but the attack scenarios are currently limited to time-domain radar signal classification. In this paper, we focus on the time-frequency image classification scenario for radar signals. We first propose an attack pipeline for the time-frequency image scenario and the DITIMI-FGSM attack algorithm with high transferability. We then propose the STFT-based time domain signal attack (STDS) algorithm to solve the problem of non-invertibility in time-frequency analysis, thus obtaining the time-domain representation of the interference signal. Extensive experiments show that our attack pipeline is feasible and the proposed attack method has a high success rate.

Submitted 12 October, 2023; originally announced October 2023.
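DITIMI-FGSM, by its name, builds on the momentum iterative FGSM (MI-FGSM) update. The following Python sketch shows only that generic MI-FGSM component on a flat feature vector; the diverse-input and translation-invariant parts, and the time-frequency image setting, are omitted. `grad_fn` is a hypothetical callable returning the loss gradient, and the step count and momentum factor are assumed defaults.

```python
# Sketch of the momentum iterative FGSM (MI-FGSM) update on a flat vector.
# grad_fn is a hypothetical callable returning the loss gradient at x.


def mi_fgsm(x0: list[float], grad_fn, eps: float,
            steps: int = 10, mu: float = 1.0) -> list[float]:
    alpha = eps / steps          # per-step size so the total budget is eps
    x = list(x0)
    g = [0.0] * len(x0)          # accumulated (momentum) gradient
    for _ in range(steps):
        grad = grad_fn(x)
        l1 = sum(abs(v) for v in grad) or 1.0   # L1 normalization, guarded against zero
        g = [mu * gi + vi / l1 for gi, vi in zip(g, grad)]
        # Step along the sign of the momentum, then project back into the eps-ball.
        x = [min(max(xi + alpha * ((gi > 0) - (gi < 0)), ci - eps), ci + eps)
             for xi, gi, ci in zip(x, g, x0)]
    return x


# Toy loss L(x) = 2*x[0] - x[1] has a constant gradient.
adv = mi_fgsm([0.0, 0.0, 0.0], lambda x: [2.0, -1.0, 0.0], eps=0.1)
print([round(v, 3) for v in adv])  # [0.1, -0.1, 0.0]
```

The momentum term stabilizes the update direction across iterations, which is the property usually credited with improving transferability of the resulting perturbation.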

arXiv:2307.09378

doi 10.21437/SLaTE.2023-20

Adapting an ASR Foundation Model for Spoken Language Assessment

Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill

Abstract: A crucial part of an accurate and reliable spoken language assessment system is the underlying ASR model. Recently, large-scale pre-trained ASR foundation models such as Whisper have been made available. As the output of these models is designed to be human readable, punctuation is added, numbers are presented in Arabic numeric form and abbreviations are included. Additionally, these models have a tendency to skip disfluencies and hesitations in the output. Though useful for readability, these attributes are not helpful for assessing the ability of a candidate and providing feedback. Here a precise transcription of what a candidate said is needed. In this paper, we give a detailed analysis of Whisper outputs and propose two solutions: fine-tuning and soft prompt tuning. Experiments are conducted on both public speech corpora and an English learner dataset. Results show that we can effectively alter the decoding behaviour of Whisper to generate the exact words spoken in the response.

Submitted 10 October, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: Proceedings of SLaTE

arXiv:2307.04172

Can Generative Large Language Models Perform ASR Error Correction?

Authors: Rao Ma, Mengjie Qian, Potsawee Manakul, Mark Gales, Kate Knill

Abstract: ASR error correction is an interesting option for post-processing speech recognition system outputs. These error correction models are usually trained in a supervised fashion using the decoding results of a target ASR system. This approach can be computationally intensive, and the model is tuned to a specific ASR system. Recently, generative large language models (LLMs) have been applied to a wide range of natural language processing tasks, as they can operate in a zero-shot or few-shot fashion. In this paper, we investigate using ChatGPT, a generative LLM, for ASR error correction. Based on the ASR N-best output, we propose both unconstrained and constrained approaches; in the constrained approach, a member of the N-best list is selected. Additionally, zero-shot and 1-shot settings are evaluated. Experiments show that this generative LLM approach can yield performance gains for two different state-of-the-art ASR architectures, transducer and attention-encoder-decoder based, and multiple test sets.

Submitted 29 September, 2023; v1 submitted 9 July, 2023; originally announced July 2023.
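The abstract does not say exactly how the constrained approach guarantees that an N-best member is returned. One simple possibility, shown here purely as an assumption, is to map a free-form LLM correction back to the closest N-best hypothesis by word-level edit distance. The function names are hypothetical, and prompting the LLM itself is not shown.

```python
import string


def normalize(text: str) -> list[str]:
    """Lower-case and strip punctuation before comparing word sequences."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()


def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]


def constrain_to_nbest(llm_output: str, nbest: list[str]) -> str:
    """Return the N-best member closest to the free-form LLM correction."""
    target = normalize(llm_output)
    return min(nbest, key=lambda h: word_edit_distance(target, normalize(h)))


nbest = ["i scream for ice cream", "eye scream for ice cream", "i scream for i scream"]
print(constrain_to_nbest("I scream for ice cream!", nbest))  # i scream for ice cream
```

Whatever the LLM generates, the output of this step is always one of the ASR hypotheses, so the correction cannot introduce words that the recognizer never proposed.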

arXiv:2306.01208

doi 10.21437/Interspeech.2023-1899

Adapting an Unadaptable ASR System

Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill

Abstract: As speech recognition model sizes and training data requirements grow, it is increasingly common for systems to be available only via APIs from online service providers rather than with direct access to the models themselves. In this scenario it is challenging to adapt systems to a specific target domain. To address this problem, we consider the recently released OpenAI Whisper ASR as an example of a large-scale ASR system on which to assess adaptation methods. An error correction based approach is adopted, as this does not require access to the model but can be trained from either 1-best or N-best outputs that are normally available via the ASR API. LibriSpeech is used as the primary target domain for adaptation. The generalization ability of the system is then evaluated along two distinct dimensions: first, whether this form of correction model is portable to other speech recognition domains, and second, whether it can be used for ASR models with a different architecture.

Submitted 10 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Proceedings of INTERSPEECH

arXiv:2305.18355

An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization

Authors: Fei Kong, **hao Duan, RuiPeng Ma, Hengtao Shen, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu

Abstract: Recently, diffusion models have achieved remarkable success in generation tasks, including image and audio generation. However, like other generative models, diffusion models are prone to privacy issues. In this paper, we propose an efficient query-based membership inference attack (MIA), namely the Proximal Initialization Attack (PIA), which uses the ground-truth trajectory obtained with ε initialized at t = 0 and the predicted point to infer membership. Experimental results indicate that the proposed method achieves competitive performance with only two queries on both discrete-time and continuous-time diffusion models. Moreover, previous work on the privacy of diffusion models has focused on vision tasks without considering audio tasks. Therefore, we also explore the robustness of diffusion models to MIA in the text-to-speech (TTS) task, which is an audio generation task. To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the TTS task. Experimental results indicate that models with mel-spectrogram (image-like) output are vulnerable to MIA, while models with audio output are relatively robust to MIA. Code is available at https://github.com/kong13661/PIA.

Submitted 9 October, 2023; v1 submitted 26 May, 2023; originally announced May 2023.
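A two-query membership score in the spirit of this abstract can be sketched as follows, under our reading that the noise predicted at t = 0 is used as the initial ε to build a deterministic point x_t, and the model's prediction at x_t is compared with it. `eps_model(x, t)` is a hypothetical noise-prediction network, `alpha_bar_t` is the cumulative noise-schedule value at step t, and the L2 norm and threshold are assumptions; see the linked repository for the authors' actual code.

```python
import math

# Sketch of a two-query, PIA-style membership score. eps_model(x, t) is a
# hypothetical noise predictor; the L2 norm is an assumption.


def pia_score(x0: list[float], eps_model, t: int, alpha_bar_t: float) -> float:
    eps0 = eps_model(x0, 0)  # query 1: predicted noise at t = 0, used as the initial epsilon
    xt = [math.sqrt(alpha_bar_t) * a + math.sqrt(1.0 - alpha_bar_t) * e
          for a, e in zip(x0, eps0)]  # deterministic point on the trajectory
    eps_t = eps_model(xt, t)  # query 2: prediction at the noised point
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(eps0, eps_t)))


def is_member(score: float, threshold: float) -> bool:
    """Lower scores suggest the sample was seen in training (threshold is hypothetical)."""
    return score < threshold


calls = []


def toy_eps(x, t):
    """Stand-in model that records each query so the query count can be checked."""
    calls.append(t)
    return [0.5 * v for v in x]


score = pia_score([1.0, -1.0], toy_eps, t=200, alpha_bar_t=0.5)
print(len(calls))  # 2
```

Only two forward passes are needed per sample, which is where the efficiency claim in the abstract comes from.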

arXiv:2305.01165

Self-similarity-based super-resolution of photoacoustic angiography from hand-drawn doodles

Authors: Yuanzheng Ma, Wangting Zhou, Rui Ma, Sihua Yang, Yansong Tang, Xun Guan

Abstract: Deep-learning-based super-resolution photoacoustic angiography (PAA) is a powerful tool that restores blood vessel images from under-sampled images to facilitate disease diagnosis. Nonetheless, due to the scarcity of training samples, PAA super-resolution models often generalize poorly, particularly in continuous monitoring tasks. To address this challenge, we propose a novel approach that employs a super-resolution PAA method trained with forged PAA images. We start by generating realistic PAA images of human lips from hand-drawn curves using a diffusion-based image generation model. Subsequently, we train a self-similarity-based super-resolution model with these forged PAA images. Experimental results show that our method outperforms the super-resolution model trained with authentic PAA images in both original-domain and cross-domain tests. Notably, our approach improves the quality of super-resolution reconstruction using images forged by a deep learning model, indicating that collaboration between deep learning models can facilitate generalization despite a limited initial dataset. This approach shows promising potential for exploring zero-shot learning neural networks for vision tasks.

Submitted 1 May, 2023; originally announced May 2023.

Comments: 12 pages, 6 figures, journal

arXiv:2304.09528

Network Algebraization and Port Relationship for Power-Electronic-Dominated Power Systems

Authors: Rui Ma, Xiaowen Yang, Meng Zhan

Abstract: Unlike the quasi-static network in the traditional power system, the dynamic network in a power-electronic-dominated power system must be considered because of the rapid response of converter controls. In this paper, a nonlinear differential-algebraic model framework is established, with algebraic equations for dynamic electrical networks and differential equations for the (source) nodes, by generalizing the Kron reduction. The internal and terminal voltages of source nodes, including converters, are chosen as the ports of nodes and networks. Correspondingly, the impact of the dynamic network becomes clear: it serves as a voltage divider that instantaneously generates the terminal voltage from the internal voltage of the sources, even when the dynamics of inductance are included. With this simplest model, the roles of both the nodes and the network become apparent. Simulations verify the proposed model framework on the modified 9-bus system.

Submitted 19 April, 2023; originally announced April 2023.

Comments: 4 pages, 6 figures
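The paper generalizes the Kron reduction; for reference, the classical static version eliminates an interior node k from a nodal admittance matrix Y via Y_red[i][j] = Y[i][j] - Y[i][k] * Y[k][j] / Y[k][k]. The Python sketch below implements only that standard step (it also works with complex admittances) and does not reproduce the paper's dynamic generalization. The star network is a made-up example.

```python
# Classical Kron reduction: eliminate interior node k from admittance matrix Y.
#   Y_red[i][j] = Y[i][j] - Y[i][k] * Y[k][j] / Y[k][k]
# Static version only; entries may be real or complex.


def kron_eliminate(Y, k: int):
    keep = [i for i in range(len(Y)) if i != k]
    return [[Y[i][j] - Y[i][k] * Y[k][j] / Y[k][k] for j in keep] for i in keep]


# Star network: boundary nodes 0, 1, 2 each tied to interior node 3 by admittance 1.
Y = [[1, 0, 0, -1],
     [0, 1, 0, -1],
     [0, 0, 1, -1],
     [-1, -1, -1, 3]]
Y_red = kron_eliminate(Y, 3)
print([[round(v, 3) for v in row] for row in Y_red])
# [[0.667, -0.333, -0.333], [-0.333, 0.667, -0.333], [-0.333, -0.333, 0.667]]
```

The result is the familiar star-to-delta equivalent: each pair of boundary nodes becomes directly coupled by admittance 1/3, and every row still sums to zero because no shunt elements were present.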

arXiv:2303.00456

doi 10.21437/Interspeech.2023-1616

N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space

Authors: Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian

Abstract: Error correction models form an important part of Automatic Speech Recognition (ASR) post-processing to improve the readability and quality of transcriptions. Most prior works use the 1-best ASR hypothesis as input and therefore can only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 model and uses ASR N-best lists as model input. By transferring knowledge from the pre-trained language model and obtaining richer information from the ASR decoding space, the proposed approach outperforms a strong Conformer-Transducer baseline. Another issue with standard error correction is that the generation process is not well guided. To address this, a constrained decoding process, based on either the N-best list or an ASR lattice, is used, which allows additional information to be propagated.

Submitted 10 October, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: Proceedings of INTERSPEECH
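N-best-constrained decoding can be pictured as a prefix trie over the hypotheses: at each step the decoder may only emit tokens that continue at least one N-best entry. The Python sketch below is a minimal illustration under that assumption; tokens are words for readability, "</s>" is an assumed end-of-sequence marker, and the lattice-based variant is not shown.

```python
# Sketch of N-best-constrained decoding via a prefix trie of hypotheses.
END = "</s>"  # assumed end-of-sequence marker


def build_trie(hypotheses: list[list[str]]) -> dict:
    root: dict = {}
    for tokens in hypotheses:
        node = root
        for tok in tokens + [END]:
            node = node.setdefault(tok, {})
    return root


def allowed_next(trie: dict, prefix: list[str]) -> list[str]:
    """Tokens permitted after `prefix`; empty if the prefix left the N-best space."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return []
        node = node[tok]
    return sorted(node)


trie = build_trie([["the", "cat", "sat"], ["the", "cat", "sad"], ["a", "cat", "sat"]])
print(allowed_next(trie, []))              # ['a', 'the']
print(allowed_next(trie, ["the", "cat"]))  # ['sad', 'sat']
```

In a Hugging Face transformers setup, a function of this kind could, as one option, be adapted to token IDs and supplied through the `prefix_allowed_tokens_fn` argument of `generate`; that wiring is not shown and is not claimed to be the paper's implementation.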

arXiv:2212.13661

Cost-minimization predictive energy management of a postal-delivery fuel cell electric vehicle with intelligent battery State-of-Charge Planner

Authors: Yang Zhou, Fuzeng Li, Xianfeng Xu, Zhen Zhang, Alexandre Ravey, Marie-Cécile Péra, Ruiqing Ma

Abstract: Fuel cell electric vehicles have attracted substantial attention in recent decades due to their high efficiency and zero emissions, while high operating costs remain the major barrier to their large-scale commercialization. In this context, this paper devises an energy management strategy for an urban postal-delivery fuel cell electric vehicle to mitigate operating costs. First, a data-driven dual-loop spatial-domain battery state-of-charge reference estimator is designed to guide battery energy depletion; it is trained on real-world driving data collected during postal delivery missions. Then, a fuzzy C-means clustering enhanced Markov speed predictor is constructed to project the upcoming velocity. Lastly, combining the state-of-charge reference and the forecasted speed, a model predictive control-based cost-optimization energy management strategy is established to mitigate vehicle operating costs imposed by energy consumption and power-source degradation. Validation results show that 1) the proposed strategy could reduce the operating cost by 4.43% and 7.30% on average versus benchmark strategies, demonstrating its superiority in terms of cost reduction, and 2) the computation time per step of the proposed strategy averages 0.123 ms, less than the 1 s sampling interval, demonstrating its potential for real-time applications.

Submitted 2 March, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

Comments: Submission to journal
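The Markov speed predictor mentioned above can be sketched in its plainest form: bin speeds into states, estimate a transition matrix by counting, and predict the next speed as the expected bin center. The fuzzy C-means state definition from the abstract is replaced here by fixed-width bins, an assumption made for brevity, and the speed trace is a made-up toy example rather than postal-delivery data.

```python
# Plain Markov-chain speed predictor with fixed-width speed bins (assumption:
# the abstract's fuzzy C-means clustering is replaced by uniform binning).


def speed_state(v: float, bin_width: float, n_bins: int) -> int:
    return min(int(v // bin_width), n_bins - 1)


def fit_transition_matrix(speeds: list[float], bin_width: float,
                          n_bins: int) -> list[list[float]]:
    counts = [[0] * n_bins for _ in range(n_bins)]
    states = [speed_state(v, bin_width, n_bins) for v in speeds]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    # Rows with no observations fall back to a uniform distribution.
    return [[c / sum(row) for c in row] if sum(row) else [1.0 / n_bins] * n_bins
            for row in counts]


def predict_next_speed(P: list[list[float]], v: float, bin_width: float) -> float:
    s = speed_state(v, bin_width, len(P))
    centers = [(i + 0.5) * bin_width for i in range(len(P))]
    return sum(p * c for p, c in zip(P[s], centers))


speeds = [0.0, 2.0, 4.0, 6.0, 4.0, 2.0, 0.0]  # toy trace in m/s, one sample per second
P = fit_transition_matrix(speeds, bin_width=2.5, n_bins=3)
print(predict_next_speed(P, 3.0, bin_width=2.5))  # 3.75
```

Repeating the prediction step over a horizon would give the velocity forecast that a model predictive controller consumes; that outer loop is not shown.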

arXiv:2212.11233

Realization Scheme for Visual Cryptography with Computer-generated Holograms

Authors: Tao Yu, **ge Ma, Guilin Li, Dongyu Yang, Rui Ma, Yishi Shi

Abstract: We propose to realize visual cryptography indirectly with the help of computer-generated holograms. At present, visual cryptography shares are mainly recovered by superimposing them on transparent film or with computer equipment, which greatly limits the application range of visual cryptography. In this paper, the visual cryptography shares are encoded with computer-generated holograms, reproduced by optical means, and then superimposed and decrypted. This method can expand the application range of visual cryptography and further increase its security.

Submitted 9 December, 2022; originally announced December 2022.

Comments: International Workshop on Holography and related technologies (IWH) 2018
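For readers unfamiliar with the underlying primitive, the classic (2, 2) visual cryptography scheme of Naor and Shamir expands each secret pixel into two subpixels per share, and stacking transparencies behaves like a pixel-wise OR. The Python sketch below shows that textbook scheme only; the paper's step of encoding the shares into computer-generated holograms and reconstructing them optically is not modelled.

```python
import random

# Classic (2, 2) visual cryptography: each secret pixel becomes two subpixels
# per share; stacking transparencies acts as a pixel-wise OR.


def make_shares(secret: list[list[int]], rng: random.Random):
    """secret uses 1 for black, 0 for white; returns two shares twice as wide."""
    share1, share2 = [], []
    for row in secret:
        r1, r2 = [], []
        for pixel in row:
            pattern = rng.choice([(1, 0), (0, 1)])
            complement = (1 - pattern[0], 1 - pattern[1])
            r1.extend(pattern)
            r2.extend(complement if pixel else pattern)
        share1.append(r1)
        share2.append(r2)
    return share1, share2


def stack(a, b):
    """Overlaying two shares: a subpixel is black if it is black on either share."""
    return [[x | y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


secret = [[1, 0], [0, 1]]
s1, s2 = make_shares(secret, random.Random(0))
print(stack(s1, s2))  # black pixels -> both subpixels black; white -> one of two
```

Each share on its own has exactly one black subpixel per secret pixel regardless of the secret, which is why a single share reveals nothing; only the stacked pair shows the contrast between black and white pixels.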

arXiv:2211.00968

Internal Language Model Estimation based Adaptive Language Model Fusion for Domain Adaptation

Authors: Rao Ma, Xiaobo Wu, ** Qiu, Yanan Qin, Haihua Xu, Peihao Wu, Zejun Ma

Abstract: The deployment environment of ASR models is ever-changing, and incoming speech can switch across different domains during a session. This makes effective domain adaptation challenging when only target-domain text data is available; our objective is to obtain clearly improved performance on the target domain while limiting the degradation on the general domain. In this paper, we propose an adaptive LM fusion approach called internal language model estimation based adaptive domain adaptation (ILME-ADA). To realize ILME-ADA, an interpolated log-likelihood score is calculated based on the maximum of the scores from the internal LM and the external LM (ELM) respectively. We demonstrate the efficacy of the proposed ILME-ADA method with both RNN-T and LAS modeling frameworks, employing neural network and n-gram LMs as ELMs respectively, on two domain-specific (target) test sets. Compared with both shallow and ILME-based LM fusion methods, the proposed method achieves significantly better performance on the target test sets with minimal performance degradation on the general test set.

Submitted 2 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted by ICASSP 2023
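The two baselines this abstract compares against can be written down directly. Shallow fusion scores a hypothesis y as log P_ASR(y|x) + λ_E log P_ELM(y); ILME-based fusion additionally subtracts λ_I log P_ILM(y), an estimate of the ASR model's internal LM. The Python sketch below implements only these two standard scores, not ILME-ADA, whose exact interpolation is not specified here. All log-probabilities and weights are made-up toy numbers.

```python
# Hypothesis scoring for the two baselines named in the abstract.
# All log-probabilities and weights below are made-up toy numbers.


def shallow_fusion(log_p_asr: float, log_p_elm: float, lam_e: float) -> float:
    """log P_ASR(y|x) + lam_e * log P_ELM(y)"""
    return log_p_asr + lam_e * log_p_elm


def ilme_fusion(log_p_asr: float, log_p_elm: float, log_p_ilm: float,
                lam_e: float, lam_i: float) -> float:
    """Shallow fusion minus a weighted internal-LM estimate: - lam_i * log P_ILM(y)."""
    return shallow_fusion(log_p_asr, log_p_elm, lam_e) - lam_i * log_p_ilm


# name: (log P_ASR, log P_ELM, log P_ILM)
hyps = {"general": (-1.0, -4.0, -1.0), "in_domain": (-1.6, -2.5, -6.0)}
best_shallow = max(hyps, key=lambda h: shallow_fusion(hyps[h][0], hyps[h][1], 0.3))
best_ilme = max(hyps, key=lambda h: ilme_fusion(*hyps[h], lam_e=0.3, lam_i=0.2))
print(best_shallow, best_ilme)  # general in_domain
```

In this toy case, subtracting the internal-LM score removes the source-domain prior that favoured the "general" hypothesis, so the hypothesis preferred by the target-domain ELM wins.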

arXiv:2207.07609

doi 10.1007/978-3-031-26348-4_29

DOLPHINS: Dataset for Collaborative Perception enabled Harmonious and Interconnected Self-driving

Authors: Ruiqing Mao, **gyu Guo, Yukuan Jia, Yuxuan Sun, Sheng Zhou, Zhisheng Niu

Abstract: Vehicle-to-Everything (V2X) networks have enabled collaborative perception in autonomous driving, a promising solution to the fundamental defects of stand-alone intelligence, including blind zones and long-range perception. However, the lack of datasets has severely hindered the development of collaborative perception algorithms. In this work, we release DOLPHINS: Dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving, a new simulated, large-scale, multi-scenario, multi-view, multi-modality autonomous driving dataset that provides a ground-breaking benchmark platform for interconnected autonomous driving. DOLPHINS outperforms current datasets in six dimensions: temporally aligned images and point clouds from both vehicles and Road Side Units (RSUs), enabling both Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) based collaborative perception; 6 typical scenarios with dynamic weather conditions, making it the most diverse interconnected autonomous driving dataset; meticulously selected viewpoints providing full coverage of the key areas and every object; 42376 frames and 292549 objects, with the corresponding 3D annotations, geo-positions, and calibrations, making it the largest dataset for collaborative perception; Full-HD images and 64-line LiDARs providing high-resolution data with sufficient detail; and well-organized APIs and open-source code ensuring the extensibility of DOLPHINS. We also construct a benchmark of 2D detection, 3D detection, and multi-view collaborative perception tasks on DOLPHINS.
The experimental results show that the raw-level fusion scheme through V2X communication can help improve precision as well as reduce the need for expensive LiDAR equipment on vehicles when RSUs exist, which may accelerate the adoption of interconnected self-driving vehicles. DOLPHINS is now available at https://dolphins-dataset.net/.

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2206.05052

Meta-data Study in Autism Spectrum Disorder Classification Based on Structural MRI

Authors: Ruimin Ma, Yanlin Wang, Yanjie Wei, Yi Pan

Abstract: Accurate diagnosis of autism spectrum disorder (ASD) based on neuroimaging data has significant implications, as extracting useful information from neuroimaging data for ASD detection is challenging. Even though machine learning techniques have been leveraged to improve the information extraction from neuroimaging data, the varying data quality caused by different meta-data conditions (i.e., data collection strategies) limits the effective information that can be extracted, thus leading to data-dependent predictive accuracies in ASD detection, which can be worse than random guessing in some cases. In this work, we systematically investigate the impact of three kinds of meta-data on the predictive accuracy of classifying ASD based on structural MRI collected from 20 different sites, where meta-data conditions vary.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2203.16357

Practical Learned Lossless JPEG Recompression with Multi-Level Cross-Channel Entropy Model in the DCT Domain

Authors: Lina Guo, Xinjie Shi, Dailan He, Yuanyuan Wang, Rui Ma, Hongwei Qin, Yan Wang

Abstract: JPEG is a popular image compression method widely used by individuals, data centers, cloud storage and network filesystems. However, most recent progress on image compression mainly focuses on uncompressed images while ignoring trillions of already-existing JPEG images. To compress these JPEG images adequately and restore them back to JPEG format losslessly when needed, we propose a deep learning based JPEG recompression method that operates in the DCT domain and propose a Multi-Level Cross-Channel Entropy Model to compress the most informative Y component. Experiments show that our method achieves state-of-the-art performance compared with traditional JPEG recompression methods including Lepton, JPEG XL and CMIX. To the best of our knowledge, this is the first learned compression method that losslessly transcodes JPEG images to more storage-saving bitstreams.

Submitted 30 March, 2022; originally announced March 2022.

Comments: CVPR 2022

arXiv:2203.10886 [pdf, other]

ELIC: Efficient Learned Image Compression with Unevenly Grouped Space-Channel Contextual Adaptive Coding

Authors: Dailan He, Ziming Yang, Weikun Peng, Rui Ma, Hongwei Qin, Yan Wang

Abstract: Recently, learned image compression techniques have achieved remarkable performance, even surpassing the best manually designed lossy image coders. They are promising candidates for large-scale adoption. For the sake of practicality, a thorough investigation of the architecture design of learned image compression, regarding both compression performance and running speed, is essential. In this paper, we first propose uneven channel-conditional adaptive coding, motivated by the observation of energy compaction in learned image compression. Combining the proposed uneven grouping model with existing context models, we obtain a spatial-channel contextual adaptive model that improves coding performance without sacrificing running speed. Then we study the structure of the main transform and propose an efficient model, ELIC, to achieve state-of-the-art speed and compression ability. With superior performance, the proposed model also supports extremely fast preview decoding and progressive decoding, which makes the coming application of learning-based image compression more promising.
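To make "uneven channel-conditional coding" concrete, the sketch below partitions a latent's channels into progressively larger groups, so that each later group can be entropy-coded conditioned on all earlier ones. The group sizes [16, 16, 32, 64, 192] for a 320-channel latent are an assumption (recalled from descriptions of ELIC, not stated in this abstract), and `split_uneven` and `context_for_group` are hypothetical helpers, not the authors' code.

```python
from typing import List, Sequence

def split_uneven(channels: Sequence[int], sizes: Sequence[int]) -> List[List[int]]:
    """Partition channel indices into consecutive groups of the given (uneven) sizes."""
    if sum(sizes) != len(channels):
        raise ValueError("group sizes must cover all channels exactly")
    groups, start = [], 0
    for size in sizes:
        groups.append(list(channels[start:start + size]))
        start += size
    return groups

def context_for_group(groups: List[List[int]], k: int) -> List[int]:
    """Channels decoded before group k, i.e. the channel-conditional context for group k."""
    return [c for g in groups[:k] for c in g]

# Hypothetical configuration: 320 latent channels, small groups first.
SIZES = [16, 16, 32, 64, 192]
groups = split_uneven(list(range(320)), SIZES)
for k, g in enumerate(groups):
    print(f"group {k}: {len(g)} channels, conditioned on {len(context_for_group(groups, k))}")
```

The small early groups reflect the energy-compaction observation: if most information sits in the first few channels, coding them in fine-grained steps gives the later, larger groups a rich context at little extra sequential cost.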

Submitted 29 March, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

Comments: accepted by CVPR 2022 (oral)

arXiv:2201.11627 [pdf, other]

Internal Language Model Estimation Through Explicit Context Vector Learning for Attention-based Encoder-decoder ASR

Authors: Yufei Liu, Rao Ma, Haihua Xu, Yi He, Zejun Ma, Weibin Zhang

Abstract: An end-to-end (E2E) ASR model implicitly learns a prior Internal Language Model (ILM) from the training transcripts. To fuse an external LM using Bayes posterior theory, the log likelihood produced by the ILM has to be accurately estimated and subtracted. In this paper we propose two novel approaches to estimate the ILM based on the Listen-Attend-Spell (LAS) framework. The first method replaces the context vector of the LAS decoder at every time step with a vector that is learned with training transcripts. Furthermore, we propose another method that uses a lightweight feed-forward network to directly map the query vector to the context vector in a dynamic sense. Since the context vectors are learned by minimizing the perplexities on training transcripts, and their estimation is independent of the encoder output, the ILMs are accurately learned for both methods. Experiments show that the ILMs achieve the lowest perplexity, indicating the efficacy of the proposed methods. In addition, they also significantly outperform the shallow fusion method, as well as two previously proposed ILM Estimation (ILME) approaches on several datasets.
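The subtraction described above is commonly written as a log-linear score used during beam search: score(y) = log P_E2E(y|x) - λ_ILM log P_ILM(y) + λ_LM log P_LM(y). The sketch below contrasts this with plain shallow fusion on two competing hypotheses. The weights and log-probabilities are made-up illustrative numbers, and the helper names are hypothetical; how P_ILM is estimated (the paper's contribution) is not modelled here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hypothesis:
    text: str
    log_p_e2e: float  # log P(y | x) from the attention-based encoder-decoder
    log_p_ilm: float  # estimated internal LM score log P_ILM(y)
    log_p_ext: float  # external LM score log P_LM(y)

def shallow_fusion(h: Hypothesis, lam_ext: float) -> float:
    """Baseline: add the external LM without removing the internal prior."""
    return h.log_p_e2e + lam_ext * h.log_p_ext

def ilme_fusion(h: Hypothesis, lam_ilm: float, lam_ext: float) -> float:
    """Subtract the estimated ILM, then add the external LM (log-linear weights)."""
    return h.log_p_e2e - lam_ilm * h.log_p_ilm + lam_ext * h.log_p_ext

# Made-up scores for two competing hypotheses.
hyps: List[Hypothesis] = [
    Hypothesis("hyp A", log_p_e2e=-2.0, log_p_ilm=-1.0, log_p_ext=-3.0),
    Hypothesis("hyp B", log_p_e2e=-2.2, log_p_ilm=-4.0, log_p_ext=-3.1),
]
best_shallow = max(hyps, key=lambda h: shallow_fusion(h, lam_ext=0.5))
best_ilme = max(hyps, key=lambda h: ilme_fusion(h, lam_ilm=0.3, lam_ext=0.5))
print(best_shallow.text, best_ilme.text)
```

Here hypothesis A wins under shallow fusion partly because the E2E model's internal LM already favours it; once that prior is subtracted, hypothesis B wins. This is why an accurate ILM estimate matters for the fused decision.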

Submitted 2 November, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: Proceedings of INTERSPEECH

arXiv:2112.07840 [pdf, other]

A Predictive Online Transient Stability Assessment with Hierarchical Generative Adversarial Networks

Authors: Rui Ma, Sara Eftekharnejad, Chen Zhong, Mustafa Cenk Gursoy

Abstract: Online transient stability assessment (TSA) is essential for secure and stable power system operations. The growing number of Phasor Measurement Units (PMUs) brings about massive sources of data that can enhance online TSA. However, conventional data-driven methods require large amounts of transient data to correctly assess the transient stability state of a system. In this paper, a new data-driven TSA approach is developed that requires less data than conventional methods. The data reduction is enabled by learning the dynamic behaviors of the historical transient data using generative adversarial networks (GAN). This knowledge is used online to predict the voltage time series data after a transient event. A classifier embedded in the generative network uses the predicted post-contingency data to determine the stability of the system following a fault. The developed GAN-based TSA approach preserves the spatial and temporal correlations that exist in multivariate PMU time series data. Hence, in comparison with the state-of-the-art TSA methods, it achieves a higher assessment accuracy using only one sample of the measured data and a shorter response time. Case studies conducted on the IEEE 118-bus system demonstrate the superior performance of the GAN-based method compared to the conventional data-driven techniques.

Submitted 14 December, 2021; originally announced December 2021.

arXiv:2111.12566 [pdf, other]

Acoustical Analysis of Speech Under Physical Stress in Relation to Physical Activities and Physical Literacy

Authors: Si-Ioi Ng, Rui-Si Ma, Tan Lee, Raymond Kim-Wai Sum

Abstract: Human speech production encompasses physiological processes that naturally react to physical stress. Stress caused by physical activity (PA), e.g., running, may lead to significant changes in a person's speech. The major changes are related to the aspects of pitch level, speaking rate, pause pattern, and breathiness. The extent of change presumably depends on the physical fitness and well-being of the person, as well as the intensity of PA. The general wellness of a person is further related to his/her physical literacy (PL), which refers to a holistic description of engagement in PA. This paper presents the development of a Cantonese speech database that contains audio recordings of speech before and after physical exercises of different intensity levels. The corpus design and data collection process are described. Preliminary results of acoustical analysis are presented to illustrate the impact of PA on pitch level, pitch range, speaking and articulation rate, and time duration of pauses. It is also noted that the effect of PA is correlated with some of the PA and PL measures.

Submitted 25 April, 2022; v1 submitted 20 November, 2021; originally announced November 2021.

Comments: Accepted to Speech Prosody 2022

arXiv:2111.09637 [pdf, other]

doi 10.1109/PAWR53092.2022.9719754

A Modular 1D-CNN Architecture for Real-time Digital Pre-distortion

Authors: Udara De Silva, Toshiaki Koike-Akino, Rui Ma, Ao Yamashita, Hideyuki Nakamizo

Abstract: This study reports a novel hardware-friendly modular architecture for implementing a one-dimensional convolutional neural network (1D-CNN) digital predistortion (DPD) technique to linearize RF power amplifiers (PA) in real time. The modular nature of our design enables DPD system adaptation for variable resource and timing constraints. Our work also presents a co-simulation architecture to verify the DPD performance with an actual power amplifier hardware-in-the-loop. The experimental results with 100 MHz signals show that the proposed 1D-CNN obtains superior performance compared with other neural network architectures for real-time DPD applications.

Submitted 18 November, 2021; originally announced November 2021.

Comments: 3 pages, 4 figures, to be published in RWW2022

arXiv:2110.10755 [pdf, other]

Toward Real-world Image Super-resolution via Hardware-based Adaptive Degradation Models

Authors: Rui Ma, Johnathan Czernik, Xian Du

Abstract: Most single image super-resolution (SR) methods are developed on synthetic low-resolution (LR) and high-resolution (HR) image pairs, which are simulated by a predetermined degradation operation, e.g., bicubic downsampling. However, these methods only learn the inverse process of the predetermined operation, so they fail to super-resolve real-world LR images, whose true formation deviates from the predetermined operation. To address this problem, we propose a novel supervised method to simulate an unknown degradation process with the inclusion of prior hardware knowledge of the imaging system. We design an adaptive blurring layer (ABL) in the supervised learning framework to estimate the target LR images. The hyperparameters of the ABL can be adjusted for different imaging hardware. The experiments on real-world datasets validate that our degradation model can estimate LR images more accurately than the predetermined degradation operation, and that it helps existing SR methods reconstruct real-world LR images more accurately than conventional approaches.
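For contrast with a fixed degradation, the classical blur-then-downsample model is sketched below, with the blur width `sigma` exposed as the knob that a hardware-aware layer would set per imaging system. This is an assumption-laden stand-in: the paper's ABL is a learned layer whose exact form is not given here, and the Gaussian kernel, replicate padding and helper names are illustrative choices.

```python
import math
from typing import List

Image = List[List[float]]

def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Normalised 1-D Gaussian; sigma stands in for a hardware-derived blur width."""
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def blur_rows(img: Image, k: List[float]) -> Image:
    """Convolve each row with k, replicating edge pixels at the borders."""
    r = len(k) // 2
    w = len(img[0])
    return [[sum(k[j + r] * row[min(max(x + j, 0), w - 1)] for j in range(-r, r + 1))
             for x in range(w)]
            for row in img]

def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]

def degrade(hr: Image, sigma: float, scale: int, radius: int = 2) -> Image:
    """Separable Gaussian blur (rows, then columns), then keep every scale-th pixel."""
    k = gaussian_kernel(sigma, radius)
    blurred = transpose(blur_rows(transpose(blur_rows(hr, k)), k))
    return [row[::scale] for row in blurred[::scale]]

# 8x8 HR image with a vertical edge; two hypothetical "hardware" blur settings.
hr = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
sharp_lr = degrade(hr, sigma=0.5, scale=2)
soft_lr = degrade(hr, sigma=1.5, scale=2)
print(sharp_lr[0], soft_lr[0])
```

The same HR image yields different LR targets under different blur settings, which is the gap a fixed bicubic model cannot represent.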

Submitted 20 October, 2021; originally announced October 2021.

arXiv:2109.02755 [pdf, other]

Motion Artifact Reduction In Photoplethysmography For Reliable Signal Selection

Authors: Runyu Mao, Mackenzie Tweardy, Stephan W. Wegerich, Craig J. Goergen, George R. Wodicka, Fengqing Zhu

Abstract: Photoplethysmography (PPG) is a non-invasive and economical technique to extract vital signs of the human body. Although it has been widely used in consumer- and research-grade wrist devices to track a user's physiology, the PPG signal is very sensitive to motion, which can corrupt the signal's quality. Existing Motion Artifact (MA) reduction techniques have been developed and evaluated using either synthetic noisy signals or signals collected during high-intensity activities, both of which are difficult to generalize to real-life scenarios. Therefore, it is valuable to collect realistic PPG signals while performing Activities of Daily Living (ADL) to develop practical signal denoising and analysis methods. In this work, we propose an automatic pseudo clean PPG generation process for reliable PPG signal selection. For each noisy PPG segment, the corresponding pseudo clean PPG reduces the MAs and contains rich temporal details depicting cardiac features. Our experimental results show that 71% of the pseudo clean PPG collected from ADL can be considered high-quality segments, where the derived MAEs of heart rate and respiration rate are 1.46 BPM and 3.93 BrPM, respectively. Therefore, our proposed method can determine the reliability of the raw noisy PPG by considering the quality of the corresponding pseudo clean PPG signal.

Submitted 6 September, 2021; originally announced September 2021.

arXiv:2108.06513 [pdf, ps, other]

A 2D Non-Stationary Channel Model for Underwater Acoustic Communication Systems

Authors: Xiuming Zhu, Cheng-Xiang Wang, Ruofei Ma

Abstract: Underwater acoustic (UWA) communication plays a key role in the process of exploring and studying the ocean. In this paper, a modified non-stationary wideband channel model for UWA communication in shallow water scenarios is proposed. In this geometry-based stochastic model (GBSM), multiple motion effects, time-varying angles, distances, clusters' locations with the channel geometry, and the ultra-wideband property are considered, which makes the proposed model more realistic and capable of supporting long time/distance simulations. Some key statistical properties are investigated, including temporal autocorrelation function (ACF), power delay profile (PDP), average delay, and root mean square (RMS) delay spread. The impacts of multiple motion factors on temporal ACFs are analyzed. Simulation results show that the proposed model can mimic the non-stationarity of UWA channels. Finally, the proposed model is validated with measurement data.
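Average delay and RMS delay spread are the power-weighted mean and standard deviation of tap delays in the power delay profile: τ̄ = Σ P_k τ_k / Σ P_k and σ_τ = sqrt(Σ P_k (τ_k - τ̄)² / Σ P_k). A minimal sketch using these standard definitions on a discrete PDP; the tap values are arbitrary and not data from this paper. For a non-stationary channel, the same computation would be repeated at each time instant.

```python
import math
from typing import Sequence, Tuple

def delay_statistics(delays: Sequence[float], powers: Sequence[float]) -> Tuple[float, float]:
    """Return (average delay, RMS delay spread) of a discrete power delay profile.

    delays: tap delays (e.g. in ms); powers: linear (not dB) tap powers.
    """
    if len(delays) != len(powers) or not delays:
        raise ValueError("delays and powers must be non-empty and of equal length")
    total = sum(powers)
    mean = sum(p * t for t, p in zip(delays, powers)) / total
    second = sum(p * (t - mean) ** 2 for t, p in zip(delays, powers)) / total
    return mean, math.sqrt(second)

# Illustrative 3-tap profile: delays in ms, powers in linear units.
mean_delay, rms_spread = delay_statistics([0.0, 1.0, 2.0], [1.0, 0.5, 0.5])
print(f"average delay = {mean_delay:.3f} ms, RMS delay spread = {rms_spread:.3f} ms")
```

Powers must be linear; feeding dB values directly would weight the taps incorrectly.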

Submitted 14 August, 2021; originally announced August 2021.

arXiv:2107.13353 [pdf, other]

Fast Wireless Sensor Anomaly Detection based on Data Stream in Edge Computing Enabled Smart Greenhouse

Authors: Yihong Yang, Sheng Ding, Yuwen Liu, Shunmei Meng, Xiaoxiao Chi, Rui Ma, Chao Yan

Abstract: An edge-computing-enabled smart greenhouse is a representative application of Internet of Things technology: it can monitor environmental information in real time and use that information to support intelligent decision-making. In this process, anomaly detection for wireless sensor data plays an important role. However, traditional anomaly detection algorithms, originally designed for static data, have not properly considered the inherent characteristics of the data streams produced by wireless sensors, such as infiniteness, correlations and concept drift, which may pose a considerable challenge to stream-based anomaly detection and lead to low detection accuracy and efficiency. First, a data stream is usually generated quickly, which means that it is infinite and enormous, so any traditional offline anomaly detection algorithm that attempts to store the whole dataset or to scan it multiple times will run out of memory. Second, there exist correlations among different data streams, which traditional algorithms hardly consider. Third, the underlying data generation process or data distribution may change over time, so traditional anomaly detection algorithms without model updates will lose their effectiveness. Considering these issues, this paper proposes a novel method (called DLSHiForest) based on Locality-Sensitive Hashing and a time window technique to solve these problems while achieving accurate and efficient detection.
Comprehensive experiments are executed using a real-world agricultural greenhouse dataset to demonstrate the feasibility of our approach. Experimental results show that our proposal is practicable in addressing the challenges of traditional anomaly detection while ensuring accuracy and efficiency.
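The memory and drift arguments above are what a time-window technique addresses. The sketch below is deliberately much simpler than DLSHiForest: it swaps in a rolling z-score over a fixed-size window (a different detector, not LSH or isolation forests) purely to show bounded memory and continuous model refresh. The window size, threshold and sample readings are arbitrary assumptions.

```python
import math
from collections import deque
from typing import Deque, Iterable, List

class WindowedDetector:
    """Flag readings far from the recent window's mean; memory is O(window)."""

    def __init__(self, window: int = 20, threshold: float = 3.0) -> None:
        self.buf: Deque[float] = deque(maxlen=window)  # oldest readings are evicted automatically
        self.threshold = threshold

    def update(self, x: float) -> bool:
        anomalous = False
        if len(self.buf) >= 2:
            mean = sum(self.buf) / len(self.buf)
            var = sum((v - mean) ** 2 for v in self.buf) / (len(self.buf) - 1)
            std = math.sqrt(var)
            anomalous = std > 0 and abs(x - mean) / std > self.threshold
        self.buf.append(x)  # the reference window is refreshed on every reading, tracking drift
        return anomalous

def detect(stream: Iterable[float], window: int = 20, threshold: float = 3.0) -> List[int]:
    """Single pass over the stream; returns indices of flagged readings."""
    det = WindowedDetector(window, threshold)
    return [i for i, x in enumerate(stream) if det.update(x)]

# Hypothetical greenhouse temperatures with one sensor spike at index 30.
stream = [20.0 + 0.5 * (i % 2) for i in range(30)] + [35.0] + [20.0 + 0.5 * (i % 2) for i in range(10)]
print(detect(stream))
```

Each reading is inspected once and then only the last `window` values are kept, so memory stays constant however long the stream runs. Correlations across streams, the second issue raised above, are not handled by this sketch.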

Submitted 28 July, 2021; originally announced July 2021.

Comments: 12 pages, 8 figures

arXiv:2104.01846 [pdf]

A Lossless Intra Reference Block Recompression Scheme for Bandwidth Reduction in HEVC-IBC

Authors: Jiyuan Hu, Jun Wang, Guangyu Zhong, Jian Cao, Ren Mao, Fan Liang

Abstract: The reference frame memory accesses in inter prediction result in high DRAM bandwidth requirements and power consumption. This problem is intensified by the adoption of intra block copy (IBC), a new coding tool in the screen content coding (SCC) extension to High Efficiency Video Coding (HEVC). In this paper, we propose a lossless recompression scheme that compresses the reference blocks in intra prediction, i.e., intra block copy, before storing them into DRAM to alleviate this problem. The proposal performs pixel-wise texture analysis with an edge-based adaptive prediction method, yet signals no direction in the bitstream, and thus achieves a high compression gain. Experimental results demonstrate that the proposed scheme achieves a 72% data reduction rate on average, which solves the memory bandwidth problem.
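Edge-based adaptive prediction without direction signaling can be illustrated with the median edge detector (MED) from LOCO-I/JPEG-LS. This is a swapped-in, well-known predictor, not necessarily the one proposed in this paper. The key property is the same: the decoder recomputes the predictor choice from already-decoded neighbours, so no direction bits are sent. The border handling below is a simplification assumed for brevity.

```python
from typing import List

Block = List[List[int]]

def med_predict(a: int, b: int, c: int) -> int:
    """MED: a = left, b = above, c = upper-left neighbour."""
    if c >= max(a, b):
        return min(a, b)   # edge detected: take the smaller neighbour
    if c <= min(a, b):
        return max(a, b)   # edge detected: take the larger neighbour
    return a + b - c       # smooth region: planar prediction

def predict_at(img: Block, y: int, x: int) -> int:
    """Prediction from causal neighbours only (simplified borders, an assumption)."""
    if y == 0 and x == 0:
        return 0
    if y == 0:
        return img[y][x - 1]
    if x == 0:
        return img[y - 1][x]
    return med_predict(img[y][x - 1], img[y - 1][x], img[y - 1][x - 1])

def encode(block: Block) -> Block:
    """Residuals that a lossless entropy coder would then compress."""
    return [[block[y][x] - predict_at(block, y, x) for x in range(len(block[0]))]
            for y in range(len(block))]

def decode(residuals: Block) -> Block:
    h, w = len(residuals), len(residuals[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):  # raster order: causal neighbours are already reconstructed
            out[y][x] = residuals[y][x] + predict_at(out, y, x)
    return out

# A reference block with a sharp vertical edge, typical of screen content.
block = [[10, 10, 10, 200],
         [10, 10, 10, 200],
         [10, 10, 10, 200]]
res = encode(block)
print(res)
```

After the first row, the vertical edge is predicted exactly, so the residuals are mostly zeros, and `decode(res)` reproduces the block bit-exactly.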

Submitted 5 April, 2021; originally announced April 2021.

Comments: ISCAS 2021 accepted as oral

arXiv:2102.09828 [pdf, other]

AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge

Authors: Houjun Huang, Xu Xiang, Yexin Yang, Rao Ma, Yanmin Qian

Abstract: This paper describes the AISpeech-SJTU system for the accent identification track of the Interspeech-2020 Accented English Speech Recognition Challenge. In this challenge track, only 160 hours of accented English data collected from 8 countries and the auxiliary Librispeech dataset are provided for training. To build an accurate and robust accent identification system, we explore the whole system pipeline in detail. First, we introduce the ASR-based phone posteriorgram (PPG) feature to accent identification and verify its efficacy. Then, a novel TTS-based approach is carefully designed to augment the very limited accent training data for the first time. Finally, we propose the test time augmentation and embedding fusion schemes to further improve the system performance. Our final system is ranked first in the challenge and outperforms all the other participants by a large margin. The submitted system achieves 83.63% average accuracy on the challenge evaluation data, ahead of the others by more than 10% in absolute terms.
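One common way to combine test-time augmentation with embedding fusion is to L2-normalise the embedding of each augmented copy, average them, and classify the fused vector by cosine similarity to per-accent centroids. The sketch below assumes exactly that scheme; the abstract does not specify the authors' fusion rule, and the 2-D embeddings and "US"/"UK" labels are toy values.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

def l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]  # assumes a non-zero vector

def fuse(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Average L2-normalised embeddings (e.g. from augmented copies of one utterance)."""
    normed = [l2_normalize(e) for e in embeddings]
    dim = len(normed[0])
    return l2_normalize([sum(e[i] for e in normed) / len(normed) for i in range(dim)])

def classify(embedding: Sequence[float], centroids: Dict[str, Sequence[float]]) -> str:
    """Pick the accent whose centroid has the highest cosine similarity."""
    e = l2_normalize(embedding)
    def cosine(label: str) -> float:
        return sum(a * b for a, b in zip(e, l2_normalize(centroids[label])))
    return max(centroids, key=cosine)

# Toy 2-D embeddings for three augmented copies of one utterance.
centroids = {"US": [1.0, 0.0], "UK": [0.0, 1.0]}
copies = [[0.9, 0.6], [0.4, 0.5], [0.8, 0.2]]
print([classify(c, centroids) for c in copies], classify(fuse(copies), centroids))
```

The second copy on its own would be misclassified, while the fused embedding agrees with the majority direction, which illustrates why fusion can stabilise predictions.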

Submitted 19 February, 2021; originally announced February 2021.

Comments: Accepted to ICASSP 2021

arXiv:2010.08031 [pdf]

doi 10.1016/j.eswa.2021.115892

QReLU and m-QReLU: Two novel quantum activation functions to aid medical diagnostics

Authors: L. Parisi, D. Neagu, R. Ma, F. Campean

Abstract: The ReLU activation function (AF) has been extensively applied in deep neural networks, in particular Convolutional Neural Networks (CNN), for image classification despite its unresolved dying ReLU problem, which poses challenges to reliable applications. This issue has important implications for critical applications, such as those in healthcare. Recent approaches merely propose variations of the activation function that leave the dying ReLU problem unresolved. This contribution reports a different research direction by investigating the development of an innovative quantum approach to the ReLU AF that avoids the dying ReLU problem by disruptive design. The Leaky ReLU was leveraged as a baseline on which the two quantum principles of entanglement and superposition were applied to derive the proposed Quantum ReLU (QReLU) and the modified-QReLU (m-QReLU) activation functions. Both QReLU and m-QReLU are implemented and made freely available in TensorFlow and Keras. This original approach is effective and has been validated extensively in case studies that facilitate the detection of COVID-19 and Parkinson Disease (PD) from medical images. The two novel AFs were evaluated in a two-layered CNN against nine ReLU-based AFs on seven benchmark datasets, including images of spiral drawings taken via graphic tablets from patients with Parkinson Disease and healthy subjects, and point-of-care ultrasound images of the lungs of patients with COVID-19, those with pneumonia and healthy controls.
Despite a higher computational cost, results indicated an overall higher classification accuracy, precision, recall and F1-score brought about by either quantum AF on five of the seven benchmark datasets, thus demonstrating their potential to become the new benchmark or gold standard AF in CNNs and to aid image classification tasks involved in critical applications, such as medical diagnoses of COVID-19 and PD.
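The dying ReLU problem comes from ReLU's zero gradient for non-positive inputs: a unit whose pre-activations drift negative stops receiving updates. Leaky ReLU, the stated baseline, keeps a small slope there. The sketch below shows only these two classical functions and their derivatives; it does not implement QReLU or m-QReLU, whose quantum formulation is not given in this abstract. The slope α = 0.01 is a common default, assumed here.

```python
from typing import Callable, List

def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0   # zero for all non-positive inputs

def leaky_relu(x: float, alpha: float = 0.01) -> float:
    return x if x > 0 else alpha * x

def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    return 1.0 if x > 0 else alpha  # small but non-zero slope keeps the unit trainable

def dead_fraction(pre_activations: List[float], grad: Callable[[float], float]) -> float:
    """Fraction of inputs that pass back no gradient at all."""
    return sum(1 for z in pre_activations if grad(z) == 0.0) / len(pre_activations)

# A unit whose pre-activations drifted negative (e.g. after a large update).
zs = [-3.0, -1.5, -0.2, -0.7, 0.4]
print(dead_fraction(zs, relu_grad), dead_fraction(zs, leaky_relu_grad))
```

With ReLU, 80% of these inputs contribute no gradient; with Leaky ReLU, none are fully blocked.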

Submitted 15 October, 2020; originally announced October 2020.

Comments: 30 pages, 4 listings/Python code snippets, 2 figures, 8 tables

MSC Class: 68T07; 68T10; 68T45; 68U35 ACM Class: I.2.1; I.2.10; I.4.9; I.5.1; I.5.4; I.5.5

arXiv:2007.14374 [pdf, other]

doi 10.1109/TPDS.2020.3040867

Accelerating Federated Learning over Reliability-Agnostic Clients in Mobile Edge Computing Systems

Authors: Wentai Wu, Ligang He, Weiwei Lin, Rui Mao

Abstract: Mobile Edge Computing (MEC), which incorporates the Cloud, edge nodes and end devices, has shown great potential in bringing data processing closer to the data sources. Meanwhile, Federated learning (FL) has emerged as a promising privacy-preserving approach to facilitating AI applications. However, it remains a big challenge to optimize the efficiency and effectiveness of FL when it is integrated with the MEC architecture. Moreover, the unreliable nature (e.g., stragglers and intermittent drop-out) of end devices significantly slows down the FL process and affects the global model's quality in such circumstances. In this paper, a multi-layer federated learning protocol called HybridFL is designed for the MEC architecture. HybridFL adopts two levels (the edge level and the cloud level) of model aggregation enacting different aggregation strategies. Moreover, in order to mitigate stragglers and end device drop-out, we introduce regional slack factors into the stage of client selection performed at the edge nodes using a probabilistic approach without identifying or probing the state of end devices (whose reliability is agnostic). We demonstrate the effectiveness of our method in modulating the proportion of clients selected and present the convergence analysis for our protocol. We have conducted extensive experiments with machine learning tasks on MEC systems of different scales.
The results show that HybridFL improves the FL training process significantly in terms of shortening the federated round length, speeding up the global model's convergence (by up to 12X) and reducing end device energy consumption (by up to 58%).
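The two aggregation levels can be pictured as weighted averaging done first at each edge node over the clients that reported back, and then at the cloud over the edge models. The sketch below uses FedAvg-style sample-count weighting at both levels, which is an assumption: the abstract says HybridFL uses different strategies at the two levels, and its regional slack factors and probabilistic client selection are not modelled. Models are flat lists of floats for simplicity.

```python
from typing import List, Sequence, Tuple

Model = List[float]

def weighted_average(models: Sequence[Model], weights: Sequence[float]) -> Model:
    total = sum(weights)
    return [sum(w * m[i] for m, w in zip(models, weights)) / total
            for i in range(len(models[0]))]

def edge_aggregate(client_updates: Sequence[Tuple[Model, int]]) -> Tuple[Model, int]:
    """Aggregate only the clients that reported back; stragglers are simply absent."""
    models = [m for m, _ in client_updates]
    counts = [n for _, n in client_updates]
    return weighted_average(models, counts), sum(counts)

def cloud_aggregate(edge_models: Sequence[Tuple[Model, int]]) -> Model:
    """Combine edge-level models, weighted by the samples each edge represents."""
    models = [m for m, _ in edge_models]
    counts = [n for _, n in edge_models]
    return weighted_average(models, counts)

# Two regions; region B lost one client to drop-out this round.
region_a = edge_aggregate([([1.0, 2.0], 100), ([3.0, 4.0], 100)])
region_b = edge_aggregate([([5.0, 6.0], 200)])
global_model = cloud_aggregate([region_a, region_b])
print(global_model)
```

With identical sample-count weighting at both levels, the result equals flat FedAvg over all reporting clients; the hierarchy only changes the arithmetic when the levels use different strategies, as the abstract says HybridFL does.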

Submitted 23 April, 2021; v1 submitted 28 July, 2020; originally announced July 2020.

Comments: 14 pages, 7 figures, with Appendix

Journal ref: IEEE Transactions on Parallel and Distributed Systems. Vol.32, no.7, pp.1539-1551 (2020)

arXiv:2004.12747 [pdf]

Energy Efficient Software Matching in Distributed Vehicular Fog Based Architecture with Cloud and Fixed Fog Nodes

Authors: Rui Ma, Amal A. Alahmadi, Taisir E. H. El-Gorashi, Jaafar M. H. Elmirghani

Abstract: The rapid development of vehicle on-board units and the proliferation of autonomous vehicles in modern cities create a potential for a new fog computing paradigm, referred to as vehicular fog computing (VFC). In this paper, we propose an architecture that integrates a vehicular fog (VF) composed of vehicles clustered in a parking lot with a fixed fog node at the access network and the central cloud. We investigate the problem of energy-efficient software matching in the VF considering different approaches to deploy software packages in vehicles.

Submitted 27 April, 2020; originally announced April 2020.

arXiv:1911.09830 [pdf]

Identify the cells' nuclei based on the deep learning neural network

Authors: Tianyang Zhang, Rui Ma

Abstract: Identifying cell nuclei is an important step in most medical analyses. Automatically and accurately locating cell nuclei to assist doctors is in high demand in clinical practice. Recently, fully convolutional networks (FCNs) have served as the backbone of many image segmentation tasks, such as liver and tumor segmentation in the medical field and human body block segmentation in technical fields. Cell nuclei identification is also a kind of image segmentation task. To achieve it, we prefer to use deep learning algorithms. We construct three general frameworks: Mask Region-based Convolutional Neural Network (Mask RCNN), which performs well on many image segmentation tasks; U-net, which generalizes well on small datasets; and DenseUNet, a hybrid architecture combining DenseNet and U-net. We compare the performance of these three frameworks and evaluate them on the dataset of the Data Science Bowl 2018 challenge. As single models without any ensembling, all three perform well.

Submitted 21 November, 2019; originally announced November 2019.

arXiv:1812.00246 [pdf]

A PMU-based Multivariate Model for Classifying Power System Events

Authors: Rui Ma, Sagnik Basumallik, Sara Eftekharnejad

Abstract: Real-time transient event identification is essential for power system situational awareness and protection. The increased penetration of Phasor Measurement Units (PMUs) enhances power system visualization and real-time monitoring and control. However, a malicious false data injection attack on PMUs can provide wrong data that might prompt the operator to take incorrect actions, which can eventually jeopardize system reliability. In this paper, a multivariate method based on text mining is applied to detect false data and identify transient events by analyzing the attributes of each individual PMU time series and their relationship. It is shown that the proposed approach is efficient in detecting false data and identifying each transient event regardless of the system topology and loading condition as well as the coverage rate and placement of PMUs. The proposed method is tested on the IEEE 30-bus system and the classification results are provided.

Submitted 1 December, 2018; originally announced December 2018.

arXiv:1702.03596 [pdf, other]

Novel Baseband Equivalent Models of Quadrature Modulated All-Digital Transmitters

Authors: Omer Tanovic, Rui Ma, Koon Hoo Teo

Abstract: In this paper, an exact baseband equivalent model of a quadrature modulated all-digital transmitter is derived. No restrictions are made on the number of levels of the digital switched-mode power amplifier (SMPA) driving input, nor on the pulse encoding scheme employed. This implies a high level of generality of the proposed model. We show that an all-digital transmitter (ADT) can be represented as a series connection of the pulse encoder, a discrete-time Volterra series model of fixed degree and memory depth, and a linear time-varying system with special properties. This result suggests a new analytically motivated structure of a digital predistortion (DPD) of SMPA nonlinearities in ADT. Numerical simulations in MATLAB are used to verify the proposed baseband equivalent model.
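A discrete-time Volterra block of fixed degree and memory depth is often illustrated with the memory polynomial, a common diagonal pruning of the Volterra series used in DPD work: y[n] = Σ_k Σ_m a_{k,m} x[n-m] |x[n-m]|^k. The sketch below evaluates it on complex baseband samples. It is a generic illustration with arbitrary coefficients, not the exact model derived in this paper.

```python
from typing import List, Sequence

def memory_polynomial(x: Sequence[complex],
                      coeffs: Sequence[Sequence[complex]]) -> List[complex]:
    """Evaluate y[n] = sum_k sum_m coeffs[k][m] * x[n-m] * |x[n-m]|**k.

    k = 0..K-1 sets the nonlinearity (term of order k+1),
    m = 0..M sets the memory depth; samples before n = 0 are taken as zero.
    """
    y: List[complex] = []
    for n in range(len(x)):
        acc = 0j
        for k, row in enumerate(coeffs):
            for m, a in enumerate(row):
                if n - m >= 0:
                    s = x[n - m]
                    acc += a * s * abs(s) ** k
        y.append(acc)
    return y

# Mild compression (negative third-order term) plus one tap of linear memory.
coeffs = [[1.0, 0.1],    # linear term: x[n] + 0.1 x[n-1]
          [0.0, 0.0],    # |x|^1 term unused
          [-0.05, 0.0]]  # |x|^2 term on x[n] only (third-order compression)
x = [1 + 0j, 2 + 0j, 0 + 1j]
print(memory_polynomial(x, coeffs))
```

Because the model is linear in the coefficients a_{k,m}, they can be fitted by least squares, which is one reason this pruned form is popular for DPD.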

Submitted 12 February, 2017; originally announced February 2017.

Comments: Accepted to the Radio & Wireless Week 2017 (RWW2017)

arXiv:1609.03628 [pdf, other]

Co-active Learning to Adapt Humanoid Movement for Manipulation

Authors: Ren Mao, John S. Baras, Yezhou Yang, Cornelia Fermuller

Abstract: In this paper, we address the problem of interactively adapting robot movement under various environmental constraints. Motion primitives are generally adopted to generate target motion from demonstrations. However, their generalization capability is weak when facing novel environments. Additionally, traditional motion generation methods do not consider the versatile constraints arising from various users, tasks, and environments. In this work, we propose a co-active learning framework for learning to adapt a robot end-effector's movement for manipulation tasks. It is designed to adapt the original imitation trajectories, which are learned from demonstrations, to novel situations with various constraints. The framework also considers users' feedback on the adapted trajectories, and it learns to adapt movement through human-in-the-loop interactions. The implemented system generalizes trained motion primitives to various situations with different constraints while taking user preferences into account. Experiments on a humanoid platform validate the effectiveness of our approach.

Submitted 12 September, 2016; originally announced September 2016.

arXiv:1103.5441 [pdf, ps, other]

Nobody but You: Sensor Selection for Voltage Regulation in Smart Grid

Authors: Rukun Mao, Husheng Li

Abstract: The increasing availability of distributed energy resources (DERs) and sensors in the smart grid, together with the overlaying communication network, offers substantial potential for improving power system reliability. In this paper, the problem of sensor selection is studied for the MAC layer design of wireless sensor networks used for voltage regulation in the smart grid. A hybrid dynamical system framework is proposed, using a Kalman filter for voltage state estimation and LQR feedback control for voltage adjustment. The approach to obtaining the optimal sensor selection sequence is studied, and a suboptimal sequence is obtained by applying a sliding window algorithm. Simulation results show that the proposed sensor selection strategy achieves a 40% performance gain over a round-robin sensor polling baseline.

Submitted 28 March, 2011; originally announced March 2011.

Comments: 6 pages, submitted to GLOBECOM 2011

Showing 1–37 of 37 results for author: Ma, R