Search | arXiv e-print repository

arXiv:2405.01242 [pdf, other]

TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

Authors: Yueyuan Sui, Minghui Zhao, Junxi Xia, Xiaofan Jiang, Stephen Xia

Abstract: We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state of-art m… ▽ More We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state of-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speed up of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB. △ Less

Submitted 29 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.15284 [pdf, other]

Global 4D Ionospheric STEC Prediction based on DeepONet for GNSS Rays

Authors: Dijia Cai, Zenghui Shi, Haiyang Fu, Huan Liu, Hongyi Qian, Yun Sui, Feng Xu, Ya-Qiu **

Abstract: The ionosphere is a vitally dynamic charged particle region in the Earth's upper atmosphere, playing a crucial role in applications such as radio communication and satellite navigation. The Slant Total Electron Contents (STEC) is an important parameter for characterizing wave propagation, representing the integrated electron density along the ray of radio signals passing through the ionosphere. Th… ▽ More The ionosphere is a vitally dynamic charged particle region in the Earth's upper atmosphere, playing a crucial role in applications such as radio communication and satellite navigation. The Slant Total Electron Contents (STEC) is an important parameter for characterizing wave propagation, representing the integrated electron density along the ray of radio signals passing through the ionosphere. The accurate prediction of STEC is essential for mitigating the ionospheric impact particularly on Global Navigation Satellite Systems (GNSS). In this work, we propose a high-precision STEC prediction model named DeepONet-STEC, which learns nonlinear operators to predict the 4D temporal-spatial integrated parameter for specified ground station - satellite ray path globally. As a demonstration, we validate the performance of the model based on GNSS observation data for global and US-CORS regimes under ionospheric quiet and storm conditions. The DeepONet-STEC model results show that the three-day 72 hour prediction in quiet periods could achieve high accuracy using observation data by the Precise Point Positioning (PPP) with temporal resolution 30s. Under active solar magnetic storm periods, the DeepONet-STEC also demonstrated its robustness and superiority than traditional deep learning methods. This work presents a neural operator regression architecture for predicting the 4D temporal-spatial ionospheric parameter for satellite navigation system performance, which may be further extended for various space applications and beyond. △ Less

Submitted 12 March, 2024; originally announced April 2024.

arXiv:2401.03115 [pdf, other]

Transferable Learned Image Compression-Resistant Adversarial Perturbations

Authors: Yang Sui, Zhuohang Li, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, Zhenzhong Chen

Abstract: Adversarial attacks can readily disrupt the image classification system, revealing the vulnerability of DNN-based recognition tasks. While existing adversarial perturbations are primarily applied to uncompressed images or compressed images by the traditional image compression method, i.e., JPEG, limited studies have investigated the robustness of models for image classification in the context of D… ▽ More Adversarial attacks can readily disrupt the image classification system, revealing the vulnerability of DNN-based recognition tasks. While existing adversarial perturbations are primarily applied to uncompressed images or compressed images by the traditional image compression method, i.e., JPEG, limited studies have investigated the robustness of models for image classification in the context of DNN-based image compression. With the rapid evolution of advanced image compression, DNN-based learned image compression has emerged as the promising approach for transmitting images in many security-critical applications, such as cloud-based face recognition and autonomous driving, due to its superior performance over traditional compression. Therefore, there is a pressing need to fully investigate the robustness of a classification system post-processed by learned image compression. To bridge this research gap, we explore the adversarial attack on a new pipeline that targets image classification models that utilize learned image compressors as pre-processing modules. Furthermore, to enhance the transferability of perturbations across various quality levels and architectures of learned image compression models, we introduce a saliency score-based sampling method to enable the fast generation of transferable perturbation. Extensive experiments with popular attack methods demonstrate the enhanced transferability of our proposed method when attacking images that have been post-processed with different learned image compression models. △ Less

Submitted 5 January, 2024; originally announced January 2024.

Comments: Accepted as poster at Data Compression Conference 2024 (DCC 2024)

arXiv:2312.10343 [pdf, other]

In-Sensor Radio Frequency Computing for Energy-Efficient Intelligent Radar

Authors: Yang Sui, Minning Zhu, Lingyi Huang, Chung-Tse Michael Wu, Bo Yuan

Abstract: Radio Frequency Neural Networks (RFNNs) have demonstrated advantages in realizing intelligent applications across various domains. However, as the model size of deep neural networks rapidly increases, implementing large-scale RFNN in practice requires an extensive number of RF interferometers and consumes a substantial amount of energy. To address this challenge, we propose to utilize low-rank dec… ▽ More Radio Frequency Neural Networks (RFNNs) have demonstrated advantages in realizing intelligent applications across various domains. However, as the model size of deep neural networks rapidly increases, implementing large-scale RFNN in practice requires an extensive number of RF interferometers and consumes a substantial amount of energy. To address this challenge, we propose to utilize low-rank decomposition to transform a large-scale RFNN into a compact RFNN while almost preserving its accuracy. Specifically, we develop a Tensor-Train RFNN (TT-RFNN) where each layer comprises a sequence of low-rank third-order tensors, leading to a notable reduction in parameter count, thereby optimizing RF interferometer utilization in comparison to the original large-scale RFNN. Additionally, considering the inherent physical errors when map** TT-RFNN to RF device parameters in real-world deployment, from a general perspective, we construct the Robust TT-RFNN (RTT-RFNN) by incorporating a robustness solver on TT-RFNN to enhance its robustness. To adapt the RTT-RFNN to varying requirements of resha** operations, we further provide a reconfigurable resha** solution employing RF switch matrices. Empirical evaluations conducted on MNIST and CIFAR-10 datasets show the effectiveness of our proposed method. △ Less

Submitted 16 December, 2023; originally announced December 2023.

arXiv:2311.18103 [pdf, other]

Corner-to-Center Long-range Context Model for Efficient Learned Image Compression

Authors: Yang Sui, Ding Ding, Xiang Pan, Xiaozhong Xu, Shan Liu, Bo Yuan, Zhenzhong Chen

Abstract: In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations. To reduce the decoding time resulting from the serial autoregressive context model, the parallel context model has been proposed as an alternative that necessitates only two passes during the decoding phase, thus facilitating efficient image compression… ▽ More In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations. To reduce the decoding time resulting from the serial autoregressive context model, the parallel context model has been proposed as an alternative that necessitates only two passes during the decoding phase, thus facilitating efficient image compression in real-world scenarios. However, performance degradation occurs due to its incomplete casual context. To tackle this issue, we conduct an in-depth analysis of the performance degradation observed in existing parallel context models, focusing on two aspects: the Quantity and Quality of information utilized for context prediction and decoding. Based on such analysis, we propose the \textbf{Corner-to-Center transformer-based Context Model (C$^3$M)} designed to enhance context and latent predictions and improve rate-distortion performance. Specifically, we leverage the logarithmic-based prediction order to predict more context features from corner to center progressively. In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder to capture the long-range semantic information by assigning the different window shapes in different channels. Extensive experimental evaluations show that the proposed method is effective and outperforms the state-of-the-art parallel methods. Finally, according to the subjective analysis, we suggest that improving the detailed representation in transformer-based image compression is a promising direction to be explored. △ Less

Submitted 29 November, 2023; originally announced November 2023.

arXiv:2304.14508 [pdf]

3D Brainformer: 3D Fusion Transformer for Brain Tumor Segmentation

Authors: Rui Nian, Guoyao Zhang, Yao Sui, Yuqi Qian, Qiuying Li, Mingzhang Zhao, Jianhui Li, Ali Gholipour, Simon K. Warfield

Abstract: Magnetic resonance imaging (MRI) is critically important for brain map** in both scientific research and clinical studies. Precise segmentation of brain tumors facilitates clinical diagnosis, evaluations, and surgical planning. Deep learning has recently emerged to improve brain tumor segmentation and achieved impressive results. Convolutional architectures are widely used to implement those neu… ▽ More Magnetic resonance imaging (MRI) is critically important for brain map** in both scientific research and clinical studies. Precise segmentation of brain tumors facilitates clinical diagnosis, evaluations, and surgical planning. Deep learning has recently emerged to improve brain tumor segmentation and achieved impressive results. Convolutional architectures are widely used to implement those neural networks. By the nature of limited receptive fields, however, those architectures are subject to representing long-range spatial dependencies of the voxel intensities in MRI images. Transformers have been leveraged recently to address the above limitations of convolutional networks. Unfortunately, the majority of current Transformers-based methods in segmentation are performed with 2D MRI slices, instead of 3D volumes. Moreover, it is difficult to incorporate the structures between layers because each head is calculated independently in the Multi-Head Self-Attention mechanism (MHSA). In this work, we proposed a 3D Transformer-based segmentation approach. We developed a Fusion-Head Self-Attention mechanism (FHSA) to combine each attention head through attention logic and weight map**, for the exploration of the long-range spatial dependencies in 3D MRI images. We implemented a plug-and-play self-attention module, named the Infinite Deformable Fusion Transformer Module (IDFTM), to extract features on any deformable feature maps. We applied our approach to the task of brain tumor segmentation, and assessed it on the public BRATS datasets. The experimental results demonstrated that our proposed approach achieved superior performance, in comparison to several state-of-the-art segmentation methods. △ Less

Submitted 27 April, 2023; originally announced April 2023.

Comments: 10 pages, 4 figures

MSC Class: 68T07 ACM Class: I.4.6; I.5.1

arXiv:2302.03227 [pdf, other]

Automatic Sleep Stage Classification with Cross-modal Self-supervised Features from Deep Brain Signals

Authors: Chen Gong, Yue Chen, Yanan Sui, Luming Li

Abstract: The detection of human sleep stages is widely used in the diagnosis and intervention of neurological and psychiatric diseases. Some patients with deep brain stimulator implanted could have their neural activities recorded from the deep brain. Sleep stage classification based on deep brain recording has great potential to provide more precise treatment for patients. The accuracy and generalizabilit… ▽ More The detection of human sleep stages is widely used in the diagnosis and intervention of neurological and psychiatric diseases. Some patients with deep brain stimulator implanted could have their neural activities recorded from the deep brain. Sleep stage classification based on deep brain recording has great potential to provide more precise treatment for patients. The accuracy and generalizability of existing sleep stage classifiers based on local field potentials are still limited. We proposed an applicable cross-modal transfer learning method for sleep stage classification with implanted devices. This end-to-end deep learning model contained cross-modal self-supervised feature representation, self-attention, and classification framework. We tested the model with deep brain recording data from 12 patients with Parkinson's disease. The best total accuracy reached 83.2% for sleep stage classification. Results showed speech self-supervised features catch the conversion pattern of sleep stages effectively. We provide a new method on transfer learning from acoustic signals to local field potentials. This method supports an effective solution for the insufficient scale of clinical data. This sleep stage classification model could be adapted to chronic and continuous monitor sleep for Parkinson's patients in daily life, and potentially utilized for more precise treatment in deep brain-machine interfaces, such as closed-loop deep brain stimulation. △ Less

Submitted 6 February, 2023; originally announced February 2023.

Comments: 4 pages, 5 figures, 11th International IEEE EMBS Conference on Neural Engineering (NER)

arXiv:2211.10012 [pdf, other]

doi 10.1109/MC.2022.3223646

A Tale of Two Cities: Data and Configuration Variances in Robust Deep Learning

Authors: Guanqin Zhang, Jiankun Sun, Feng Xu, H. M. N. Dilum Bandara, Shi** Chen, Yulei Sui, Tim Menzies

Abstract: Deep neural networks (DNNs), are widely used in many industries such as image recognition, supply chain, medical diagnosis, and autonomous driving. However, prior work has shown the high accuracy of a DNN model does not imply high robustness (i.e., consistent performances on new and future datasets) because the input data and external environment (e.g., software and model configurations) for a dep… ▽ More Deep neural networks (DNNs), are widely used in many industries such as image recognition, supply chain, medical diagnosis, and autonomous driving. However, prior work has shown the high accuracy of a DNN model does not imply high robustness (i.e., consistent performances on new and future datasets) because the input data and external environment (e.g., software and model configurations) for a deployed model are constantly changing. Hence, ensuring the robustness of deep learning is not an option but a priority to enhance business and consumer confidence. Previous studies mostly focus on the data aspect of model variance. In this article, we systematically summarize DNN robustness issues and formulate them in a holistic view through two important aspects, i.e., data and software configuration variances in DNNs. We also provide a predictive framework to generate representative variances (counterexamples) by considering both data and configurations for robust learning through the lens of search-based optimization. △ Less

Submitted 25 November, 2022; v1 submitted 17 November, 2022; originally announced November 2022.

arXiv:2206.10119 [pdf]

Optimization simulation of reflow welding based on prediction of regional center temperature field

Authors: Yuan Sui, Fan-yang Bu, Zi-long Shao, Wei Yan

Abstract: Before reflow soldering of integrated electronic products, the numerical simulation of temperature control curve of reflow furnace is crucial for selecting proper parameters and improving the overall efficiency of reflow soldering process and product quality. According to the heat conduction law and the specific heat capacity formula, the first-order ordinary differential equation of the central t… ▽ More Before reflow soldering of integrated electronic products, the numerical simulation of temperature control curve of reflow furnace is crucial for selecting proper parameters and improving the overall efficiency of reflow soldering process and product quality. According to the heat conduction law and the specific heat capacity formula, the first-order ordinary differential equation of the central temperature curve of the welding area with respect to the temperature distribution function in the furnace on the conveyor belt displacement is obtained. For the gap with small temperature difference, the sigmoid function is used to obtain a smooth interval temperature transition curve; For the gap with large temperature difference, the linear combination of exponential function and primary function is used to approach the actual concave function, so as to obtain the complete temperature distribution function in the furnace. The welding parameters are obtained by solving the ordinary differential equation, and a set of optimal process parameters consistent with the process boundary are obtained by calculating the mean square error between the predicted temperature field and the real temperature distribution. At the same time, a set of reflow optimization strategies are designed for speed interval prediction strategy, minimum parameter interval prediction strategy, and the most symmetrical parameter interval prediction of solder paste melting reflow area. The simulation results show that the temperature field prediction results obtained by this method are highly consistent with the actual sensor data, and have strong correlation. This method can help to select appropriate process parameters, optimize the production process, reduce equipment commissioning practice and optimize the solder joint quality of production products. △ Less

Submitted 21 June, 2022; originally announced June 2022.

Comments: in Chinese language. Journal of Computer Simulation

arXiv:2105.14704 [pdf, other]

Parkinsonian Chinese Speech Analysis towards Automatic Classification of Parkinson's Disease

Authors: Hao Fang, Chen Gong, Chen Zhang, Yanan Sui, Luming Li

Abstract: Speech disorders often occur at the early stage of Parkinson's disease (PD). The speech impairments could be indicators of the disorder for early diagnosis, while motor symptoms are not obvious. In this study, we constructed a new speech corpus of Mandarin Chinese and addressed classification of patients with PD. We implemented classical machine learning methods with ranking algorithms for feature… ▽ More Speech disorders often occur at the early stage of Parkinson's disease (PD). The speech impairments could be indicators of the disorder for early diagnosis, while motor symptoms are not obvious. In this study, we constructed a new speech corpus of Mandarin Chinese and addressed classification of patients with PD. We implemented classical machine learning methods with ranking algorithms for feature selection, convolutional and recurrent deep networks, and an end to end system. Our classification accuracy significantly surpassed state-of-the-art studies. The result suggests that free talk has stronger classification power than standard speech tasks, which could help the design of future speech tasks for efficient early diagnosis of the disease. Based on existing classification methods and our natural speech study, the automatic detection of PD from daily conversation could be accessible to the majority of the clinical population. △ Less

Submitted 31 May, 2021; originally announced May 2021.

Comments: 12 pages, 5 figures, proceedings of the Machine Learning for Health NeurIPS Workshop, PMLR 136:114-125, 2020

arXiv:2011.06341 [pdf]

Hardware Complexity Aware Design Strategy for a Fused Logarithmic and Anti-Logarithmic Converter

Authors: Botao Xiong, Yuanfeng Sui

Abstract: The logarithmic and anti-logarithmic converters are realized with the piecewise linear approximation method, which is implemented by the shift-and-add architecture. This brief utilizes the similarities of Log and Antilog functions so that the adder tree block and multiplexer block can be shared by the Log and Antilog converters. As a result, the Antilog function can be implemented by the Log conve… ▽ More The logarithmic and anti-logarithmic converters are realized with the piecewise linear approximation method, which is implemented by the shift-and-add architecture. This brief utilizes the similarities of Log and Antilog functions so that the adder tree block and multiplexer block can be shared by the Log and Antilog converters. As a result, the Antilog function can be implemented by the Log converter at the cost of additional 14% area and 6% latency. It implies the shift-and-add architecture can approximate multiple similar nonlinear functions with a slightly hardware cost. In addition, this brief proposes a set of formulas to predict the area and latency of shift-and-add architecture with different quantized coefficients that can facilitate the finding of a trade-off point in the Latency-Area-Precision space. △ Less

Submitted 12 November, 2020; originally announced November 2020.

Showing 1–11 of 11 results for author: Sui, Y