-
TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms
Authors:
Yueyuan Sui,
Minghui Zhao,
Junxi Xia,
Xiaofan Jiang,
Stephen Xia
Abstract:
We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state-of-the-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-the-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speedup of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.
Submitted 29 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Global 4D Ionospheric STEC Prediction based on DeepONet for GNSS Rays
Authors:
Dijia Cai,
Zenghui Shi,
Haiyang Fu,
Huan Liu,
Hongyi Qian,
Yun Sui,
Feng Xu,
Ya-Qiu Jin
Abstract:
The ionosphere is a vitally dynamic charged particle region in the Earth's upper atmosphere, playing a crucial role in applications such as radio communication and satellite navigation. The Slant Total Electron Content (STEC) is an important parameter for characterizing wave propagation, representing the integrated electron density along the ray of radio signals passing through the ionosphere. The accurate prediction of STEC is essential for mitigating the ionospheric impact, particularly on Global Navigation Satellite Systems (GNSS). In this work, we propose a high-precision STEC prediction model named DeepONet-STEC, which learns nonlinear operators to predict the 4D temporal-spatial integrated parameter for a specified ground station-satellite ray path globally. As a demonstration, we validate the performance of the model based on GNSS observation data for global and US-CORS regimes under ionospheric quiet and storm conditions. The DeepONet-STEC model results show that the three-day (72-hour) prediction in quiet periods could achieve high accuracy using observation data obtained by Precise Point Positioning (PPP) with a temporal resolution of 30 s. Under active solar magnetic storm periods, DeepONet-STEC also demonstrated its robustness and superiority over traditional deep learning methods. This work presents a neural operator regression architecture for predicting the 4D temporal-spatial ionospheric parameter for satellite navigation system performance, which may be further extended for various space applications and beyond.
Submitted 12 March, 2024;
originally announced April 2024.
-
Transferable Learned Image Compression-Resistant Adversarial Perturbations
Authors:
Yang Sui,
Zhuohang Li,
Ding Ding,
Xiang Pan,
Xiaozhong Xu,
Shan Liu,
Zhenzhong Chen
Abstract:
Adversarial attacks can readily disrupt the image classification system, revealing the vulnerability of DNN-based recognition tasks. While existing adversarial perturbations are primarily applied to uncompressed images or compressed images by the traditional image compression method, i.e., JPEG, limited studies have investigated the robustness of models for image classification in the context of DNN-based image compression. With the rapid evolution of advanced image compression, DNN-based learned image compression has emerged as the promising approach for transmitting images in many security-critical applications, such as cloud-based face recognition and autonomous driving, due to its superior performance over traditional compression. Therefore, there is a pressing need to fully investigate the robustness of a classification system post-processed by learned image compression. To bridge this research gap, we explore the adversarial attack on a new pipeline that targets image classification models that utilize learned image compressors as pre-processing modules. Furthermore, to enhance the transferability of perturbations across various quality levels and architectures of learned image compression models, we introduce a saliency score-based sampling method to enable the fast generation of transferable perturbation. Extensive experiments with popular attack methods demonstrate the enhanced transferability of our proposed method when attacking images that have been post-processed with different learned image compression models.
Submitted 5 January, 2024;
originally announced January 2024.
-
In-Sensor Radio Frequency Computing for Energy-Efficient Intelligent Radar
Authors:
Yang Sui,
Minning Zhu,
Lingyi Huang,
Chung-Tse Michael Wu,
Bo Yuan
Abstract:
Radio Frequency Neural Networks (RFNNs) have demonstrated advantages in realizing intelligent applications across various domains. However, as the model size of deep neural networks rapidly increases, implementing large-scale RFNN in practice requires an extensive number of RF interferometers and consumes a substantial amount of energy. To address this challenge, we propose to utilize low-rank decomposition to transform a large-scale RFNN into a compact RFNN while almost preserving its accuracy. Specifically, we develop a Tensor-Train RFNN (TT-RFNN) where each layer comprises a sequence of low-rank third-order tensors, leading to a notable reduction in parameter count, thereby optimizing RF interferometer utilization in comparison to the original large-scale RFNN. Additionally, considering the inherent physical errors when mapping TT-RFNN to RF device parameters in real-world deployment, from a general perspective, we construct the Robust TT-RFNN (RTT-RFNN) by incorporating a robustness solver on TT-RFNN to enhance its robustness. To adapt the RTT-RFNN to varying requirements of reshaping operations, we further provide a reconfigurable reshaping solution employing RF switch matrices. Empirical evaluations conducted on MNIST and CIFAR-10 datasets show the effectiveness of our proposed method.
Submitted 16 December, 2023;
originally announced December 2023.
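The parameter saving claimed for tensor-train layers in the abstract above can be illustrated with a small count. This is a minimal sketch of the generic tensor-train format, in which core k is a third-order tensor of shape (r_{k-1}, n_k, r_k); the mode sizes and ranks below are hypothetical values chosen for illustration, not figures from the paper, and the function names are made up here.

```python
from math import prod


def tt_core_shapes(dims, ranks):
    """Return the shapes (r_{k-1}, n_k, r_k) of the third-order TT cores."""
    if len(ranks) != len(dims) + 1 or ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("need len(ranks) == len(dims) + 1 and boundary ranks of 1")
    return [(ranks[k], dims[k], ranks[k + 1]) for k in range(len(dims))]


def tt_param_count(dims, ranks):
    """Total number of entries stored across all TT cores."""
    return sum(a * n * b for a, n, b in tt_core_shapes(dims, ranks))


# Hypothetical example: a 64 x 64 weight matrix viewed as an 8 x 8 x 8 x 8 tensor.
dims = [8, 8, 8, 8]
ranks = [1, 4, 4, 4, 1]  # assumed TT ranks, chosen only for illustration

dense_params = prod(dims)                    # entries in the full tensor
compact_params = tt_param_count(dims, ranks)  # entries in the TT cores
print(dense_params, compact_params)
```

With these assumed sizes the dense tensor holds 4096 entries while the cores hold 320, which is the kind of reduction a low-rank layer trades against approximation error.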
-
Corner-to-Center Long-range Context Model for Efficient Learned Image Compression
Authors:
Yang Sui,
Ding Ding,
Xiang Pan,
Xiaozhong Xu,
Shan Liu,
Bo Yuan,
Zhenzhong Chen
Abstract:
In the framework of learned image compression, the context model plays a pivotal role in capturing the dependencies among latent representations. To reduce the decoding time resulting from the serial autoregressive context model, the parallel context model has been proposed as an alternative that necessitates only two passes during the decoding phase, thus facilitating efficient image compression in real-world scenarios. However, performance degradation occurs due to its incomplete causal context. To tackle this issue, we conduct an in-depth analysis of the performance degradation observed in existing parallel context models, focusing on two aspects: the Quantity and Quality of information utilized for context prediction and decoding. Based on such analysis, we propose the Corner-to-Center transformer-based Context Model (C$^3$M) designed to enhance context and latent predictions and improve rate-distortion performance. Specifically, we leverage the logarithmic-based prediction order to predict more context features from corner to center progressively. In addition, to enlarge the receptive field in the analysis and synthesis transformation, we use the Long-range Crossing Attention Module (LCAM) in the encoder/decoder to capture the long-range semantic information by assigning different window shapes to different channels. Extensive experimental evaluations show that the proposed method is effective and outperforms the state-of-the-art parallel methods. Finally, according to the subjective analysis, we suggest that improving the detailed representation in transformer-based image compression is a promising direction to be explored.
Submitted 29 November, 2023;
originally announced November 2023.
-
3D Brainformer: 3D Fusion Transformer for Brain Tumor Segmentation
Authors:
Rui Nian,
Guoyao Zhang,
Yao Sui,
Yuqi Qian,
Qiuying Li,
Mingzhang Zhao,
Jianhui Li,
Ali Gholipour,
Simon K. Warfield
Abstract:
Magnetic resonance imaging (MRI) is critically important for brain mapping in both scientific research and clinical studies. Precise segmentation of brain tumors facilitates clinical diagnosis, evaluations, and surgical planning. Deep learning has recently emerged to improve brain tumor segmentation and achieved impressive results. Convolutional architectures are widely used to implement those neural networks. By the nature of limited receptive fields, however, those architectures are limited in representing long-range spatial dependencies of the voxel intensities in MRI images. Transformers have been leveraged recently to address the above limitations of convolutional networks. Unfortunately, the majority of current Transformer-based methods in segmentation are performed with 2D MRI slices, instead of 3D volumes. Moreover, it is difficult to incorporate the structures between layers because each head is calculated independently in the Multi-Head Self-Attention mechanism (MHSA). In this work, we proposed a 3D Transformer-based segmentation approach. We developed a Fusion-Head Self-Attention mechanism (FHSA) to combine each attention head through attention logic and weight mapping, for the exploration of the long-range spatial dependencies in 3D MRI images. We implemented a plug-and-play self-attention module, named the Infinite Deformable Fusion Transformer Module (IDFTM), to extract features on any deformable feature maps. We applied our approach to the task of brain tumor segmentation, and assessed it on the public BRATS datasets. The experimental results demonstrated that our proposed approach achieved superior performance, in comparison to several state-of-the-art segmentation methods.
Submitted 27 April, 2023;
originally announced April 2023.
-
Automatic Sleep Stage Classification with Cross-modal Self-supervised Features from Deep Brain Signals
Authors:
Chen Gong,
Yue Chen,
Yanan Sui,
Luming Li
Abstract:
The detection of human sleep stages is widely used in the diagnosis and intervention of neurological and psychiatric diseases. Some patients with implanted deep brain stimulators could have their neural activities recorded from the deep brain. Sleep stage classification based on deep brain recording has great potential to provide more precise treatment for patients. The accuracy and generalizability of existing sleep stage classifiers based on local field potentials are still limited. We proposed an applicable cross-modal transfer learning method for sleep stage classification with implanted devices. This end-to-end deep learning model contained cross-modal self-supervised feature representation, self-attention, and a classification framework. We tested the model with deep brain recording data from 12 patients with Parkinson's disease. The best total accuracy reached 83.2% for sleep stage classification. Results showed that speech self-supervised features capture the conversion pattern of sleep stages effectively. We provide a new method for transfer learning from acoustic signals to local field potentials. This method supports an effective solution for the insufficient scale of clinical data. This sleep stage classification model could be adapted for chronic, continuous sleep monitoring of Parkinson's patients in daily life, and potentially utilized for more precise treatment in deep brain-machine interfaces, such as closed-loop deep brain stimulation.
Submitted 6 February, 2023;
originally announced February 2023.
-
A Tale of Two Cities: Data and Configuration Variances in Robust Deep Learning
Authors:
Guanqin Zhang,
Jiankun Sun,
Feng Xu,
H. M. N. Dilum Bandara,
Shiping Chen,
Yulei Sui,
Tim Menzies
Abstract:
Deep neural networks (DNNs) are widely used in many industries such as image recognition, supply chain, medical diagnosis, and autonomous driving. However, prior work has shown that the high accuracy of a DNN model does not imply high robustness (i.e., consistent performance on new and future datasets) because the input data and external environment (e.g., software and model configurations) for a deployed model are constantly changing. Hence, ensuring the robustness of deep learning is not an option but a priority to enhance business and consumer confidence. Previous studies mostly focus on the data aspect of model variance. In this article, we systematically summarize DNN robustness issues and formulate them in a holistic view through two important aspects, i.e., data and software configuration variances in DNNs. We also provide a predictive framework to generate representative variances (counterexamples) by considering both data and configurations for robust learning through the lens of search-based optimization.
Submitted 25 November, 2022; v1 submitted 17 November, 2022;
originally announced November 2022.
-
Optimization simulation of reflow welding based on prediction of regional center temperature field
Authors:
Yuan Sui,
Fan-yang Bu,
Zi-long Shao,
Wei Yan
Abstract:
Before reflow soldering of integrated electronic products, the numerical simulation of the temperature control curve of the reflow furnace is crucial for selecting proper parameters and improving the overall efficiency of the reflow soldering process and product quality. According to the heat conduction law and the specific heat capacity formula, the first-order ordinary differential equation of the central temperature curve of the welding area with respect to the temperature distribution function in the furnace on the conveyor belt displacement is obtained. For the gap with small temperature difference, the sigmoid function is used to obtain a smooth interval temperature transition curve; for the gap with large temperature difference, the linear combination of exponential function and primary function is used to approach the actual concave function, so as to obtain the complete temperature distribution function in the furnace. The welding parameters are obtained by solving the ordinary differential equation, and a set of optimal process parameters consistent with the process boundary is obtained by calculating the mean square error between the predicted temperature field and the real temperature distribution. At the same time, a set of reflow optimization strategies is designed: a speed interval prediction strategy, a minimum parameter interval prediction strategy, and a most symmetrical parameter interval prediction strategy for the solder paste melting reflow area. The simulation results show that the temperature field prediction results obtained by this method are highly consistent with the actual sensor data, and have strong correlation. This method can help to select appropriate process parameters, optimize the production process, reduce equipment commissioning practice and optimize the solder joint quality of production products.
Submitted 21 June, 2022;
originally announced June 2022.
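The modelling steps named in the reflow abstract above (a sigmoid transition across a gap between zones, and a first-order ODE for the center temperature) can be sketched as follows. This is only an illustration under stated assumptions: the ODE is taken to be a Newton-type law dT/dt = k (T_furnace(x) - T) with x = speed * t, integrated by explicit Euler, and every numeric value (zone temperatures, gap position, steepness, k, belt speed) is hypothetical rather than taken from the paper.

```python
import math


def gap_temperature(x, x_start, x_end, t_left, t_right, steepness=10.0):
    """Smooth sigmoid transition between two zone temperatures across a gap."""
    mid = 0.5 * (x_start + x_end)
    width = x_end - x_start
    s = 1.0 / (1.0 + math.exp(-steepness * (x - mid) / width))
    return t_left + (t_right - t_left) * s


def simulate_center_temperature(furnace_temp, length, speed, k, t0, dt=0.01):
    """Explicit Euler for dT/dt = k * (furnace_temp(x) - T), with x = speed * t.

    Returns a list of (position, temperature) samples along the belt.
    """
    t, temp, history = 0.0, t0, []
    while speed * t <= length:
        x = speed * t
        history.append((x, temp))
        temp += dt * k * (furnace_temp(x) - temp)
        t += dt
    return history


def two_zone_profile(x):
    """Hypothetical furnace profile: 175 degrees, a gap from 10 to 20, then 195."""
    if x < 10.0:
        return 175.0
    if x > 20.0:
        return 195.0
    return gap_temperature(x, 10.0, 20.0, 175.0, 195.0)


# Assumed belt length, speed, heat-transfer constant and initial temperature.
profile = simulate_center_temperature(two_zone_profile, 30.0, 1.0, 0.05, 25.0)
print(len(profile), round(profile[-1][1], 1))
```

Fitting k and the zone parameters against measured sensor data, for example by minimizing the mean square error mentioned in the abstract, would sit on top of a simulator of this shape.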
-
Parkinsonian Chinese Speech Analysis towards Automatic Classification of Parkinson's Disease
Authors:
Hao Fang,
Chen Gong,
Chen Zhang,
Yanan Sui,
Luming Li
Abstract:
Speech disorders often occur at the early stage of Parkinson's disease (PD). The speech impairments could be indicators of the disorder for early diagnosis, while motor symptoms are not obvious. In this study, we constructed a new speech corpus of Mandarin Chinese and addressed classification of patients with PD. We implemented classical machine learning methods with ranking algorithms for feature selection, convolutional and recurrent deep networks, and an end-to-end system. Our classification accuracy significantly surpassed state-of-the-art studies. The result suggests that free talk has stronger classification power than standard speech tasks, which could help the design of future speech tasks for efficient early diagnosis of the disease. Based on existing classification methods and our natural speech study, the automatic detection of PD from daily conversation could be accessible to the majority of the clinical population.
Submitted 31 May, 2021;
originally announced May 2021.
-
Hardware Complexity Aware Design Strategy for a Fused Logarithmic and Anti-Logarithmic Converter
Authors:
Botao Xiong,
Yuanfeng Sui
Abstract:
The logarithmic and anti-logarithmic converters are realized with the piecewise linear approximation method, which is implemented by the shift-and-add architecture. This brief utilizes the similarities of Log and Antilog functions so that the adder tree block and multiplexer block can be shared by the Log and Antilog converters. As a result, the Antilog function can be implemented by the Log converter at the cost of an additional 14% area and 6% latency. This implies that the shift-and-add architecture can approximate multiple similar nonlinear functions with only a slight hardware cost. In addition, this brief proposes a set of formulas to predict the area and latency of the shift-and-add architecture with different quantized coefficients, which can facilitate the finding of a trade-off point in the Latency-Area-Precision space.
Submitted 12 November, 2020;
originally announced November 2020.
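To make the idea of shift-and-add logarithm conversion in the abstract above concrete, here is a sketch of the classic Mitchell approximation, which is the simplest (single-segment) piecewise linear scheme: for x = 2^k (1 + f), log2(x) is approximated by k + f, and the antilog inverts that mapping. This is not the converter proposed in the brief; the function names and the 16-bit fractional format are assumptions made for illustration.

```python
import math


def mitchell_log2(x: int, frac_bits: int = 16) -> int:
    """Fixed-point log2(x) ~ k + f, where x = 2**k * (1 + f); uses only shifts and adds."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    k = x.bit_length() - 1      # position of the leading one
    mantissa = x - (1 << k)     # equals f * 2**k
    # Align the mantissa to frac_bits fractional bits.
    if k >= frac_bits:
        frac = mantissa >> (k - frac_bits)
    else:
        frac = mantissa << (frac_bits - k)
    return (k << frac_bits) + frac


def mitchell_antilog2(y: int, frac_bits: int = 16) -> int:
    """Inverse mapping: 2**(k + f) ~ 2**k * (1 + f) for fixed-point y = k + f."""
    k = y >> frac_bits
    frac = y & ((1 << frac_bits) - 1)
    one_plus_f = (1 << frac_bits) + frac
    if k >= frac_bits:
        return one_plus_f << (k - frac_bits)
    return one_plus_f >> (frac_bits - k)


# Compare against the exact logarithm for one input.
x = 1000
approx = mitchell_log2(x) / (1 << 16)
print(approx, math.log2(x))
```

The two functions share the same structure (find or apply the integer part, then shift the fractional part), which mirrors the kind of similarity between Log and Antilog that allows hardware blocks to be shared. A multi-segment design would add per-segment correction terms to reduce the error of this single-segment version.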