Search | arXiv e-print repository

Learning System Dynamics without Forgetting

Authors: Xikun Zhang, Dong** Song, Yushan Jiang, Yixin Chen, Dacheng Tao

Abstract: Predicting the trajectories of systems with unknown dynamics (\textit{i.e.} the governing rules) is crucial in various research fields, including physics and biology. This challenge has gathered significant attention from diverse communities. Most existing works focus on learning fixed system dynamics within one single system. However, real-world applications often involve multiple systems with di… ▽ More Predicting the trajectories of systems with unknown dynamics (\textit{i.e.} the governing rules) is crucial in various research fields, including physics and biology. This challenge has gathered significant attention from diverse communities. Most existing works focus on learning fixed system dynamics within one single system. However, real-world applications often involve multiple systems with different types of dynamics or evolving systems with non-stationary dynamics (dynamics shifts). When data from those systems are continuously collected and sequentially fed to machine learning models for training, these models tend to be biased toward the most recently learned dynamics, leading to catastrophic forgetting of previously observed/learned system dynamics. To this end, we aim to learn system dynamics via continual learning. Specifically, we present a novel framework of Mode-switching Graph ODE (MS-GODE), which can continually learn varying dynamics and encode the system-specific dynamics into binary masks over the model parameters. During the inference stage, the model can select the most confident mask based on the observational data to identify the system and predict future trajectories accordingly. Empirically, we systematically investigate the task configurations and compare the proposed MS-GODE with state-of-the-art techniques. More importantly, we construct a novel benchmark of biological dynamic systems, featuring diverse systems with disparate dynamics and significantly enriching the research field of machine learning for dynamic systems. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2405.07260 [pdf]

A Supervised Information Enhanced Multi-Granularity Contrastive Learning Framework for EEG Based Emotion Recognition

Authors: Xiang Li, Jian Song, Zhigang Zhao, Chunxiao Wang, Dawei Song, Bin Hu

Abstract: This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG based Emotion Recognition (SICLEER). SI-CLEER employs multi-granularity contrastive learning to create robust EEG contextual representations, potentiallyn improving emotion recognition effectiveness. Unlike existing methods solely guided by classification loss, we propose a joint learning model combining… ▽ More This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG based Emotion Recognition (SICLEER). SI-CLEER employs multi-granularity contrastive learning to create robust EEG contextual representations, potentiallyn improving emotion recognition effectiveness. Unlike existing methods solely guided by classification loss, we propose a joint learning model combining self-supervised contrastive learning loss and supervised classification loss. This model optimizes both loss functions, capturing subtle EEG signal differences specific to emotion detection. Extensive experiments demonstrate SI-CLEER's robustness and superior accuracy on the SEED dataset compared to state-of-the-art methods. Furthermore, we analyze electrode performance, highlighting the significance of central frontal and temporal brain region EEGs in emotion detection. This study offers an universally applicable approach with potential benefits for diverse EEG classification tasks. △ Less

Submitted 12 May, 2024; originally announced May 2024.

Comments: 5 pages, 3 figures, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2312.03900 [pdf, other]

Community Detection in High-Dimensional Graph Ensembles

Authors: Robert Malinas, Dogyoon Song, Alfred O. Hero III

Abstract: Detecting communities in high-dimensional graphs can be achieved by applying random matrix theory where the adjacency matrix of the graph is modeled by a Stochastic Block Model (SBM). However, the SBM makes an unrealistic assumption that the edge probabilities are homogeneous within communities, i.e., the edges occur with the same probabilities. The Degree-Corrected SBM is a generalization of the… ▽ More Detecting communities in high-dimensional graphs can be achieved by applying random matrix theory where the adjacency matrix of the graph is modeled by a Stochastic Block Model (SBM). However, the SBM makes an unrealistic assumption that the edge probabilities are homogeneous within communities, i.e., the edges occur with the same probabilities. The Degree-Corrected SBM is a generalization of the SBM that allows these edge probabilities to be different, but existing results from random matrix theory are not directly applicable to this heterogeneous model. In this paper, we derive a transformation of the adjacency matrix that eliminates this heterogeneity and preserves the relevant eigenstructure for community detection. We propose a test based on the extreme eigenvalues of this transformed matrix and (1) provide a method for controlling the significance level, (2) formulate a conjecture that the test achieves power one for all positive significance levels in the limit as the number of nodes approaches infinity, and (3) provide empirical evidence and theory supporting these claims. △ Less

Submitted 6 December, 2023; originally announced December 2023.

Comments: 8 pages, 3 figures

arXiv:2307.16109 [pdf, other]

doi 10.1016/j.dsp.2024.104633

A Message Passing Detection based Affine Frequency Division Multiplexing Communication System

Authors: Lifan Wu, Shan Luo, Dongxiao Song, Fan Yang, Rong** Lin

Abstract: The next generation of wireless communication technology is anticipated to address the communication reliability challenges encountered in high-speed mobile communication scenarios. An Orthogonal Time Frequency Space (OTFS) system has been introduced as a solution that effectively mitigates these issues. However, OTFS is associated with relatively high pilot overhead and multiuser multiplexing ove… ▽ More The next generation of wireless communication technology is anticipated to address the communication reliability challenges encountered in high-speed mobile communication scenarios. An Orthogonal Time Frequency Space (OTFS) system has been introduced as a solution that effectively mitigates these issues. However, OTFS is associated with relatively high pilot overhead and multiuser multiplexing overhead. In response to these concerns within the OTFS framework, a novel modulation technology known as Affine Frequency Division Multiplexing (AFDM) which is based on the discrete affine Fourier transform has emerged. AFDM effectively resolves the challenges by achieving full diversity through parameter adjustments aligned with the channel's delay-Doppler profile. Consequently, AFDM is capable of achieving performance levels comparable to OTFS. As the research on AFDM detection is currently limited, we present a low-complexity yet efficient message passing (MP) algorithm. This algorithm handles joint interference cancellation and detection while capitalizing on the inherent sparsity of the channel. Based on simulation results, the MP detection algorithm outperforms Minimum Mean Square Error (MMSE) and Maximal Ratio Combining (MRC) detection techniques. △ Less

Submitted 30 August, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

Comments: 8 pages, 7 figures

arXiv:2306.10125 [pdf, other]

Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects

Authors: Kexin Zhang, Qingsong Wen, Chaoli Zhang, Rongyao Cai, Ming **, Yong Liu, James Zhang, Yuxuan Liang, Guansong Pang, Dong** Song, Shirui Pan

Abstract: Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. The most prominent advantage of SSL is that it reduces the dependence on labeled data. Based on the pre-training and fine-tuning strategy, even a small amount of labeled data can achieve high performance. Compared with many published self-supervised surveys on computer vision and natural langu… ▽ More Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. The most prominent advantage of SSL is that it reduces the dependence on labeled data. Based on the pre-training and fine-tuning strategy, even a small amount of labeled data can achieve high performance. Compared with many published self-supervised surveys on computer vision and natural language processing, a comprehensive survey for time series SSL is still missing. To fill this gap, we review current state-of-the-art SSL methods for time series data in this article. To this end, we first comprehensively review existing surveys related to SSL and time series, and then provide a new taxonomy of existing time series SSL methods by summarizing them from three perspectives: generative-based, contrastive-based, and adversarial-based. These methods are further divided into ten subcategories with detailed reviews and discussions about their key intuitions, main frameworks, advantages and disadvantages. To facilitate the experiments and validation of time series SSL methods, we also summarize datasets commonly used in time series forecasting, classification, anomaly detection, and clustering tasks. Finally, we present the future directions of SSL for time series analysis. △ Less

Submitted 8 April, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI); 26 pages, 200+ references; the first work to comprehensively and systematically summarize self-supervised learning for time series analysis (SSL4TS). The GitHub repository is https://github.com/qingsongedu/Awesome-SSL4TS

arXiv:2306.08998 [pdf, other]

Team AcieLee: Technical Report for EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023

Authors: Yuqi Li, Yizhi Luo, Xiaoshuai Hao, Chuanguang Yang, Zhulin An, Dantong Song, Wei Yi

Abstract: In this report, we describe the technical details of our submission to the EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023, by Team "AcieLee" (username: Yuqi\_Li). The task is to classify the audio caused by interactions between objects, or from events of the camera wearer. We conducted exhaustive experiments and found learning rate step decay, backbone frozen, label smoothing and f… ▽ More In this report, we describe the technical details of our submission to the EPIC-SOUNDS Audio-Based Interaction Recognition Challenge 2023, by Team "AcieLee" (username: Yuqi\_Li). The task is to classify the audio caused by interactions between objects, or from events of the camera wearer. We conducted exhaustive experiments and found learning rate step decay, backbone frozen, label smoothing and focal loss contribute most to the performance improvement. After training, we combined multiple models from different stages and integrated them into a single model by assigning fusion weights. This proposed method allowed us to achieve 3rd place in the CVPR 2023 workshop of EPIC-SOUNDS Audio-Based Interaction Recognition Challenge. △ Less

Submitted 15 June, 2023; originally announced June 2023.

arXiv:2306.01232 [pdf, other]

Deep Reinforcement Learning Framework for Thoracic Diseases Classification via Prior Knowledge Guidance

Authors: Weizhi Nie, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai, Keliang Xie, Anan Liu

Abstract: The chest X-ray is often utilized for diagnosing common thoracic diseases. In recent years, many approaches have been proposed to handle the problem of automatic diagnosis based on chest X-rays. However, the scarcity of labeled data for related diseases still poses a huge challenge to an accurate diagnosis. In this paper, we focus on the thorax disease diagnostic problem and propose a novel deep r… ▽ More The chest X-ray is often utilized for diagnosing common thoracic diseases. In recent years, many approaches have been proposed to handle the problem of automatic diagnosis based on chest X-rays. However, the scarcity of labeled data for related diseases still poses a huge challenge to an accurate diagnosis. In this paper, we focus on the thorax disease diagnostic problem and propose a novel deep reinforcement learning framework, which introduces prior knowledge to direct the learning of diagnostic agents and the model parameters can also be continuously updated as the data increases, like a person's learning process. Especially, 1) prior knowledge can be learned from the pre-trained model based on old data or other domains' similar data, which can effectively reduce the dependence on target domain data, and 2) the framework of reinforcement learning can make the diagnostic agent as exploratory as a human being and improve the accuracy of diagnosis through continuous exploration. The method can also effectively solve the model learning problem in the case of few-shot data and improve the generalization ability of the model. Finally, our approach's performance was demonstrated using the well-known NIH ChestX-ray 14 and CheXpert datasets, and we achieved competitive results. The source code can be found here: \url{https://github.com/NeaseZ/MARL}. △ Less

Submitted 1 June, 2023; originally announced June 2023.

arXiv:2305.12072 [pdf, other]

Chest X-ray Image Classification: A Causal Perspective

Authors: Weizhi Nie, Chen Zhang, Dan Song, Lina Zhao, Yunpeng Bai, Keliang Xie, Anan Liu

Abstract: The chest X-ray (CXR) is one of the most common and easy-to-get medical tests used to diagnose common diseases of the chest. Recently, many deep learning-based methods have been proposed that are capable of effectively classifying CXRs. Even though these techniques have worked quite well, it is difficult to establish whether what these algorithms actually learn is the cause-and-effect link between… ▽ More The chest X-ray (CXR) is one of the most common and easy-to-get medical tests used to diagnose common diseases of the chest. Recently, many deep learning-based methods have been proposed that are capable of effectively classifying CXRs. Even though these techniques have worked quite well, it is difficult to establish whether what these algorithms actually learn is the cause-and-effect link between diseases and their causes or just how to map labels to photos.In this paper, we propose a causal approach to address the CXR classification problem, which constructs a structural causal model (SCM) and uses the backdoor adjustment to select effective visual information for CXR classification. Specially, we design different probability optimization functions to eliminate the influence of confounders on the learning of real causality. Experimental results demonstrate that our proposed method outperforms the open-source NIH ChestX-ray14 in terms of classification performance. △ Less

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.12070 [pdf, other]

Instrumental Variable Learning for Chest X-ray Classification

Authors: Weizhi Nie, Chen Zhang, Dan song, Yunpeng Bai, Keliang Xie, Anan Liu

Abstract: The chest X-ray (CXR) is commonly employed to diagnose thoracic illnesses, but the challenge of achieving accurate automatic diagnosis through this method persists due to the complex relationship between pathology. In recent years, various deep learning-based approaches have been suggested to tackle this problem but confounding factors such as image resolution or noise problems often damage model… ▽ More The chest X-ray (CXR) is commonly employed to diagnose thoracic illnesses, but the challenge of achieving accurate automatic diagnosis through this method persists due to the complex relationship between pathology. In recent years, various deep learning-based approaches have been suggested to tackle this problem but confounding factors such as image resolution or noise problems often damage model performance. In this paper, we focus on the chest X-ray classification task and proposed an interpretable instrumental variable (IV) learning framework, to eliminate the spurious association and obtain accurate causal representation. Specifically, we first construct a structural causal model (SCM) for our task and learn the confounders and the preliminary representations of IV, we then leverage electronic health record (EHR) as auxiliary information and we fuse the above feature with our transformer-based semantic fusion module, so the IV has the medical semantic. Meanwhile, the reliability of IV is further guaranteed via the constraints of mutual information between related causal variables. Finally, our approach's performance is demonstrated using the MIMIC-CXR, NIH ChestX-ray 14, and CheXpert datasets, and we achieve competitive results. △ Less

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2304.05340 [pdf, other]

Unified Multi-Modal Image Synthesis for Missing Modality Imputation

Authors: Yue Zhang, Chengtao Peng, Qiuli Wang, Dan Song, Kaiyan Li, S. Kevin Zhou

Abstract: Multi-modal medical images provide complementary soft-tissue characteristics that aid in the screening and diagnosis of diseases. However, limited scanning time, image corruption and various imaging protocols often result in incomplete multi-modal images, thus limiting the usage of multi-modal data for clinical purposes. To address this issue, in this paper, we propose a novel unified multi-modal… ▽ More Multi-modal medical images provide complementary soft-tissue characteristics that aid in the screening and diagnosis of diseases. However, limited scanning time, image corruption and various imaging protocols often result in incomplete multi-modal images, thus limiting the usage of multi-modal data for clinical purposes. To address this issue, in this paper, we propose a novel unified multi-modal image synthesis method for missing modality imputation. Our method overall takes a generative adversarial architecture, which aims to synthesize missing modalities from any combination of available ones with a single model. To this end, we specifically design a Commonality- and Discrepancy-Sensitive Encoder for the generator to exploit both modality-invariant and specific information contained in input modalities. The incorporation of both types of information facilitates the generation of images with consistent anatomy and realistic details of the desired distribution. Besides, we propose a Dynamic Feature Unification Module to integrate information from a varying number of available modalities, which enables the network to be robust to random missing modalities. The module performs both hard integration and soft integration, ensuring the effectiveness of feature combination while avoiding information loss. Verified on two public multi-modal magnetic resonance datasets, the proposed method is effective in handling various synthesis tasks and shows superior performance compared to previous methods. △ Less

Submitted 11 April, 2023; originally announced April 2023.

Comments: 10 pages, 9 figures

arXiv:2208.01643 [pdf, other]

CTooth+: A Large-scale Dental Cone Beam Computed Tomography Dataset and Benchmark for Tooth Volume Segmentation

Authors: Weiwei Cui, Yaqi Wang, Yilong Li, Dan Song, Xingyong Zuo, Jiaojiao Wang, Yifan Zhang, Huiyu Zhou, Bung san Chong, Liaoyuan Zeng, Qianni Zhang

Abstract: Accurate tooth volume segmentation is a prerequisite for computer-aided dental analysis. Deep learning-based tooth segmentation methods have achieved satisfying performances but require a large quantity of tooth data with ground truth. The dental data publicly available is limited meaning the existing methods can not be reproduced, evaluated and applied in clinical practice. In this paper, we esta… ▽ More Accurate tooth volume segmentation is a prerequisite for computer-aided dental analysis. Deep learning-based tooth segmentation methods have achieved satisfying performances but require a large quantity of tooth data with ground truth. The dental data publicly available is limited meaning the existing methods can not be reproduced, evaluated and applied in clinical practice. In this paper, we establish a 3D dental CBCT dataset CTooth+, with 22 fully annotated volumes and 146 unlabeled volumes. We further evaluate several state-of-the-art tooth volume segmentation strategies based on fully-supervised learning, semi-supervised learning and active learning, and define the performance principles. This work provides a new benchmark for the tooth volume segmentation task, and the experiment can serve as the baseline for future AI-based dental imaging research and clinical application development. △ Less

Submitted 2 August, 2022; originally announced August 2022.

arXiv:2203.11279 [pdf, ps, other]

doi 10.1145/3524499

EEG based Emotion Recognition: A Tutorial and Review

Authors: Xiang Li, Yazhou Zhang, Prayag Tiwari, Dawei Song, Bin Hu, Meihong Yang, Zhigang Zhao, Neeraj Kumar, Pekka Marttinen

Abstract: Emotion recognition technology through analyzing the EEG signal is currently an essential concept in Artificial Intelligence and holds great potential in emotional health care, human-computer interaction, multimedia content recommendation, etc. Though there have been several works devoted to reviewing EEG-based emotion recognition, the content of these reviews needs to be updated. In addition, tho… ▽ More Emotion recognition technology through analyzing the EEG signal is currently an essential concept in Artificial Intelligence and holds great potential in emotional health care, human-computer interaction, multimedia content recommendation, etc. Though there have been several works devoted to reviewing EEG-based emotion recognition, the content of these reviews needs to be updated. In addition, those works are either fragmented in content or only focus on specific techniques adopted in this area but neglect the holistic perspective of the entire technical routes. Hence, in this paper, we review from the perspective of researchers who try to take the first step on this topic. We review the recent representative works in the EEG-based emotion recognition research and provide a tutorial to guide the researchers to start from the beginning. The scientific basis of EEG-based emotion recognition in the psychological and physiological levels is introduced. Further, we categorize these reviewed works into different technical routes and illustrate the theoretical basis and the research motivation, which will help the readers better understand why those techniques are studied and employed. At last, existing challenges and future investigations are also discussed in this paper, which guides the researchers to decide potential future research directions. △ Less

Submitted 16 March, 2022; originally announced March 2022.

arXiv:2202.04878 [pdf, ps, other]

Space-Time Adaptive Processing Using Random Matrix Theory Under Limited Training Samples

Authors: Di Song, Shengyao Chen, Feng Xi, Zhong Liu

Abstract: Space-time adaptive processing (STAP) is one of the most effective approaches to suppressing ground clutters in airborne radar systems. It basically takes two forms, i.e., full-dimension STAP (FD-STAP) and reduced-dimension STAP (RD-STAP). When the numbers of clutter training samples are less than two times their respective system degrees-of-freedom (DOF), the performances of both FD-STAP and RD-S… ▽ More Space-time adaptive processing (STAP) is one of the most effective approaches to suppressing ground clutters in airborne radar systems. It basically takes two forms, i.e., full-dimension STAP (FD-STAP) and reduced-dimension STAP (RD-STAP). When the numbers of clutter training samples are less than two times their respective system degrees-of-freedom (DOF), the performances of both FD-STAP and RD-STAP degrade severely due to inaccurate clutter estimation. To enhance STAP performance under the limited training samples, this paper develops a STAP theory with random matrix theory (RMT). By minimizing the output clutter-plus-noise power, the estimate of the inversion of clutter plus noise covariance matrix (CNCM) can be obtained through optimally manipulating its eigenvalues, and thus producing the optimal STAP weight vector. Two STAP algorithms, FD-STAP using RMT (RMT-FD-STAP) and RD-STAP using RMT (RMT-RD-STAP), are proposed. It is found that both RMT-FD-STAP and RMT-RD-STAP greatly outperform other-related STAP algorithms when the numbers of training samples are larger than their respective clutter DOFs, which are much less than the corresponding system DOFs. Theoretical analyses and simulation demonstrate the effectiveness and the performance advantages of the proposed STAP algorithms. △ Less

Submitted 10 February, 2022; originally announced February 2022.

Comments: 24 pages, 5 figures

arXiv:2111.08756 [pdf, other]

Achieving Short-Blocklength RCU bound via CRC List Decoding of TCM with Probabilistic Sha**

Authors: Linfang Wang, Dan Song, Felipe Areces, Richard D. Wesel

Abstract: This paper applies probabilistic amplitude sha** (PAS) to a cyclic redundancy check (CRC) aided trellis coded modulation (TCM) to achieve the short-blocklength random coding union (RCU) bound. In the transmitter, the equally likely message bits are first encoded by distribution matcher to generate amplitude symbols with the desired distribution. The binary representations of the distribution mat… ▽ More This paper applies probabilistic amplitude sha** (PAS) to a cyclic redundancy check (CRC) aided trellis coded modulation (TCM) to achieve the short-blocklength random coding union (RCU) bound. In the transmitter, the equally likely message bits are first encoded by distribution matcher to generate amplitude symbols with the desired distribution. The binary representations of the distribution matcher outputs are then encoded by a CRC. Finally, the CRC-encoded bits are encoded and modulated by Ungerboeck's TCM scheme, which consists of a $\frac{k_0}{k_0+1}$ systematic tail-biting convolutional code and a map** function that maps coded bits to channel signals with capacity-achieving distribution. This paper proves that, for the proposed transmitter, the CRC bits have uniform distribution and that the channel signals have symmetric distribution. In the receiver, the serial list Viterbi decoding (S-LVD) is used to estimate the information bits. Simulation results show that, for the proposed CRC-TCM-PAS system with 87 input bits and 65-67 8-AM coded output symbols, the decoding performance under additive white Gaussian noise channel achieves the RCU bound with properly designed CRC and convolutional codes. △ Less

Submitted 16 November, 2021; originally announced November 2021.

arXiv:2109.12634 [pdf, other]

A Novel Hybrid Convolutional Neural Network for Accurate Organ Segmentation in 3D Head and Neck CT Images

Authors: Zijie Chen, Cheng Li, Junjun He, ** Ye, Di** Song, Shanshan Wang, Lixu Gu, Yu Qiao

Abstract: Radiation therapy (RT) is widely employed in the clinic for the treatment of head and neck (HaN) cancers. An essential step of RT planning is the accurate segmentation of various organs-at-risks (OARs) in HaN CT images. Nevertheless, segmenting OARs manually is time-consuming, tedious, and error-prone considering that typical HaN CT images contain tens to hundreds of slices. Automated segmentation… ▽ More Radiation therapy (RT) is widely employed in the clinic for the treatment of head and neck (HaN) cancers. An essential step of RT planning is the accurate segmentation of various organs-at-risks (OARs) in HaN CT images. Nevertheless, segmenting OARs manually is time-consuming, tedious, and error-prone considering that typical HaN CT images contain tens to hundreds of slices. Automated segmentation algorithms are urgently required. Recently, convolutional neural networks (CNNs) have been extensively investigated on this task. Particularly, 3D CNNs are frequently adopted to process 3D HaN CT images. There are two issues with naïve 3D CNNs. First, the depth resolution of 3D CT images is usually several times lower than the in-plane resolution. Direct employment of 3D CNNs without distinguishing this difference can lead to the extraction of distorted image features and influence the final segmentation performance. Second, a severe class imbalance problem exists, and large organs can be orders of times larger than small organs. It is difficult to simultaneously achieve accurate segmentation for all the organs. To address these issues, we propose a novel hybrid CNN that fuses 2D and 3D convolutions to combat the different spatial resolutions and extract effective edge and semantic features from 3D HaN CT images. To accommodate large and small organs, our final model, named OrganNet2.5D, consists of only two instead of the classic four downsampling operations, and hybrid dilated convolutions are introduced to maintain the respective field. Experiments on the MICCAI 2015 challenge dataset demonstrate that OrganNet2.5D achieves promising performance compared to state-of-the-art methods. △ Less

Submitted 26 September, 2021; originally announced September 2021.

Comments: 10 pages, 2 figures

arXiv:2109.12629 [pdf, ps, other]

Group Shift Pointwise Convolution for Volumetric Medical Image Segmentation

Authors: Junjun He, ** Ye, Cheng Li, Di** Song, Wanli Chen, Shanshan Wang, Lixu Gu, Yu Qiao

Abstract: Recent studies have witnessed the effectiveness of 3D convolutions on segmenting volumetric medical images. Compared with the 2D counterparts, 3D convolutions can capture the spatial context in three dimensions. Nevertheless, models employing 3D convolutions introduce more trainable parameters and are more computationally complex, which may lead easily to model overfitting especially for medical a… ▽ More Recent studies have witnessed the effectiveness of 3D convolutions on segmenting volumetric medical images. Compared with the 2D counterparts, 3D convolutions can capture the spatial context in three dimensions. Nevertheless, models employing 3D convolutions introduce more trainable parameters and are more computationally complex, which may lead easily to model overfitting especially for medical applications with limited available training data. This paper aims to improve the effectiveness and efficiency of 3D convolutions by introducing a novel Group Shift Pointwise Convolution (GSP-Conv). GSP-Conv simplifies 3D convolutions into pointwise ones with 1x1x1 kernels, which dramatically reduces the number of model parameters and FLOPs (e.g. 27x fewer than 3D convolutions with 3x3x3 kernels). Naïve pointwise convolutions with limited receptive fields cannot make full use of the spatial image context. To address this problem, we propose a parameter-free operation, Group Shift (GS), which shifts the feature maps along with different spatial directions in an elegant way. With GS, pointwise convolutions can access features from different spatial locations, and the limited receptive fields of pointwise convolutions can be compensated. We evaluate the proposed methods on two datasets, PROMISE12 and BraTS18. Results show that our method, with substantially decreased model complexity, achieves comparable or even better performance than models employing 3D convolutions. △ Less

Submitted 26 September, 2021; originally announced September 2021.

Comments: 10 pages, 2 figures

arXiv:2103.08357 [pdf, other]

Learning Frequency-aware Dynamic Network for Efficient Super-Resolution

Authors: Wenbin Xie, Dehua Song, Chang Xu, Chun**g Xu, Hui Zhang, Yunhe Wang

Abstract: Deep learning based methods, especially convolutional neural networks (CNNs) have been successfully applied in the field of single image super-resolution (SISR). To obtain better fidelity and visual quality, most of existing networks are of heavy design with massive computation. However, the computation resources of modern mobile devices are limited, which cannot easily support the expensive cost.… ▽ More Deep learning based methods, especially convolutional neural networks (CNNs) have been successfully applied in the field of single image super-resolution (SISR). To obtain better fidelity and visual quality, most of existing networks are of heavy design with massive computation. However, the computation resources of modern mobile devices are limited, which cannot easily support the expensive cost. To this end, this paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain. In practice, the high-frequency part will be processed using expensive operations and the lower-frequency part is assigned with cheap operations to relieve the computation burden. Since pixels or image patches belong to low-frequency areas contain relatively few textural details, this dynamic network will not affect the quality of resulting super-resolution images. In addition, we embed predictors into the proposed dynamic network to end-to-end fine-tune the handcrafted frequency-aware masks. Extensive experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures to obtain the better tradeoff between visual quality and computational complexity. For instance, we can reduce the FLOPs of SR models by approximate 50% while preserving state-of-the-art SISR performance. △ Less

Submitted 16 August, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

arXiv:2009.08891 [pdf, other]

AdderSR: Towards Energy Efficient Image Super-Resolution

Authors: Dehua Song, Yunhe Wang, Hanting Chen, Chang Xu, Chun**g Xu, Dacheng Tao

Abstract: This paper studies the single image super-resolution problem using adder neural networks (AdderNet). Compared with convolutional neural networks, AdderNet utilizing additions to calculate the output features thus avoid massive energy consumptions of conventional multiplications. However, it is very hard to directly inherit the existing success of AdderNet on large-scale image classification to the… ▽ More This paper studies the single image super-resolution problem using adder neural networks (AdderNet). Compared with convolutional neural networks, AdderNet utilizing additions to calculate the output features thus avoid massive energy consumptions of conventional multiplications. However, it is very hard to directly inherit the existing success of AdderNet on large-scale image classification to the image super-resolution task due to the different calculation paradigm. Specifically, the adder operation cannot easily learn the identity map**, which is essential for image processing tasks. In addition, the functionality of high-pass filters cannot be ensured by AdderNet. To this end, we thoroughly analyze the relationship between an adder operation and the identity map** and insert shortcuts to enhance the performance of SR models using adder networks. Then, we develop a learnable power activation for adjusting the feature distribution and refining details. Experiments conducted on several benchmark models and datasets demonstrate that, our image super-resolution models using AdderNet can achieve comparable performance and visual quality to that of their CNN baselines with an about 2$\times$ reduction on the energy consumption. △ Less

Submitted 4 May, 2021; v1 submitted 18 September, 2020; originally announced September 2020.

arXiv:2008.04481 [pdf, other]

Transformer with Bidirectional Decoder for Speech Recognition

Authors: Xi Chen, Songyang Zhang, Dandan Song, Peng Ouyang, Shouyi Yin

Abstract: Attention-based models have made tremendous progress on end-to-end automatic speech recognition(ASR) recently. However, the conventional transformer-based approaches usually generate the sequence results token by token from left to right, leaving the right-to-left contexts unexploited. In this work, we introduce a bidirectional speech transformer to utilize the different directional contexts simul… ▽ More Attention-based models have made tremendous progress on end-to-end automatic speech recognition(ASR) recently. However, the conventional transformer-based approaches usually generate the sequence results token by token from left to right, leaving the right-to-left contexts unexploited. In this work, we introduce a bidirectional speech transformer to utilize the different directional contexts simultaneously. Specifically, the outputs of our proposed transformer include a left-to-right target, and a right-to-left target. In inference stage, we use the introduced bidirectional beam search method, which can not only generate left-to-right candidates but also generate right-to-left candidates, and determine the best hypothesis by the score. To demonstrate our proposed speech transformer with a bidirectional decoder(STBD), we conduct extensive experiments on the AISHELL-1 dataset. The results of experiments show that STBD achieves a 3.6\% relative CER reduction(CERR) over the unidirectional speech transformer baseline. Besides, the strongest model in this paper called STBD-Big can achieve 6.64\% CER on the test set, without language model rescoring and any extra data augmentation strategies. △ Less

Submitted 10 August, 2020; originally announced August 2020.

Comments: Accepted by InterSpeech 2020

arXiv:1912.11585 [pdf, other]

THUEE system description for NIST 2019 SRE CTS Challenge

Authors: Yi Liu, Tianyu Liang, Can Xu, Xianwei Zhang, Xianhong Chen, Wei-Qiang Zhang, Liang He, Dandan song, Ruyun Li, Yangcheng Wu, Peng Ouyang, Shouyi Yin

Abstract: This paper describes the systems submitted by the department of electronic engineering, institute of microelectronics of Tsinghua university and TsingMicro Co. Ltd. (THUEE) to the NIST 2019 speaker recognition evaluation CTS challenge. Six subsystems, including etdnn/ams, ftdnn/as, eftdnn/ams, resnet, multitask and c-vector are developed in this evaluation. This paper describes the systems submitted by the department of electronic engineering, institute of microelectronics of Tsinghua university and TsingMicro Co. Ltd. (THUEE) to the NIST 2019 speaker recognition evaluation CTS challenge. Six subsystems, including etdnn/ams, ftdnn/as, eftdnn/ams, resnet, multitask and c-vector are developed in this evaluation. △ Less

Submitted 24 December, 2019; originally announced December 2019.

Comments: This is the system description of THUEE submitted to NIST SRE 2019

arXiv:1912.05124 [pdf, other]

Small-footprint Keyword Spotting with Graph Convolutional Network

Authors: Xi Chen, Shouyi Yin, Dandan Song, Peng Ouyang, Leibo Liu, Shaojun Wei

Abstract: Despite the recent successes of deep neural networks, it remains challenging to achieve high precision keyword spotting task (KWS) on resource-constrained devices. In this study, we propose a novel context-aware and compact architecture for keyword spotting task. Based on residual connection and bottleneck structure, we design a compact and efficient network for KWS task. To leverage the long rang… ▽ More Despite the recent successes of deep neural networks, it remains challenging to achieve high precision keyword spotting task (KWS) on resource-constrained devices. In this study, we propose a novel context-aware and compact architecture for keyword spotting task. Based on residual connection and bottleneck structure, we design a compact and efficient network for KWS task. To leverage the long range dependencies and global context of the convolutional feature maps, the graph convolutional network is introduced to encode the non-local relations. By evaluated on the Google Speech Command Dataset, the proposed method achieves state-of-the-art performance and outperforms the prior works by a large margin with lower computational cost. △ Less

Submitted 11 December, 2019; originally announced December 2019.

Comments: Accepted by the IEEE Automatic Speech Recognition and Understanding Workshop(ASRU 2019)

arXiv:1809.10875 [pdf, other]

Characterizing Audio Adversarial Examples Using Temporal Dependency

Authors: Zhuolin Yang, Bo Li, Pin-Yu Chen, Dawn Song

Abstract: Recent studies have highlighted adversarial examples as a ubiquitous threat to different neural network models and many downstream applications. Nonetheless, as unique data properties have inspired distinct and powerful learning principles, this paper aims to explore their potentials towards mitigating adversarial inputs. In particular, our results reveal the importance of using the temporal depen… ▽ More Recent studies have highlighted adversarial examples as a ubiquitous threat to different neural network models and many downstream applications. Nonetheless, as unique data properties have inspired distinct and powerful learning principles, this paper aims to explore their potentials towards mitigating adversarial inputs. In particular, our results reveal the importance of using the temporal dependency in audio data to gain discriminate power against adversarial examples. Tested on the automatic speech recognition (ASR) tasks and three recent audio adversarial attacks, we find that (i) input transformation developed from image adversarial defense provides limited robustness improvement and is subtle to advanced attacks; (ii) temporal dependency can be exploited to gain discriminative power against audio adversarial examples and is resistant to adaptive attacks considered in our experiments. Our results not only show promising means of improving the robustness of ASR systems, but also offer novel insights in exploiting domain-specific data properties to mitigate negative effects of adversarial examples. △ Less

Submitted 5 June, 2019; v1 submitted 28 September, 2018; originally announced September 2018.

Showing 1–22 of 22 results for author: song, D