-
Balancing Performance and Cost for Two-Hop Cooperative Communications: Stackelberg Game and Distributed Multi-Agent Reinforcement Learning
Authors:
Yuanzhe Geng,
Erwu Liu,
Wei Ni,
Rui Wang,
Yan Liu,
Hao Xu,
Chen Cai,
Abbas Jamalipour
Abstract:
This paper aims to balance performance and cost in a two-hop wireless cooperative communication network where the source and relays have contradictory optimization goals and make decisions in a distributed manner. This differs from most existing works, which have typically assumed that source and relay nodes follow a schedule created implicitly by a central controller. We propose that the relays form an alliance in an attempt to maximize the benefit of relaying, while the source aims to increase the channel capacity cost-effectively. To this end, we formulate the trading problem as a Stackelberg game and prove the existence of its equilibrium. Another important aspect is that we use multi-agent reinforcement learning (MARL) to approach the equilibrium in a situation where the instantaneous channel state information (CSI) is unavailable and the source and relays do not know each other's goals. A multi-agent deep deterministic policy gradient-based framework is designed, in which the relay alliance and the source act as agents. Experiments demonstrate that the proposed method obtains acceptable performance close to the game-theoretic equilibrium for all players in time-invariant environments, considerably outperforms potential alternatives, and is only about 2.9% away from the optimal solution.
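As a rough illustration of the leader-follower structure of a Stackelberg game (not the paper's utilities or its MARL solver), the sketch below assumes a hypothetical setting: the relay alliance (leader) sets a unit price for relaying power, and the source (follower) buys power P to maximize log2(1 + gain*P) - price*P. Because this follower utility is concave in P, its best response has a closed form; the leader then searches for the price that maximizes its profit given that response. The utilities, `gain`, `unit_cost`, the power cap `p_max`, and the function names are all assumptions.

```python
import math


def follower_power(price: float, gain: float, p_max: float = 10.0) -> float:
    """Source's best response under the hypothetical utility log2(1 + gain*P) - price*P.

    The utility is concave in P, so the stationary point
    gain / ((1 + gain*P) * ln 2) = price, i.e. P = 1/(price*ln 2) - 1/gain,
    clipped to [0, p_max], is the maximizer.
    """
    p = 1.0 / (price * math.log(2)) - 1.0 / gain
    return min(max(p, 0.0), p_max)


def leader_price(gain: float, unit_cost: float, steps: int = 2000, max_price: float = 10.0) -> float:
    """Relay alliance (leader) anticipates the follower's response and picks the
    price maximizing its profit (price - unit_cost) * P(price) on a grid."""
    grid = [unit_cost + (max_price - unit_cost) * k / steps for k in range(1, steps + 1)]
    return max(grid, key=lambda pr: (pr - unit_cost) * follower_power(pr, gain))


if __name__ == "__main__":
    price = leader_price(gain=2.0, unit_cost=0.1)
    power = follower_power(price, gain=2.0)
    print(f"equilibrium price ~ {price:.3f}, purchased power ~ {power:.3f}")
```

Backward induction (solve the follower first, then optimize the leader against that response) is what distinguishes the Stackelberg equilibrium from a simultaneous-move Nash equilibrium; the paper's MARL agents approach the equilibrium without the closed forms used here.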
Submitted 17 June, 2024;
originally announced June 2024.
-
Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI
Authors:
Yirong Zhou,
Chengyan Wang,
Mengtian Lu,
Kunyuan Guo,
Zi Wang,
Dan Ruan,
Rui Guo,
Peijun Zhao,
Jianhua Wang,
Naiming Wu,
Jianzhong Lin,
Yinyin Chen,
Hang **,
Lianxin Xie,
Lilan Wu,
Liuhong Zhu,
Jianjun Zhou,
Congbo Cai,
He Wang,
Xiaobo Qu
Abstract:
In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features a T2-refine fusion decoder for quantitative analysis, leveraging global features from the Transformer, and a segmentation decoder with multiple local region supervision for enhanced accuracy. A tight coupling module aligns and fuses CNN and Transformer branch features, enabling SQNet to focus on myocardium regions. Evaluation on healthy controls (HC) and acute myocardial infarction patients (AMI) demonstrates superior segmentation dice scores (89.3/89.2) compared to state-of-the-art methods (87.7/87.9). T2 quantification yields strong linear correlations (Pearson coefficients: 0.84/0.93) with label values for HC/AMI, indicating accurate mapping. Radiologist evaluations confirm SQNet's superior image quality scores (4.60/4.58 for segmentation, 4.32/4.42 for T2 quantification) over state-of-the-art methods (4.50/4.44 for segmentation, 3.59/4.37 for T2 quantification). SQNet thus offers accurate simultaneous segmentation and quantification, enhancing cardiac disease diagnosis, such as AMI.
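The segmentation and quantification figures above are reported as Dice scores and Pearson correlation coefficients. A minimal NumPy sketch of both metrics, assuming binary masks and paired arrays of T2 values (the function names are hypothetical, and Dice is returned as a fraction rather than the percentage used in the abstract):

```python
import numpy as np


def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks, returned as a fraction."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))


def pearson_r(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation between predicted and reference T2 values."""
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])
```

The small `eps` only guards against division by zero when both masks are empty.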
Submitted 29 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
Deep Separable Spatiotemporal Learning for Fast Dynamic Cardiac MRI
Authors:
Zi Wang,
Min Xiao,
Yirong Zhou,
Chengyan Wang,
Naiming Wu,
Yi Li,
Yiwen Gong,
Shufu Chang,
Yinyin Chen,
Liuhong Zhu,
Jianjun Zhou,
Congbo Cai,
He Wang,
Di Guo,
Guang Yang,
Xiaobo Qu
Abstract:
Dynamic magnetic resonance imaging (MRI) plays an indispensable role in cardiac diagnosis. To enable fast imaging, the k-space data can be undersampled, but image reconstruction then poses the challenge of high-dimensional processing. As a result, many deep learning reconstruction methods require extensive training data. This work proposes a novel and efficient approach, leveraging a dimension-reduced separable learning scheme that excels even with highly limited training data. We further integrate it with spatiotemporal priors to develop a Deep Separable Spatiotemporal Learning network (DeepSSL), which unrolls an iteration process of a reconstruction model with both temporal low-rankness and spatial sparsity. Intermediate outputs are visualized to provide insights into the network's behavior and enhance its interpretability. Extensive results on cardiac cine datasets show that the proposed DeepSSL is superior to the state-of-the-art methods visually and quantitatively, while reducing the demand for training cases by up to 75%. Its preliminary adaptability to cardiac patients has been verified through a blind reader study by experienced radiologists and cardiologists. Additionally, DeepSSL benefits the downstream task of cardiac segmentation, yielding higher accuracy, and shows robustness in prospective real-time cardiac MRI.
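DeepSSL unrolls an iterative reconstruction with temporal low-rankness and spatial sparsity. As background only, the NumPy sketch below shows one classical form of such an iteration: k-space data consistency, singular value thresholding of the space-by-time (Casorati) matrix, and soft-thresholding. It is not the DeepSSL network; it assumes sparsity directly in the image domain and hand-picked thresholds `tau` and `lam`, whereas an unrolled network would typically learn transforms and parameters. Function names are hypothetical.

```python
import numpy as np


def svt(matrix: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: proximal step for the nuclear norm (low rank)."""
    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vh


def soft_threshold(x: np.ndarray, lam: float) -> np.ndarray:
    """Complex soft-thresholding: proximal step for the l1 norm (sparsity)."""
    mag = np.abs(x)
    return np.where(mag > lam, (1.0 - lam / np.maximum(mag, 1e-12)) * x, 0.0)


def lowrank_sparse_recon(y, mask, n_iter=30, tau=0.0, lam=0.0):
    """y, mask: undersampled k-space and boolean sampling mask, shaped (nx, ny, nt)."""
    nx, ny, nt = y.shape
    x = np.fft.ifft2(np.where(mask, y, 0.0), axes=(0, 1))  # zero-filled start
    for _ in range(n_iter):
        # Data consistency: keep the measured k-space samples, fill the rest.
        k = np.fft.fft2(x, axes=(0, 1))
        x = np.fft.ifft2(np.where(mask, y, k), axes=(0, 1))
        # Temporal low-rankness on the Casorati matrix (pixels x frames).
        x = svt(x.reshape(nx * ny, nt), tau).reshape(nx, ny, nt)
        # Spatial sparsity (image-domain l1, an assumption of this sketch).
        x = soft_threshold(x, lam)
    return x
```

Reshaping to a pixels-by-frames matrix is what lets a rank penalty express temporal redundancy: a cine series with few independent temporal patterns has few significant singular values.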
Submitted 24 February, 2024;
originally announced February 2024.
-
Deep Learning Predicts Biomarker Status and Discovers Related Histomorphology Characteristics for Low-Grade Glioma
Authors:
Zijie Fang,
Yihan Liu,
Yifeng Wang,
Xiangyang Zhang,
Yang Chen,
Chang**g Cai,
Yiyang Lin,
Ying Han,
Zhi Wang,
Shan Zeng,
Hong Shen,
Jun Tan,
Yongbing Zhang
Abstract:
Biomarker detection is an indispensable part of the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, which requires professionals to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, a Multi-Biomarker Histomorphology Discoverer (Multi-Beholder) model based on the multiple instance learning (MIL) framework, to predict the status of five biomarkers in LGG using only hematoxylin and eosin-stained whole slide images and slide-level biomarker status labels. Specifically, by incorporating one-class classification into the MIL framework, accurate instance pseudo-labeling is realized for instance-level supervision, which greatly complements the slide-level labels and improves biomarker prediction performance. Multi-Beholder demonstrates superior prediction performance and generalizability for five LGG biomarkers (AUROC=0.6469-0.9735) in two cohorts (n=607) with diverse races and scanning protocols. Moreover, the interpretability of Multi-Beholder allows for discovering the quantitative and qualitative correlations between biomarker status and histomorphology characteristics. Our pipeline not only provides a novel approach for biomarker prediction, enhancing the applicability of molecular treatments for LGG patients, but also facilitates the discovery of new mechanisms in molecular functionality and LGG progression.
Submitted 11 October, 2023;
originally announced October 2023.
-
Multi-Device Task-Oriented Communication via Maximal Coding Rate Reduction
Authors:
Chang Cai,
Xiaojun Yuan,
Ying-Jun Angela Zhang
Abstract:
In task-oriented communications, most existing works have designed the physical-layer communication modules and learning-based codecs with distinct objectives: learning targets accurate execution of specific tasks, while communication aims at optimizing conventional communication metrics, such as throughput maximization, delay minimization, or bit error rate minimization. The inconsistency between the design objectives may prevent the full benefits of task-oriented communications from being exploited. In this paper, we consider a task-oriented multi-device edge inference system over a multiple-input multiple-output (MIMO) multiple-access channel, where the learning (i.e., feature encoding and classification) and communication (i.e., precoding) modules are designed with the same goal of inference accuracy maximization. Instead of end-to-end learning, which involves both the task dataset and the wireless channel during training, we advocate a separate design of learning and communication that achieves the consistent goal. Specifically, we leverage the maximal coding rate reduction (MCR2) objective as a surrogate for inference accuracy, which allows us to explicitly formulate the precoding optimization problem. We draw valuable insights from this formulation and develop a block coordinate ascent (BCA) algorithm for efficient problem-solving. Moreover, the MCR2 objective serves as the loss function for feature encoding and guides the classification design. Simulation results on synthetic features explain the mechanism of MCR2 precoding at different SNRs. We also validate on the CIFAR-10 and ModelNet10 datasets that the proposed design achieves a better latency-accuracy tradeoff than various baselines. As such, our work paves the way for further exploration into the synergistic alignment of learning and communication objectives in task-oriented communication systems.
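For reference, the standard MCR2 objective measures the coding rate of all features minus the size-weighted average coding rate of the per-class features. For features Z of shape d × m, distortion ε, and class j containing m_j columns Z_j: ΔR = R(Z, ε) − Σ_j (m_j/m) R(Z_j, ε), with R(Z, ε) = ½ log det(I + d/(mε²) Z Zᵀ). The NumPy sketch below computes it for real-valued features with integer labels, using the natural logarithm; the value of `eps` and the function names are assumptions, and how the paper couples this objective with the MIMO precoder is not shown.

```python
import numpy as np


def coding_rate(z: np.ndarray, eps: float) -> float:
    """R(Z, eps) = 1/2 logdet(I + d/(m eps^2) Z Z^T) for Z of shape (d, m)."""
    d, m = z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (m * eps**2)) * z @ z.T)
    return 0.5 * logdet


def coding_rate_reduction(z: np.ndarray, labels: np.ndarray, eps: float = 0.5) -> float:
    """Delta R = R(Z) - sum_j (m_j / m) * R(Z_j), with Z_j the columns of class j."""
    _, m = z.shape
    rc = 0.0
    for c in np.unique(labels):
        zc = z[:, labels == c]
        rc += (zc.shape[1] / m) * coding_rate(zc, eps)
    return coding_rate(z, eps) - rc
```

Since Z Zᵀ/m is a convex combination of the per-class matrices Z_j Z_jᵀ/m_j and log det is concave, ΔR is never negative; it grows when classes occupy different subspaces, which is why it can act as a surrogate for classification accuracy.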
Submitted 28 May, 2024; v1 submitted 6 September, 2023;
originally announced September 2023.
-
FlexDTI: Flexible diffusion gradient encoding scheme-based highly efficient diffusion tensor imaging using deep learning
Authors:
Zejun Wu,
Jiechao Wang,
Zunquan Chen,
Qinqin Yang,
Zhen Xing,
Dairong Cao,
Jianfeng Bao,
Taishan Kang,
Jianzhong Lin,
Shuhui Cai,
Zhong Chen,
Congbo Cai
Abstract:
Objective: Most deep neural network-based diffusion tensor imaging methods require the number and directions of diffusion gradients in the data to be reconstructed to match those in the training data. This work aims to develop and evaluate a novel dynamic-convolution-based method called FlexDTI for highly efficient diffusion tensor reconstruction with flexible diffusion encoding gradient schemes. Approach: FlexDTI was developed to achieve high-quality DTI parametric mapping with a flexible number and flexible directions of diffusion encoding gradients. The method used dynamic convolution kernels to embed diffusion gradient direction information into the feature maps of the corresponding diffusion signal. Furthermore, it generalized to a flexible number of diffusion gradient directions by setting a maximum number of input channels for the network. The network was trained and tested using datasets from the Human Connectome Project and local hospitals. Results from FlexDTI and other advanced tensor parameter estimation methods were compared. Main results: Compared to other methods, FlexDTI successfully achieves high-quality diffusion tensor-derived parameters even if the number and directions of diffusion encoding gradients change. It reduces normalized root mean squared error (NRMSE) by about 50% on fractional anisotropy (FA) and 15% on mean diffusivity (MD), compared with the state-of-the-art deep learning method with a flexible diffusion encoding gradient scheme. Significance: FlexDTI can effectively learn diffusion gradient direction information to achieve generalized DTI reconstruction with flexible diffusion gradient schemes. The network accounts for both flexibility and reconstruction quality.
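For context on the quantities being reconstructed, the conventional (non-deep-learning) baseline fits the diffusion tensor D by linear least squares on the log signal, ln(S/S0) = −b gᵀDg, which works for any number (at least 6) of suitably non-collinear gradient directions; FA and MD then follow from the tensor's eigenvalues. The NumPy sketch below shows that baseline plus an NRMSE metric. The function names and the NRMSE normalization (by the RMS of the reference) are assumptions; FlexDTI replaces this fit with a network.

```python
import numpy as np


def fit_tensor(signals, s0, bvals, bvecs):
    """Log-linear least-squares DTI fit for N >= 6 gradient directions.

    signals: (N,) diffusion-weighted signals; bvecs: (N, 3) unit vectors.
    Returns the symmetric 3x3 diffusion tensor D.
    """
    g = np.asarray(bvecs, dtype=float)
    b = np.asarray(bvals, dtype=float)
    # Each row encodes -b * g^T D g in terms of the 6 unique tensor elements.
    design = -b[:, None] * np.stack([
        g[:, 0] ** 2, g[:, 1] ** 2, g[:, 2] ** 2,
        2 * g[:, 0] * g[:, 1], 2 * g[:, 0] * g[:, 2], 2 * g[:, 1] * g[:, 2],
    ], axis=1)
    y = np.log(np.asarray(signals, dtype=float) / s0)
    dxx, dyy, dzz, dxy, dxz, dyz = np.linalg.lstsq(design, y, rcond=None)[0]
    return np.array([[dxx, dxy, dxz], [dxy, dyy, dyz], [dxz, dyz, dzz]])


def fa_md(tensor):
    """Fractional anisotropy and mean diffusivity from the tensor eigenvalues."""
    evals = np.linalg.eigvalsh(tensor)
    md = evals.mean()
    fa = np.sqrt(1.5 * np.sum((evals - md) ** 2) / np.sum(evals ** 2))
    return fa, md


def nrmse(estimate, reference):
    """RMSE normalized by the RMS of the reference (one common convention)."""
    estimate, reference = np.asarray(estimate), np.asarray(reference)
    return float(np.sqrt(np.mean((estimate - reference) ** 2) / np.mean(reference ** 2)))
```

The design matrix changes shape with the number of directions, which is trivial for least squares but is exactly what a fixed-input network struggles with.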
Submitted 21 December, 2023; v1 submitted 2 August, 2023;
originally announced August 2023.
-
One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction
Authors:
Zi Wang,
Xiaotong Yu,
Chengyan Wang,
Weibo Chen,
Jiazheng Wang,
Ying-Hua Chu,
Hongwei Sun,
Rushuai Li,
Peiyong Li,
Fan Yang,
Haiwei Han,
Taishan Kang,
Jianzhong Lin,
Chen Yang,
Shufu Chang,
Zhang Shi,
Sha Hua,
Yan Li,
Juan Hu,
Liuhong Zhu,
Jianjun Zhou,
Mei**g Lin,
Jiefeng Guo,
Congbo Cai,
Zhong Chen
, et al. (3 additional authors not shown)
Abstract:
Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, prolonged scan times hinder its accessibility. K-space undersampling offers a solution, yet the resulting artifacts must be carefully removed during image reconstruction. Although deep learning (DL) has proven effective for fast MRI image reconstruction, its broader applicability across various imaging scenarios has been constrained. Challenges include the high cost and privacy restrictions associated with acquiring large-scale, diverse training data, coupled with the inherent difficulty of addressing mismatches between training and target data in existing DL methodologies. Here, we present a novel Physics-Informed Synthetic data learning framework for Fast MRI, called PISF. PISF marks a breakthrough by enabling generalized DL for multi-scenario MRI reconstruction through a single trained model. Our approach separates the reconstruction of a 2D image into many 1D basic problems, commencing with 1D data synthesis to facilitate generalization. We demonstrate that training DL models on synthetic data, coupled with enhanced learning techniques, yields in vivo MRI reconstructions comparable to or surpassing those of models trained on matched realistic datasets, reducing the reliance on real-world MRI data by up to 96%. Additionally, PISF exhibits remarkable generalizability across multiple vendors and imaging centers. Its adaptability to diverse patient populations has been validated through evaluations by ten experienced medical professionals. PISF presents a feasible and cost-effective way to significantly boost the widespread adoption of DL in various fast MRI applications.
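One standard way a 2D Cartesian reconstruction splits into independent 1D problems, assumed here for illustration: when only the phase-encoding direction is undersampled, an inverse FFT along the fully sampled readout direction turns 2D k-space into a hybrid space in which every row is a separate, identically undersampled 1D signal. The NumPy sketch below shows only this decoupling; PISF's 1D data synthesis and learning are not reproduced, and both function names are hypothetical.

```python
import numpy as np


def to_1d_problems(kspace: np.ndarray, pe_mask: np.ndarray):
    """Split 2D k-space (readout along axis 0, phase encoding along axis 1)
    into independent 1D problems, one per readout position.

    Returns the undersampled hybrid-space lines (x, ky) and the shared 1D mask.
    """
    undersampled = kspace * pe_mask[None, :]    # same PE mask for every readout
    hybrid = np.fft.ifft(undersampled, axis=0)  # readout direction is fully sampled
    return hybrid, pe_mask


def solve_line_zero_filled(line: np.ndarray) -> np.ndarray:
    """Placeholder 1D solver (zero-filled inverse FFT); in a learning-based
    scheme, a trained 1D network would take its place."""
    return np.fft.ifft(line)
```

Each row can then be reconstructed independently (and in parallel), and a 1D model only has to learn 1D undersampling artifacts, which are far easier to synthesize than full 2D images.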
Submitted 28 February, 2024; v1 submitted 24 July, 2023;
originally announced July 2023.
-
Rate-Perception Optimized Preprocessing for Video Coding
Authors:
Chengqian Ma,
Zhiqiang Wu,
Chunlei Cai,
Pengwei Zhang,
Yi Wang,
Long Zheng,
Chao Chen,
Quan Zhou
Abstract:
In the past decades, much progress has been made in video compression, including both traditional and learning-based video codecs. However, few studies focus on using preprocessing techniques to improve rate-distortion performance. In this paper, we propose a rate-perception optimized preprocessing (RPP) method. We first introduce an adaptive Discrete Cosine Transform loss function that saves bitrate while keeping essential high-frequency components. Furthermore, we combine several state-of-the-art techniques from low-level vision into our approach, such as the high-order degradation model, efficient lightweight network design, and an Image Quality Assessment model. By jointly using these techniques, our RPP approach achieves an average bitrate saving of 16.27% with different video encoders, such as AVC, HEVC, and VVC, under multiple quality metrics. In the deployment stage, our RPP method is simple and efficient, and it requires no changes to the settings of video encoding, streaming, or decoding. Each input frame only needs a single pass through RPP before being sent to the video encoder. In addition, in our subjective visual quality test, 87% of users rated videos with RPP as better than or equal to videos compressed by the codec alone, while the videos with RPP save about 12% bitrate on average. Our RPP framework has been integrated into the production environment of our video transcoding services, which serve millions of users every day.
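A frequency-weighted loss in the DCT domain, of the general kind the RPP abstract describes, can be sketched with an orthonormal DCT-II matrix in NumPy. The weighting here (a fixed `high_freq_weight` applied to coefficients beyond a normalized `cutoff` radius) is a hypothetical stand-in: the abstract does not specify how the paper's adaptive weighting works, and the function names are illustrative.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that the 2D DCT of X is C @ X @ C.T."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def weighted_dct_loss(output, target, cutoff=0.25, high_freq_weight=0.5):
    """Mean absolute DCT-coefficient error with a lower (hypothetical) weight on
    high frequencies, giving a preprocessor more freedom to alter fine detail."""
    n = output.shape[0]  # assumes square n x n blocks
    c = dct_matrix(n)
    diff = c @ (output - target) @ c.T  # the DCT is linear, so transform the difference
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    radius = np.sqrt(u ** 2 + v ** 2) / n
    weights = np.where(radius > cutoff, high_freq_weight, 1.0)
    return float(np.mean(weights * np.abs(diff)))
```

Working in the DCT domain ties the loss to the same transform family used by block-based codecs such as AVC and HEVC, which is one plausible reason to shape a preprocessing loss there.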
Submitted 25 January, 2023;
originally announced January 2023.
-
High-efficient Bloch simulation of magnetic resonance imaging sequences based on deep learning
Authors:
Haitao Huang,
Qinqin Yang,
Jiechao Wang,
Pujie Zhang,
Shuhui Cai,
Congbo Cai
Abstract:
Objective: Bloch simulation constitutes an essential part of magnetic resonance imaging (MRI) development. However, even with graphics processing unit (GPU) acceleration, the heavy computational load remains a major challenge, especially in large-scale, high-accuracy simulation scenarios. This work aims to develop a deep learning-based simulator to accelerate Bloch simulation. Approach: The simulator model, called Simu-Net, is based on an end-to-end convolutional neural network and is trained with synthetic data generated by traditional Bloch simulation. It uses dynamic convolution to fuse spatial and physical information with different dimensions and introduces position encoding templates to achieve position-specific labeling and overcome the receptive field limitation of the convolutional network. Main Results: Compared with mainstream GPU-based MRI simulation software, Simu-Net successfully accelerates simulations by hundreds of times in both traditional and advanced MRI pulse sequences. The accuracy and robustness of the proposed framework were verified qualitatively and quantitatively. In addition, the trained Simu-Net was applied to generate sufficient customized training samples for deep learning-based T2 mapping, and results comparable to conventional methods were obtained in the human brain. Significance: As a proof-of-concept work, Simu-Net shows the potential of deep learning for rapidly approximating the forward physical process of MRI and may increase the efficiency of Bloch simulation for the optimization of MRI pulse sequences and deep learning-based methods.
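Simu-Net is trained on data produced by conventional Bloch simulation. As background, a minimal single-isochromat Bloch simulator applies an instantaneous RF rotation and then the exact solution for T1/T2 relaxation over a free-precession interval; off-resonance precession and gradients are omitted. This is a textbook sketch with hypothetical function names and one common sign convention, not the authors' simulator.

```python
import numpy as np


def rf_pulse(m: np.ndarray, flip_deg: float) -> np.ndarray:
    """Instantaneous RF rotation of M = (Mx, My, Mz) about the x-axis
    (one common sign convention: Mz is tipped toward +My)."""
    a = np.deg2rad(flip_deg)
    rot = np.array([[1.0, 0.0, 0.0],
                    [0.0, np.cos(a), np.sin(a)],
                    [0.0, -np.sin(a), np.cos(a)]])
    return rot @ m


def relax(m: np.ndarray, dt: float, t1: float, t2: float, m0: float = 1.0) -> np.ndarray:
    """Exact T1/T2 relaxation over dt seconds (no off-resonance)."""
    e1, e2 = np.exp(-dt / t1), np.exp(-dt / t2)
    return np.array([m[0] * e2, m[1] * e2, m0 + (m[2] - m0) * e1])


if __name__ == "__main__":
    m = rf_pulse(np.array([0.0, 0.0, 1.0]), 90.0)  # excitation from equilibrium
    m = relax(m, dt=0.05, t1=1.0, t2=0.08)         # 50 ms of free relaxation
    print("transverse:", np.hypot(m[0], m[1]), "longitudinal:", m[2])
```

A full simulator repeats these steps for every voxel, isochromat, and sequence event, which is where the computational load that Simu-Net aims to avoid comes from.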
Submitted 15 March, 2023; v1 submitted 19 October, 2022;
originally announced October 2022.
-
Quality-Constant Per-Shot Encoding by Two-Pass Learning-based Rate Factor Prediction
Authors:
Chunlei Cai,
Yi Wang,
Xiaobo Li,
Tianxiao Ye
Abstract:
Providing quality-constant streams can simultaneously guarantee user experience and avoid wasting bit-rate. In this paper, we propose a novel deep learning-based two-pass encoder parameter prediction framework to decide the rate factor (RF), with which the encoder can output streams with constant quality. For each one-shot segment in a video, the proposed method first extracts spatial, temporal, and pre-coding features by an ultra-fast preprocess. Based on these features, an RF parameter is predicted by a deep neural network. The video encoder uses this RF to compress the segment as the first encoding pass. Then the VMAF quality of the first-pass encoding is measured. If the quality does not meet the target, a second-pass RF prediction and encoding are performed. With the first-pass predicted RF and the corresponding actual quality as feedback, the second-pass prediction is highly accurate. Experiments show that the proposed method requires only 1.55 times the encoding complexity on average, while the accuracy, defined as the fraction of compressed videos whose actual VMAF is within $\pm1$ of the target VMAF, reaches 98.88%.
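The two-pass control flow described in this abstract can be sketched as follows. The predictor, refinement step, and encoder-plus-VMAF measurement are passed in as hypothetical callables (the paper uses deep networks and a real encoder); the tolerance of ±1 VMAF mirrors the accuracy criterion in the abstract.

```python
from typing import Any, Callable, Tuple


def two_pass_encode(
    features: Any,
    target_vmaf: float,
    predict_rf: Callable[[Any, float], float],
    refine_rf: Callable[[Any, float, float, float], float],
    encode_and_measure: Callable[[float], float],
    tol: float = 1.0,
) -> Tuple[float, float, int]:
    """Return (rate factor, measured VMAF, number of encoding passes)."""
    # Pass 1: predict RF from segment features, encode, and measure quality.
    rf1 = predict_rf(features, target_vmaf)
    vmaf1 = encode_and_measure(rf1)
    if abs(vmaf1 - target_vmaf) <= tol:
        return rf1, vmaf1, 1
    # Pass 2: use the first-pass RF and its actual quality as feedback.
    rf2 = refine_rf(features, target_vmaf, rf1, vmaf1)
    vmaf2 = encode_and_measure(rf2)
    return rf2, vmaf2, 2


if __name__ == "__main__":
    # Toy, purely illustrative quality model: VMAF falls linearly as RF rises.
    quality = lambda rf: 100.0 - 1.5 * rf
    predict = lambda feats, target: 20.0
    refine = lambda feats, target, rf1, q1: rf1 - (target - q1) / 1.5
    print(two_pass_encode(None, 80.0, predict, refine, quality))
```

Only segments that miss the target pay for a second encode, which is how the average cost can stay well below two full passes.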
Submitted 23 August, 2022;
originally announced August 2022.
-
Bullwhip Effect of Supply Networks: Joint Impact of Network Structure and Market Demand
Authors:
**-Zhu Yü,
Chencheng Cai,
Jianxi Gao
Abstract:
The progressive amplification of fluctuations in demand as the demand travels upstream in supply chains is known as the bullwhip effect. We first analytically characterize the bullwhip effect in general supply chain networks in two cases: (i) all suppliers have a unique layer position, where our method is founded on the control-theoretic approach, and (ii) not all suppliers have a unique layer position due to the presence of intra-layer links or inter-layer links between suppliers that are not positioned in consecutive layers, where we use both the absorbing Markov chain and the control-theoretic approach. We then investigate how network structures impact the bullwhip effect (BWE) of supply chain networks. In particular, we analytically show that (i) if the market demand is generated from the same stationary process, the structure of supply networks does not affect the layer-wise bullwhip effect, and (ii) if the market demand is generated from different stationary or non-stationary market processes, wider supply networks lead to a lower level of layer-wise bullwhip effect. Finally, numerical simulations are used to validate our propositions.
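The bullwhip effect is commonly quantified as the ratio of order variance to demand variance at each layer. A textbook illustration, assumed here rather than taken from the paper, is a serial chain in which each layer uses a moving-average forecast over `window` periods (p) and a lead time `lead_time` (L); without safety stock, orders follow q_t = (1 + L/p) D_{t-1} − (L/p) D_{t-1-p}, so for i.i.d. demand the first-layer ratio is 1 + 2L/p + 2L²/p². A NumPy simulation with hypothetical parameter values:

```python
import numpy as np


def place_orders(demand: np.ndarray, lead_time: int, window: int) -> np.ndarray:
    """Orders q_t = (1 + L/p) D_{t-1} - (L/p) D_{t-1-p} (moving-average forecast,
    order-up-to policy, no safety stock)."""
    a = lead_time / window
    d = np.asarray(demand, dtype=float)
    return (1 + a) * d[window:-1] - a * d[:-1 - window]


def layer_variances(n_layers=3, n=200_000, lead_time=2, window=4, seed=0):
    """Variance of market demand followed by the order variance at each upstream layer."""
    rng = np.random.default_rng(seed)
    series = [rng.normal(100.0, 10.0, n)]  # i.i.d. market demand
    for _ in range(n_layers):
        # Each layer's orders become the demand seen by the next layer upstream.
        series.append(place_orders(series[-1], lead_time, window))
    return [float(np.var(s)) for s in series]


if __name__ == "__main__":
    v = layer_variances()
    print([round(x / v[0], 2) for x in v])  # variance ratios relative to market demand
```

With L = 2 and p = 4 the first-layer ratio is 2.5 in theory, and the variance keeps growing at each further layer; the paper's results concern how network width and demand processes change this layer-wise amplification in general networks.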
Submitted 8 August, 2022;
originally announced August 2022.
-
ADFF: Attention Based Deep Feature Fusion Approach for Music Emotion Recognition
Authors:
Zi Huang,
Shulei Ji,
Zhilan Hu,
Chuangjian Cai,
**g Luo,
Xinyu Yang
Abstract:
Music emotion recognition (MER), a sub-task of music information retrieval (MIR), has developed rapidly in recent years. However, learning affect-salient features remains a challenge. In this paper, we propose an end-to-end attention-based deep feature fusion (ADFF) approach for MER. Taking only the log Mel-spectrogram as input, this method uses an adapted VGGNet as the spatial feature learning module (SFLM) to obtain spatial features across different levels. These features are then fed into a squeeze-and-excitation (SE) attention-based temporal feature learning module (TFLM) to obtain multi-level emotion-related spatial-temporal features (ESTFs), which discriminate emotions well in the final emotion space. In addition, a novel data-processing scheme is devised to cut the single-channel input into multiple channels, improving computational efficiency while maintaining MER quality. Experiments show that our proposed method achieves 10.43% and 4.82% relative improvements in the R2 score for valence and arousal, respectively, compared to the state-of-the-art model, while also performing better on datasets of different scales and in multi-task learning.
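Squeeze-and-excitation (SE) attention follows a standard pattern: global-average-pool each channel (squeeze), pass the channel descriptor through a small bottleneck MLP ending in a sigmoid (excitation), and rescale each channel by its gate. The NumPy forward-pass sketch below assumes a (channels, time, frequency) layout, random hypothetical weights, and omits biases; it is not the paper's TFLM.

```python
import numpy as np


def se_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """x: (C, T, F) feature map; w1: (C // r, C); w2: (C, C // r)."""
    squeeze = x.mean(axis=(1, 2))                  # (C,) channel descriptors
    hidden = np.maximum(w1 @ squeeze, 0.0)         # ReLU bottleneck
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gates in (0, 1)
    return x * scale[:, None, None]                # reweight each channel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((8, 10, 6))        # 8 channels, reduction ratio r = 4
    out = se_block(feats, rng.standard_normal((2, 8)), rng.standard_normal((8, 2)))
    print(out.shape)
```

The gates depend on the whole feature map but are applied per channel, so the block emphasizes channels that carry emotion-relevant patterns at negligible extra cost.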
Submitted 30 June, 2022; v1 submitted 12 April, 2022;
originally announced April 2022.
-
EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition
Authors:
Haiyang Sun,
Zheng Lian,
Bin Liu,
Ying Li,
Licai Sun,
Cong Cai,
Jianhua Tao,
Meng Wang,
Yuan Cheng
Abstract:
Speech emotion recognition (SER) is an important research topic in human-computer interaction. Existing works mainly rely on human expertise to design models. Despite their success, different datasets often require distinct structures and hyperparameters, and searching for an optimal model for each dataset is time-consuming and labor-intensive. To address this problem, we propose a two-stream neural architecture search (NAS) based framework, called EmotionNAS. Specifically, we take two-stream features (i.e., handcrafted and deep features) as the inputs, followed by NAS to search for the optimal structure for each stream. Furthermore, we incorporate complementary information from different streams through an efficient information supplement module. Experimental results demonstrate that our method outperforms existing manually designed and NAS-based models, setting a new state-of-the-art record.
Submitted 9 June, 2023; v1 submitted 25 March, 2022;
originally announced March 2022.
-
Physics-driven Synthetic Data Learning for Biomedical Magnetic Resonance
Authors:
Qinqin Yang,
Zi Wang,
Kunyuan Guo,
Congbo Cai,
Xiaobo Qu
Abstract:
Deep learning has transformed the field of computational imaging. One of its bottlenecks is unavailable or insufficient training data. This article reviews an emerging paradigm, imaging physics-based data synthesis (IPADS), that can provide large amounts of training data in biomedical magnetic resonance with few or no real data. Following the physical laws of magnetic resonance, IPADS generates signals from differential equations or analytical solution models, making learning more scalable and explainable and better protecting privacy. Key components of IPADS learning, including signal generation models, basic deep learning network structures, enhanced data generation, and learning methods, are discussed. The great potential of IPADS has been demonstrated by representative applications in fast imaging, ultrafast signal reconstruction, and accurate parameter quantification. Finally, open questions and future work are discussed.
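As a toy illustration of synthesis from an analytical solution model (an assumption for illustration, not an example taken from the article), training pairs for T2 mapping can be generated from the mono-exponential multi-echo model S(TE) = M0·exp(−TE/T2) with added noise; a network would then learn the mapping from synthetic signals back to T2. The parameter ranges, the Gaussian noise model (real magnitude data is Rician), and the function name are hypothetical.

```python
import numpy as np


def synthesize_t2_pairs(n_samples, echo_times, t2_range=(0.02, 0.2),
                        m0_range=(0.5, 1.0), noise_std=0.01, seed=0):
    """Return (signals, t2) with signals[i] = M0 * exp(-TE / T2) + Gaussian noise.

    echo_times and T2 are in seconds; signals has shape (n_samples, n_echoes).
    """
    rng = np.random.default_rng(seed)
    te = np.asarray(echo_times, dtype=float)
    t2 = rng.uniform(*t2_range, size=n_samples)
    m0 = rng.uniform(*m0_range, size=n_samples)
    clean = m0[:, None] * np.exp(-te[None, :] / t2[:, None])
    signals = clean + rng.normal(0.0, noise_std, size=clean.shape)
    return signals, t2


if __name__ == "__main__":
    te = np.array([0.01, 0.02, 0.04, 0.08])
    x, y = synthesize_t2_pairs(1000, te)
    print(x.shape, y.shape)  # inputs and labels for a hypothetical T2 regressor
```

Because the labels come from the physics model rather than from patients, the training set can be made as large and as varied as needed without privacy concerns.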
Submitted 21 May, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
RIS Partitioning Based Scalable Beamforming Design for Large-Scale MIMO: Asymptotic Analysis and Optimization
Authors:
Chang Cai,
Xiaojun Yuan,
Ying-Jun Angela Zhang
Abstract:
In next-generation wireless networks, reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output (MIMO) systems are expected to support a large number of antennas at the transceiver as well as a large number of reflecting elements at the RIS. To fully unleash the potential of RIS, the phase shifts of RIS elements should be carefully designed, resulting in a high-dimensional non-convex optimization problem that is hard to solve. In this paper, we address this scalability issue by partitioning the RIS into sub-surfaces, so as to optimize the phase shifts at the sub-surface level and reduce complexity. Specifically, each sub-surface employs a linear phase variation structure to anomalously reflect the incident signal in a desired direction, and the sizes of sub-surfaces can be adaptively adjusted according to channel conditions. We formulate the achievable rate maximization problem by jointly optimizing the transmit covariance matrix and the RIS phase shifts. Under the RIS partitioning framework, the RIS phase shift optimization reduces to the manipulation of the sub-surface sizes, the phase gradients of sub-surfaces, and the common phase shifts of sub-surfaces. Then, we characterize the asymptotic behavior of the system with an infinitely large number of transceiver antennas and RIS elements. The asymptotic analysis provides useful insights into the fundamental performance-complexity tradeoff in RIS partitioning design. We show that in the asymptotic regime, the achievable rate maximization problem has a rather simple form. We develop an efficient algorithm to find an approximately optimal solution via a 1D grid search. By applying the asymptotic result to a finite-size system with necessary modifications, we show by numerical results that the proposed design achieves a favorable tradeoff between system performance and computational complexity.
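The linear phase variation on a sub-surface works like a phased array: a constant phase increment between neighboring elements steers the reflected beam. The NumPy sketch below assumes a single 1D sub-surface with half-wavelength spacing, far-field plane waves, and one particular sign convention for the cascaded incident-and-reflected phase; it picks the phase gradient for a target direction and checks where the reflected gain peaks. It is not the paper's partitioning or optimization algorithm, and the function names are hypothetical.

```python
import numpy as np


def cascaded_phase(n_elems: int, theta_deg, spacing: float = 0.5) -> np.ndarray:
    """Per-element propagation phase 2*pi*d*n*sin(theta), with spacing d in wavelengths."""
    n = np.arange(n_elems)
    return 2 * np.pi * spacing * n * np.sin(np.deg2rad(theta_deg))


def reflected_gain(phase_grad, n_elems, theta_in, theta_out, spacing=0.5):
    """|sum_n exp(j(incident + reflected + n * phase_grad))| for one sub-surface."""
    n = np.arange(n_elems)
    total = (cascaded_phase(n_elems, theta_in, spacing)
             + cascaded_phase(n_elems, theta_out, spacing) + n * phase_grad)
    return float(np.abs(np.exp(1j * total).sum()))


def steering_gradient(theta_in, theta_target, spacing=0.5):
    """Linear phase increment that cancels the cascaded phase toward theta_target."""
    return -2 * np.pi * spacing * (np.sin(np.deg2rad(theta_in)) + np.sin(np.deg2rad(theta_target)))


if __name__ == "__main__":
    grad = steering_gradient(30.0, 40.0)
    angles = np.linspace(-90.0, 90.0, 361)
    gains = [reflected_gain(grad, 32, 30.0, a) for a in angles]
    print("peak at", angles[int(np.argmax(gains))], "deg, gain", max(gains))
```

With a linear phase profile, each sub-surface is described by just a phase gradient and a common phase offset, which is why partitioning reduces the per-element design problem to a few parameters per sub-surface.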
Submitted 21 January, 2023; v1 submitted 15 March, 2022;
originally announced March 2022.
-
MoRe-Fi: Motion-robust and Fine-grained Respiration Monitoring via Deep-Learning UWB Radar
Authors:
Tianyue Zheng,
Zhe Chen,
Shujie Zhang,
Chao Cai,
Jun Luo
Abstract:
Crucial for healthcare and biomedical applications, respiration monitoring often employs wearable sensors in practice, causing inconvenience due to their direct contact with the human body. Therefore, researchers have been constantly searching for contact-free alternatives. Nonetheless, most existing contact-free designs require human subjects to remain static, largely confining their adoption in everyday environments where body movements are inevitable. Fortunately, radio-frequency (RF) enabled contact-free sensing, though suffering from motion interference that conventional filtering cannot separate, may offer the potential to distill the respiratory waveform with the help of deep learning. To realize this potential, we introduce MoRe-Fi to conduct fine-grained respiration monitoring under body movements. MoRe-Fi leverages an IR-UWB radar to achieve contact-free sensing, and it fully exploits the complex radar signal for data augmentation. The core of MoRe-Fi is a novel variational encoder-decoder network; it aims to single out the respiratory waveforms that are modulated by body movements in a non-linear manner. Our experiments with 12 subjects and 66 hours of data demonstrate that MoRe-Fi accurately recovers the respiratory waveform despite the interference caused by body movements. We also discuss potential applications of MoRe-Fi for pulmonary disease diagnosis.
Submitted 15 November, 2021;
originally announced November 2021.
-
Efficient Hierarchical Bayesian Inference for Spatio-temporal Regression Models in Neuroimaging
Authors:
Ali Hashemi,
Yijing Gao,
Chang Cai,
Sanjay Ghosh,
Klaus-Robert Müller,
Srikantan S. Nagarajan,
Stefan Haufe
Abstract:
Several problems in neuroimaging and beyond require inference on the parameters of multi-task sparse hierarchical regression models. Examples include M/EEG inverse problems, neural encoding models for task-based fMRI analyses, and climate science. In these domains, both the model parameters to be inferred and the measurement noise may exhibit a complex spatio-temporal structure. Existing work either neglects the temporal structure or leads to computationally demanding inference schemes. Overcoming these limitations, we devise a novel flexible hierarchical Bayesian framework within which the spatio-temporal dynamics of model parameters and noise are modeled to have Kronecker product covariance structure. Inference in our framework is based on majorization-minimization optimization and has guaranteed convergence properties. Our highly efficient algorithms exploit the intrinsic Riemannian geometry of temporal autocovariance matrices. For stationary dynamics described by Toeplitz matrices, the theory of circulant embeddings is employed. We prove convex bounding properties and derive update rules of the resulting algorithms. On both synthetic and real neural data from M/EEG, we demonstrate that our methods lead to improved performance.
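The circulant-embedding idea mentioned for stationary (Toeplitz) dynamics can be shown in isolation. Any n × n Toeplitz matrix can be embedded in a 2n × 2n circulant matrix, and circulant matrices are diagonalized by the DFT, so a matrix-vector product costs O(n log n) instead of O(n²). The sketch below is a generic illustration of that trick, restricted to the symmetric case (for example, an autocovariance matrix of a stationary process). It is not the paper's update rules, and the function names are hypothetical.

```python
import numpy as np

def toeplitz_matvec(first_col, x):
    """Multiply a symmetric Toeplitz matrix (given by its first column) by x via FFT.

    The n x n matrix is embedded in a 2n x 2n circulant whose first column is
    [c_0, ..., c_{n-1}, 0, c_{n-1}, ..., c_1]; the top n entries of the circulant
    product with [x; 0] equal the Toeplitz product.
    """
    c = np.asarray(first_col, dtype=float)
    n = c.size
    circ_col = np.concatenate([c, [0.0], c[:0:-1]])
    x_pad = np.concatenate([np.asarray(x, dtype=float), np.zeros(n)])
    y = np.fft.ifft(np.fft.fft(circ_col) * np.fft.fft(x_pad))
    return y[:n].real

def dense_toeplitz(first_col):
    """Explicit symmetric Toeplitz matrix T[i, j] = c[|i - j|], for checking."""
    c = np.asarray(first_col, dtype=float)
    idx = np.abs(np.subtract.outer(np.arange(c.size), np.arange(c.size)))
    return c[idx]

c = 0.8 ** np.arange(6)   # AR(1)-style autocovariance, chosen for illustration
x = np.arange(6.0)
print(np.allclose(toeplitz_matvec(c, x), dense_toeplitz(c) @ x))  # True
```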
Submitted 23 November, 2021; v1 submitted 2 November, 2021;
originally announced November 2021.
-
V2iFi: in-Vehicle Vital Sign Monitoring via Compact RF Sensing
Authors:
Tianyue Zheng,
Zhe Chen,
Chao Cai,
Jun Luo,
Xu Zhang
Abstract:
Given the significant amount of time people spend in vehicles, health issues under driving conditions have become a major concern. Such issues may range from fatigue, asthma, and stroke to even heart attack, yet they can be adequately indicated by vital signs and abnormal activities. Therefore, in-vehicle vital sign monitoring can help us predict and hence prevent these issues. Whereas existing sensor-based (including camera-based) methods could be used to detect these indicators, privacy concerns and system complexity both call for a convenient yet effective and robust alternative. This paper aims to develop V2iFi, an intelligent system performing monitoring tasks using a COTS impulse radio mounted on the windshield. V2iFi is capable of reliably detecting the driver's vital signs under driving conditions and in the presence of passengers, thus allowing for potentially inferring corresponding health issues. Compared with prior work based on Wi-Fi CSI, V2iFi is able to distinguish reflected signals from multiple users, and hence provide finer-grained measurements under more realistic settings. We evaluate V2iFi both in lab environments and during real-life road tests; the results demonstrate that respiratory rate, heart rate, and heart rate variability can all be estimated accurately. Based on these estimation results, we further discuss how machine learning models can be applied on top of V2iFi so as to improve both physiological and psychological wellbeing in driving environments.
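A common generic first step for turning a radar-derived chest-displacement signal into respiratory and heart rates is to pick the strongest spectral peak inside a physiologically plausible band. The sketch below shows only that step, on a synthetic signal. It is not V2iFi's pipeline, which must also separate the driver's reflections from those of passengers. The sampling rate, frequency bands, amplitudes and the function name are assumptions chosen for illustration.

```python
import numpy as np

def dominant_rate_bpm(displacement, fs, band_hz):
    """Strongest spectral component inside band_hz, returned in cycles per minute."""
    x = np.asarray(displacement, dtype=float)
    x = x - x.mean()                                   # remove the static (DC) reflection
    spectrum = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    lo, hi = band_hz
    in_band = (freqs >= lo) & (freqs <= hi)
    return 60.0 * freqs[in_band][np.argmax(spectrum[in_band])]

# Synthetic chest displacement (mm): 15 breaths/min plus a much weaker 72 beats/min component.
fs = 20.0                                              # assumed slow-time sampling rate (Hz)
t = np.arange(int(60 * fs)) / fs                       # 60 s window -> 1 bpm resolution
rng = np.random.default_rng(0)
chest = (5.0 * np.sin(2 * np.pi * 0.25 * t)
         + 0.3 * np.sin(2 * np.pi * 1.2 * t)
         + 0.05 * rng.standard_normal(t.size))
print(dominant_rate_bpm(chest, fs, (0.1, 0.5)))        # about 15 (respiration)
print(dominant_rate_bpm(chest, fs, (0.8, 2.0)))        # about 72 (heartbeat)
```

Heart rate variability needs beat-to-beat intervals rather than a single spectral peak, so it would require a time-domain step beyond this sketch.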
Submitted 27 October, 2021;
originally announced October 2021.
-
RF-Based Human Activity Recognition Using Signal Adapted Convolutional Neural Network
Authors:
Zhe Chen,
Chao Cai,
Tianyue Zheng,
Jun Luo,
Jie Xiong,
Xin Wang
Abstract:
Human Activity Recognition (HAR) plays a critical role in a wide range of real-world applications, and it is traditionally achieved via wearable sensing. Recently, to avoid the burden and discomfort caused by wearable devices, device-free approaches exploiting RF signals have emerged as a promising alternative for HAR. Most of the latest device-free approaches require training a large deep neural network model in either the time or frequency domain, entailing extensive storage to contain the model and intensive computation to infer activities. Consequently, even with some major advances in device-free HAR, current device-free approaches are still far from practical in real-world scenarios where the computation and storage resources possessed by, for example, edge devices are limited. Therefore, we introduce HAR-SAnet, a novel RF-based HAR framework. It adopts an original signal adapted convolutional neural network architecture: instead of feeding handcrafted features of RF signals into a classifier, HAR-SAnet fuses them adaptively from both time and frequency domains to design an end-to-end neural network model. We apply point-wise grouped convolution and depth-wise separable convolutions to confine the model scale and to reduce inference execution time. The experimental results show that HAR-SAnet outperforms state-of-the-art algorithms and systems in recognition accuracy.
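Why depth-wise separable and grouped point-wise convolutions shrink a model can be seen from weight counts alone. The sketch below compares a standard k × k convolution with a depth-wise separable one, optionally using a grouped 1 × 1 point-wise stage. The layer sizes are hypothetical, biases are ignored, and the counts are generic formulas rather than HAR-SAnet's actual architecture. For 1D convolutions, replace k·k with k.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a k x k convolution mapping c_in to c_out channels (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out, pointwise_groups=1):
    """k x k depth-wise filter per input channel, then an (optionally grouped) 1 x 1 point-wise conv."""
    if c_in % pointwise_groups or c_out % pointwise_groups:
        raise ValueError("channel counts must be divisible by the number of groups")
    depthwise = k * k * c_in
    # each group connects c_in/groups inputs to c_out/groups outputs
    pointwise = (c_in // pointwise_groups) * (c_out // pointwise_groups) * pointwise_groups
    return depthwise + pointwise

# Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels.
print(standard_conv_params(3, 64, 128))                            # 73728
print(depthwise_separable_params(3, 64, 128))                      # 8768
print(depthwise_separable_params(3, 64, 128, pointwise_groups=4))  # 2624
```

Multiply-accumulate counts per output position scale the same way, which is where the inference speed-up comes from.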
Submitted 27 October, 2021; v1 submitted 27 October, 2021;
originally announced October 2021.
-
Explaining the Attention Mechanism of End-to-End Speech Recognition Using Decision Trees
Authors:
Yuanchao Wang,
Wenji Du,
Chenghao Cai,
Yanyan Xu
Abstract:
The attention mechanism has greatly improved the performance of end-to-end speech recognition systems. However, the underlying behaviour of attention is not yet clear. In this study, we use decision trees to explain how the attention mechanism affects itself in speech recognition. The results indicate that attention levels are largely impacted by their previous states rather than by the encoder and decoder patterns. Additionally, the default attention mechanism seems to put more weight on closer states, but performs poorly at modelling long-term dependencies of attention states.
Submitted 7 October, 2021;
originally announced October 2021.
-
An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures
Authors:
Dengfeng Ke,
Yuxing Lu,
Xudong Liu,
Yanyan Xu,
Jing Sun,
Cheng-Hao Cai
Abstract:
With the rapid development of neural network architectures and speech processing models, singing voice synthesis with neural networks is becoming the cutting-edge technique of digital music production. In this work, to explore how to improve the quality and efficiency of singing voice synthesis, we use encoder-decoder neural models and a number of vocoders to achieve singing voice synthesis. We conduct experiments to demonstrate that the models can be trained using voice data with pitch information, lyrics and beat information, and that the trained models can produce smooth, clear and natural singing voice that is close to real human voice. As the models work in an end-to-end manner, they allow users who are not domain experts to directly produce singing voice by arranging pitches, lyrics and beats.
Submitted 6 August, 2021;
originally announced August 2021.
-
Model-based Synthetic Data-driven Learning (MOST-DL): Application in Single-shot T2 Mapping with Severe Head Motion Using Overlapping-echo Acquisition
Authors:
Qinqin Yang,
Yanhong Lin,
Jiechao Wang,
Jianfeng Bao,
Xiaoyin Wang,
Lingceng Ma,
Zihan Zhou,
Qizhi Yang,
Shuhui Cai,
Hongjian He,
Congbo Cai,
Jiyang Dong,
Jingliang Cheng,
Zhong Chen,
Jianhui Zhong
Abstract:
The use of synthetic data has provided a potential solution for addressing unavailable or insufficient training samples in deep learning-based magnetic resonance imaging (MRI). However, the domain gap between synthetic and real data is a commonly encountered challenge, especially under complex experimental conditions. In this study, by combining Bloch simulation and general MRI models, we propose a framework, termed MOST-DL, for addressing the lack of training data in supervised learning scenarios. A challenging application is demonstrated to verify the proposed framework and achieve motion-robust T2 mapping using single-shot overlapping-echo acquisition. We decompose the process into two main steps: (1) calibrationless parallel reconstruction for an ultra-fast pulse sequence and (2) intra-shot motion correction for T2 mapping. To bridge the domain gap, realistic textures from a public database and various imperfection simulations were explored. The neural network was first trained with purely synthetic data and then evaluated on in vivo human brain data. Both simulation and in vivo experiments show that the MOST-DL method significantly reduces ghosting and motion artifacts in T2 maps in the presence of unpredictable subject movement and has the potential to be applied to motion-prone patients in the clinic.
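A T2 map assigns each voxel the time constant of its transverse-magnetization decay. For reference, the sketch below shows the textbook mono-exponential signal model S(TE) = S0·exp(−TE/T2) and a voxel-wise log-linear least-squares fit of it. It is not the MOST-DL pipeline, which works from single-shot overlapping-echo data with a trained network. The echo times, T2 value, noise-free signal and the function name are assumptions for illustration.

```python
import numpy as np

def fit_t2(echo_times_ms, signal):
    """Voxel-wise log-linear least-squares fit of S(TE) = S0 * exp(-TE / T2).

    Returns (S0, T2 in ms). Assumes mono-exponential decay and positive magnitudes.
    """
    te = np.asarray(echo_times_ms, dtype=float)
    log_s = np.log(np.asarray(signal, dtype=float))
    slope, intercept = np.polyfit(te, log_s, 1)   # log S = log S0 - TE / T2
    return float(np.exp(intercept)), float(-1.0 / slope)

te = np.arange(10, 90, 10)            # 8 hypothetical echo times: 10, 20, ..., 80 ms
s = 1000.0 * np.exp(-te / 80.0)       # noise-free signal for a tissue with T2 = 80 ms
print(fit_t2(te, s))                  # approximately (1000.0, 80.0)
```

Head motion between echoes corrupts exactly this per-voxel decay curve, which is why motion correction has to precede or be built into T2 estimation.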
Submitted 29 May, 2022; v1 submitted 30 July, 2021;
originally announced July 2021.
-
Electromagnetic Source Imaging via a Data-Synthesis-Based Convolutional Encoder-Decoder Network
Authors:
Gexin Huang,
Jiawen Liang,
Ke Liu,
Chang Cai,
ZhengHui Gu,
Feifei Qi,
Yuan Qing Li,
Zhu Liang Yu,
Wei Wu
Abstract:
Electromagnetic source imaging (ESI) requires solving a highly ill-posed inverse problem. To seek a unique solution, traditional ESI methods impose various forms of priors that may not accurately reflect the actual source properties, which may hinder their broad application. To overcome this limitation, this paper proposes a novel data-synthesized spatio-temporal convolutional encoder-decoder network method for ESI, termed DST-CedNet. DST-CedNet recasts ESI as a machine learning problem, where discriminative learning and latent-space representations are integrated in a convolutional encoder-decoder network (CedNet) to learn a robust mapping from the measured electroencephalography/magnetoencephalography (E/MEG) signals to the brain activity. In particular, by incorporating prior knowledge regarding dynamical brain activities, a novel data synthesis strategy is devised to generate large-scale samples for effectively training CedNet. This stands in contrast to traditional ESI methods, where the prior information is often enforced via constraints aimed primarily at mathematical convenience. Extensive numerical experiments as well as analysis of a real MEG dataset and an epilepsy EEG dataset demonstrate that DST-CedNet outperforms several state-of-the-art ESI methods in robustly estimating source signals under a variety of source configurations.
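To make "highly ill-posed" and "priors" concrete, the sketch below implements one classic traditional ESI baseline, the Tikhonov-regularized minimum-norm estimate x̂ = Lᵀ(LLᵀ + λI)⁻¹y. With far fewer sensors than candidate sources, infinitely many source patterns explain the data, and the L2 prior picks the one with the smallest energy. This is not DST-CedNet. The random lead field, sensor and source counts, regularization heuristic and function name are all assumptions for illustration.

```python
import numpy as np

def minimum_norm_estimate(leadfield, measurements, reg=1e-2):
    """Tikhonov-regularized minimum-norm estimate x = L^T (L L^T + lam*I)^{-1} y.

    measurements may be a single snapshot (n_sensors,) or a time series (n_sensors, n_times).
    """
    L = np.asarray(leadfield, dtype=float)
    gram = L @ L.T
    lam = reg * np.trace(gram) / gram.shape[0]   # scale by mean sensor-space power (a common heuristic)
    return L.T @ np.linalg.solve(gram + lam * np.eye(gram.shape[0]), measurements)

rng = np.random.default_rng(1)
L = rng.standard_normal((32, 500))        # hypothetical lead field: 32 sensors, 500 sources
x_true = np.zeros(500)
x_true[[10, 250, 400]] = [1.0, -0.5, 2.0] # three active sources
y = L @ x_true
x_hat = minimum_norm_estimate(L, y)
print(np.linalg.norm(L @ x_hat - y) / np.linalg.norm(y))  # small relative data misfit
```

The estimate fits the measurements well but generally spreads energy over many sources instead of recovering the three active ones. This shows how a fixed, convenient prior can misrepresent the actual sources.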
Submitted 13 July, 2022; v1 submitted 24 October, 2020;
originally announced October 2020.
-
AI Song Contest: Human-AI Co-Creation in Songwriting
Authors:
Cheng-Zhi Anna Huang,
Hendrik Vincent Koops,
Ed Newton-Rex,
Monica Dinculescu,
Carrie J. Cai
Abstract:
Machine learning is challenging the way we make music. Although research in deep generative models has dramatically improved the capability and fluency of music models, recent work has shown that it can be challenging for humans to partner with this new class of algorithms. In this paper, we present findings on what 13 musician/developer teams, a total of 61 users, needed when co-creating a song with AI, the challenges they faced, and how they leveraged and repurposed existing characteristics of AI to overcome some of these challenges. Many teams adopted modular approaches, such as independently running multiple smaller models that align with the musical building blocks of a song, before re-combining their results. As ML models are not easily steerable, teams also generated massive numbers of samples and curated them post-hoc, used a range of strategies to direct the generation, or algorithmically ranked the samples. Ultimately, teams not only had to manage the "flare and focus" aspects of the creative process, but also juggle them with a parallel process of exploring and curating multiple ML models and outputs. These findings reflect a need to design machine learning-powered music interfaces that are more decomposable, steerable, interpretable, and adaptive, which in turn will enable artists to more effectively explore how AI can extend their personal expression.
Submitted 11 October, 2020;
originally announced October 2020.
-
Two-Timescale Optimization for Intelligent Reflecting Surface Aided D2D Underlay Communication
Authors:
Chang Cai,
Huiyuan Yang,
Xiaojun Yuan,
Ying-Chang Liang
Abstract:
The performance of a device-to-device (D2D) underlay communication system is limited by the co-channel interference between cellular users (CUs) and D2D devices. To address this challenge, an intelligent reflecting surface (IRS) aided D2D underlay system is studied in this paper. A two-timescale optimization scheme is proposed to reduce the required channel training and feedback overhead, where transmit beamforming at the base station (BS) and power control at the D2D transmitter are adapted to instantaneous effective channel state information (CSI), and the IRS phase shifts are adapted to the slowly varying channel mean. Based on the two-timescale optimization scheme, we aim to maximize the D2D ergodic rate subject to a given outage-probability-constrained signal-to-interference-plus-noise ratio (SINR) target for the CU. The two-timescale problem is decoupled into two sub-problems, which are solved iteratively with closed-form expressions. Numerical results verify that the two-timescale optimization performs better than several baselines, and also demonstrate a favorable trade-off between system performance and CSI overhead.
Submitted 2 June, 2020;
originally announced June 2020.
-
The optimal sequence for reset controllers
Authors:
Chengwai Cai,
Ali Ahmadi Dastjerdi,
Niranjan Saikumar,
S. H. HosseinNia
Abstract:
PID controllers cannot satisfy high-performance requirements because they are restricted by the water-bed effect. Thus, the need for a better alternative to linear PID controllers is growing with the rising demands of the high-tech industry. This has led many researchers to explore nonlinear controllers such as reset control. Although reset controllers have been widely used in the literature to overcome the limitations of linear controllers, the performance of the system varies depending on the relative sequence of the controller's linear and nonlinear parts. In this paper, the optimal sequence is found using higher-order sinusoidal input describing functions (HOSIDF). By arranging the controller parts according to this strategy, better performance in terms of precision and control input is achieved. The performance of the proposed sequence is validated on a precision positioning setup. The experimental results demonstrate that the optimal sequence found in theory outperforms other sequences.
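Describing-function analysis of a reset element starts from its steady-state response to a sinusoid. The first harmonic gives the classic describing function, and HOSIDF additionally tracks the higher harmonics. As a minimal illustration, not the paper's sequencing analysis, the sketch simulates a Clegg integrator driven by sin(ωt). A Clegg integrator is the classic reset element: an integrator whose state is reset to zero whenever its input crosses zero. The sketch extracts the n-th output harmonic by Fourier projection over one steady-state period. Analytically, the steady-state output is (sgn(sin ωt) − cos ωt)/ω. Its first harmonic therefore has magnitude √((4/π)² + 1)/ω ≈ 1.619/ω and phase ≈ −38.1°, compared with −90° for a linear integrator, and its third harmonic has magnitude 4/(3πω). The step counts and the function name are assumptions.

```python
import math

def clegg_harmonic(n=1, omega=1.0, steps_per_period=20000, periods=3):
    """Magnitude and phase (deg) of the n-th output harmonic of a Clegg integrator
    driven by sin(omega t), measured over the last simulated period."""
    dt = 2 * math.pi / (omega * steps_per_period)
    x, prev_e = 0.0, 0.0
    a_n = b_n = 0.0
    for k in range(periods * steps_per_period):
        theta = omega * k * dt
        e = math.sin(theta)
        if prev_e * e < 0:          # reset law: the input has crossed zero
            x = 0.0
        x += e * dt                 # forward-Euler step of dx/dt = e
        prev_e = e
        if k >= (periods - 1) * steps_per_period:
            a_n += x * math.cos(n * theta)
            b_n += x * math.sin(n * theta)
    a_n *= 2 / steps_per_period     # Fourier coefficients over one period
    b_n *= 2 / steps_per_period
    # x ~ b_n*sin(n*theta) + a_n*cos(n*theta) = M*sin(n*theta + phi)
    return math.hypot(a_n, b_n), math.degrees(math.atan2(a_n, b_n))

mag1, phase1 = clegg_harmonic(n=1)
print(round(mag1, 3), round(phase1, 1))  # close to the analytic 1.619 and -38.1
```

Because the output of a reset element carries these higher harmonics, placing it before or after the linear parts changes what the rest of the loop sees. This is the effect the sequencing question is about.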
Submitted 28 May, 2020;
originally announced May 2020.
-
Content Adaptive and Error Propagation Aware Deep Video Compression
Authors:
Guo Lu,
Chunlei Cai,
Xiaoyun Zhang,
Li Chen,
Wanli Ouyang,
Dong Xu,
Zhiyong Gao
Abstract:
Recently, learning-based video compression methods have attracted increasing attention. However, previous works suffer from error propagation due to the accumulation of reconstruction error in inter predictive coding. Meanwhile, previous learning-based video codecs are also not adaptive to different video contents. To address these two problems, we propose a content adaptive and error propagation aware video compression system. Specifically, our method employs a joint training strategy that considers the compression performance of multiple consecutive frames instead of a single frame. Based on the learned long-term temporal information, our approach effectively alleviates error propagation in reconstructed frames. More importantly, instead of using the hand-crafted coding modes of traditional compression systems, we design an online encoder updating scheme in our system. The proposed approach updates the encoder parameters according to the rate-distortion criterion but keeps the decoder unchanged in the inference stage. Therefore, the encoder is adaptive to different video contents and achieves better compression performance by reducing the domain gap between the training and testing datasets. Our method is simple yet effective and outperforms state-of-the-art learning-based video codecs on benchmark datasets without increasing the model size or decreasing the decoding speed.
Submitted 25 March, 2020;
originally announced March 2020.
-
Traffic signal control optimization under severe incident conditions using Genetic Algorithm
Authors:
Tuo Mao,
Adriana-Simona Mihaita,
Chen Cai
Abstract:
Traffic control optimization is a challenging task for traffic centres around the world, and the majority of approaches focus only on applying adaptive methods under normal (recurrent) traffic conditions. Optimizing the control plans when severe incidents occur, however, remains a hard problem, especially if a high number of lanes or entire intersections are affected. This paper aims to tackle this problem and presents a novel methodology for optimizing the traffic signal timings in signalized urban intersections under non-recurrent traffic incidents. The approach relies on deploying genetic algorithms (GA), with the phase durations as decision variables and the total travel time in the network as the objective function to minimize. First, we develop the GA on a signalized testbed network under recurrent traffic conditions, with the purpose of fine-tuning the algorithm (crossover, mutation, and fitness calculation) and obtaining the optimal phase durations. Second, we apply the previously found optimal signal timings under severe incidents affecting the traffic flow in the network, but without any further optimization. Lastly, we further apply the GA optimization under incident conditions and show that our approach improves the total travel time by almost 40.76%.
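The GA loop described above can be sketched as follows: phase durations form the chromosome, and a cost function is minimized through tournament selection, one-point crossover, bounded mutation and elitism. In the paper the cost is the total travel time from a traffic simulation. Here it is replaced by a hypothetical quadratic stand-in around made-up "ideal" green times, so the example runs on its own. The bounds, rates and population sizes are likewise assumptions, not the authors' tuned settings.

```python
import random

IDEAL_GREEN_S = [35, 20, 45, 25]  # hypothetical "best" green times for a toy 4-phase intersection

def toy_total_travel_time(durations):
    """Hypothetical stand-in for a traffic simulator: grows quadratically away from IDEAL_GREEN_S."""
    return sum((g - ideal) ** 2 for g, ideal in zip(durations, IDEAL_GREEN_S))

def genetic_algorithm(cost, n_phases=4, bounds=(10, 60), pop_size=40, generations=80,
                      crossover_rate=0.9, mutation_rate=0.2, seed=0):
    """Minimize cost(durations) over integer phase durations (seconds) within bounds."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.randint(lo, hi) for _ in range(n_phases)] for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)
        return a if cost(a) <= cost(b) else b

    for _ in range(generations):
        children = [min(pop, key=cost)[:]]           # elitism: keep the best individual
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:         # one-point crossover
                cut = rng.randint(1, n_phases - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            for i in range(n_phases):                 # bounded integer mutation
                if rng.random() < mutation_rate:
                    child[i] = min(hi, max(lo, child[i] + rng.randint(-5, 5)))
            children.append(child)
        pop = children
    best = min(pop, key=cost)
    return best, cost(best)

best, best_cost = genetic_algorithm(toy_total_travel_time)
print(best, best_cost)
```

With a real simulator, each cost evaluation is expensive, so in practice the fitness values would be cached rather than recomputed inside `min` and `tournament` as this sketch does.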
Submitted 11 June, 2019;
originally announced June 2019.
-
Trip Table Estimation and Prediction for Dynamic Traffic Assignment Applications
Authors:
Sajjad Shafiei,
Adriana-Simona Mihaita,
Chen Cai
Abstract:
The study focuses on estimating and predicting time-varying origin-destination (OD) trip tables for a dynamic traffic assignment (DTA) model. A bi-level optimisation problem is formulated and solved to estimate OD flows from a pre-existing demand matrix and historical traffic flow counts. The estimated demand is then used as an input for a time series OD demand prediction model to support the DTA model for short-term traffic condition forecasting. Results show that the proposed OD demand estimation method is highly capable of reducing the DTA model error through an iterative solution algorithm. Moreover, the applicability of the OD demand prediction approach is investigated in an incident analysis application for a major corridor in Sydney, Australia.
Submitted 11 June, 2019;
originally announced June 2019.
-
Ubiquitous Acoustic Sensing on Commodity IoT Devices: A Survey
Authors:
Chao Cai,
Rong Zheng,
Jun Luo
Abstract:
With the proliferation of Internet-of-Things devices, acoustic sensing has attracted much attention in recent years. It exploits acoustic transceivers such as microphones and speakers beyond their primary functions, namely recording and playing, to enable novel applications and new user experiences. In this paper, we present the first systematic survey of recent advances in active acoustic sensing using commodity hardware with a frequency range below 24 kHz. We propose a general framework that categorizes the main building blocks of acoustic sensing systems. This framework encompasses three layers, i.e., the physical layer, the core technique layer, and the application layer. The physical layer includes basic hardware components, acoustic platforms, as well as the air-borne and structure-borne channel characteristics. The core technique layer encompasses key mechanisms to generate acoustic signals (waveforms) and to extract useful temporal, spatial and spectral information from received signals. The application layer builds upon the functions offered by the core techniques to realize different acoustic sensing applications. We highlight unique challenges due to the limitations of physical devices and acoustic channels and how they are mitigated or overcome by core processing techniques and application-specific solutions. Finally, research opportunities and future directions are discussed to spawn further in-depth investigation of acoustic sensing.
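As a small example of the core technique layer (waveform generation plus temporal information extraction), the sketch below generates a linear chirp in a near-inaudible band below the 24 kHz Nyquist limit of a 48 kHz commodity audio path. It then estimates the delay of a synthetic echo by cross-correlation, the matched-filter idea behind many ranging systems. The sampling rate, band, duration, speed of sound and function names are assumptions chosen for illustration, not a specific system from the survey.

```python
import numpy as np

FS = 48_000  # assumed sampling rate in Hz (Nyquist frequency 24 kHz)

def linear_chirp(f0=18_000.0, f1=22_000.0, duration_s=0.01, fs=FS):
    """Linear up-chirp sweeping f0 -> f1 Hz."""
    t = np.arange(round(duration_s * fs)) / fs
    sweep_rate = (f1 - f0) / duration_s  # Hz per second
    return np.sin(2 * np.pi * (f0 * t + 0.5 * sweep_rate * t ** 2))

def estimate_delay_samples(tx, rx):
    """Lag (in samples) that maximizes the cross-correlation of rx with tx."""
    corr = np.correlate(rx, tx, mode="full")
    return int(np.argmax(corr)) - (len(tx) - 1)

tx = linear_chirp()
rng = np.random.default_rng(0)
rx = np.concatenate([np.zeros(100), 0.3 * tx, np.zeros(200)])  # echo delayed by 100 samples
rx += 0.02 * rng.standard_normal(rx.size)
delay = estimate_delay_samples(tx, rx)
print(delay)                     # 100 for this synthetic echo
print(delay / FS * 343.0 / 2)    # reflector distance in metres, assuming 343 m/s and a round trip
```

One sample at 48 kHz corresponds to about 3.6 mm of round-trip range resolution at 343 m/s. Phase-based methods are typically layered on top when finer tracking is needed.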
Submitted 12 August, 2021; v1 submitted 10 January, 2019;
originally announced January 2019.
-
DVC: An End-to-end Deep Video Compression Framework
Authors:
Guo Lu,
Wanli Ouyang,
Dong Xu,
Xiaoyun Zhang,
Chunlei Cai,
Zhiyong Gao
Abstract:
Conventional video compression approaches use the predictive coding architecture and encode the corresponding motion information and residual information. In this paper, taking advantage of both the classical architecture of conventional video compression methods and the powerful non-linear representation ability of neural networks, we propose the first end-to-end video compression deep model that jointly optimizes all the components for video compression. Specifically, learning-based optical flow estimation is utilized to obtain the motion information and reconstruct the current frames. Then we employ two auto-encoder style neural networks to compress the corresponding motion and residual information. All the modules are jointly learned through a single loss function, in which they collaborate with each other by considering the trade-off between reducing the number of compression bits and improving the quality of the decoded video. Experimental results show that the proposed approach can outperform the widely used video coding standard H.264 in terms of PSNR and even be on par with the latest standard H.265 in terms of MS-SSIM. Code is released at https://github.com/GuoLusjtu/DVC.
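A single loss that trades compression bits against decoded quality is commonly written as a rate-distortion objective λ·D + R. This sketch assumes that form. R is the estimated bits per pixel, taken as the negative log2-likelihood of the quantized latent symbols under an entropy model, and D is the mean squared error of the reconstruction. The code only evaluates the objective for given numbers. The likelihoods, pixel values, λ and the function name are made up for illustration, and this is not the released DVC code, where both terms are computed from network outputs and differentiated during training.

```python
import math

def rate_distortion_loss(likelihoods, original, reconstructed, lam, num_pixels):
    """Assumed objective: rate (bits per pixel) + lam * distortion (MSE).

    likelihoods: model probabilities of the quantized latent symbols (motion + residual).
    """
    bits = -sum(math.log2(p) for p in likelihoods)
    rate_bpp = bits / num_pixels
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return rate_bpp + lam * mse, rate_bpp, mse

loss, bpp, mse = rate_distortion_loss(
    likelihoods=[0.5, 0.25, 0.125, 0.125],   # 1 + 2 + 3 + 3 = 9 bits
    original=[0, 1, 2, 3],
    reconstructed=[0, 1, 2, 5],
    lam=256,
    num_pixels=4,
)
print(loss, bpp, mse)  # 258.25 2.25 1.0
```

A larger λ pushes training toward higher quality at more bits. Training several models with different λ values traces out a rate-distortion curve.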
Submitted 7 April, 2019; v1 submitted 30 November, 2018;
originally announced December 2018.
-
Audio-only Bird Species Automated Identification Method with Limited Training Data Based on Multi-Channel Deep Convolutional Neural Networks
Authors:
Jiang-jian Xie,
Chang-qing Ding,
Wen-bin Li,
Cheng-hao Cai
Abstract:
Based on transfer learning, we design a bird species identification model that uses the VGG-16 model (pretrained on ImageNet) for feature extraction, followed by a classifier consisting of two fully-connected hidden layers and a Softmax layer. We compare the performance of the proposed model with the original VGG16 model. The results show that the former has higher training efficiency, but a lower mean average precision (MAP). To improve the MAP of the proposed model, we investigate result fusion to form a multi-channel identification model; the best MAP reaches 0.9998. The number of model parameters is 13110, which is only 0.0082% of that of the VGG16 model. Also, the required training sample size is reduced.
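For readers unfamiliar with the metric, mean average precision averages, over queries, the precision measured at each rank where a relevant item appears. The sketch below uses one common variant that divides by the number of relevant items retrieved (other variants divide by the total number of relevant items). With a single correct species per recording, average precision reduces to 1/rank of that species. Treating each recording as a query over a ranked species list is an assumption about how the metric is applied here, and the function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """AP for one query; ranked_relevance[i] is truthy if the item at rank i+1 is relevant."""
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(queries):
    """Mean of per-query AP values."""
    return sum(average_precision(q) for q in queries) / len(queries)

# Two hypothetical recordings: correct species ranked 1st and 3rd (one relevant item each).
print(mean_average_precision([[1, 0, 0], [0, 0, 1]]))  # (1 + 1/3) / 2 = 0.666...
```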
Submitted 3 March, 2018;
originally announced March 2018.