-
Data on the Move: Traffic-Oriented Data Trading Platform Powered by AI Agent with Common Sense
Authors:
Yi Yu,
Shengyue Yao,
Tianchen Zhou,
Yexuan Fu,
**gru Yu,
Ding Wang,
Xuhong Wang,
Cen Chen,
Yilun Lin
Abstract:
In the digital era, data has become a pivotal asset, advancing technologies such as autonomous driving. Despite this, data trading faces challenges such as the absence of robust pricing methods and the lack of trustworthy trading mechanisms. To address these challenges, we introduce a traffic-oriented data trading platform named Data on the Move (DTM), integrating traffic simulation, data trading, and Artificial Intelligence (AI) agents. The DTM platform supports evidence-based data value evaluation and AI-based trading mechanisms. Leveraging the common sense capabilities of Large Language Models (LLMs) to assess traffic state and data value, DTM can determine reasonable traffic data prices through multi-round interaction and simulation. Moreover, DTM provides pricing-method validation by simulating traffic systems, multi-agent interactions, and the heterogeneous and irrational behaviors of individuals in the trading market. Within the DTM platform, entities such as connected vehicles and traffic light controllers can engage in information collection, data pricing, trading, and decision-making. Simulation results demonstrate that our proposed AI agent-based pricing approach enhances data trading by offering rational prices, as evidenced by the observed improvement in traffic efficiency. This underscores the effectiveness and practical value of DTM, offering new perspectives for the evolution of data markets and smart cities. To the best of our knowledge, this is the first study employing LLMs in data pricing and a pioneering data trading practice in the field of intelligent vehicles and smart cities.
Submitted 1 July, 2024;
originally announced July 2024.
-
Advancing Airport Tower Command Recognition: Integrating Squeeze-and-Excitation and Broadcasted Residual Learning
Authors:
Yuanxi Lin,
Tonglin Zhou,
Yang Xiao
Abstract:
Accurate recognition of aviation commands is vital for flight safety and efficiency, as pilots must follow air traffic control instructions precisely. This paper addresses challenges in speech command recognition, such as noisy environments and limited computational resources, by advancing keyword spotting technology. We create a dataset of standardized airport tower commands, including routine and emergency instructions. We enhance broadcasted residual learning with squeeze-and-excitation and time-frame frequency-wise squeeze-and-excitation techniques, resulting in our BC-SENet model. This model focuses on crucial information while using fewer parameters. Our tests on five keyword spotting models, including BC-SENet, show that BC-SENet achieves superior accuracy and efficiency. These findings highlight the effectiveness of our model advancements in improving speech command recognition for aviation safety and efficiency in noisy, high-stakes environments. Additionally, BC-SENet shows comparable performance on the common Google Speech Commands dataset.
Submitted 28 June, 2024; v1 submitted 26 June, 2024;
originally announced June 2024.
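To make the squeeze-and-excitation (SE) idea in the BC-SENet abstract concrete, here is a minimal NumPy sketch of a standard SE block applied to a (channels, frequency, time) feature map. It assumes the classic three steps (squeeze by global average pooling, excitation by a bottleneck MLP with a sigmoid, and channel rescaling); the reduction ratio `r`, the random weights, and the feature-map sizes are hypothetical, and the paper's time-frame frequency-wise variant and its integration with broadcasted residual learning are not reproduced.

```python
import numpy as np

def squeeze_excitation(x, w1, b1, w2, b2):
    """Standard squeeze-and-excitation over a (C, F, T) feature map.

    w1: (C // r, C) reduction weights; w2: (C, C // r) expansion weights.
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling over the frequency and time axes -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP with ReLU, then sigmoid gates in (0, 1)
    s = np.maximum(0.0, w1 @ z + b1)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s + b2)))
    # Scale: reweight every channel by its gate
    return x * gates[:, None, None], gates

rng = np.random.default_rng(0)
C, F, T, r = 8, 40, 101, 4  # hypothetical sizes (e.g., 40 mel bins, ~1 s of frames)
x = rng.standard_normal((C, F, T))
w1, b1 = 0.1 * rng.standard_normal((C // r, C)), np.zeros(C // r)
w2, b2 = 0.1 * rng.standard_normal((C, C // r)), np.zeros(C)
y, gates = squeeze_excitation(x, w1, b1, w2, b2)
print(y.shape, gates.round(3))
```

The bottleneck adds only about 2·C²/r weights per block, which is why SE-style attention is attractive when, as the abstract stresses, parameters are scarce.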
-
LFMamba: Light Field Image Super-Resolution with State Space Model
Authors:
Wang xia,
Yao Lu,
Shunzhou Wang,
Ziqi Wang,
Peiqi Xia,
Tianfei Zhou
Abstract:
Recent years have witnessed significant advancements in light field image super-resolution (LFSR) owing to the progress of modern neural networks. However, these methods often face challenges in capturing long-range dependencies (CNN-based) or encounter quadratic computational complexities (Transformer-based), which limit their performance. Recently, the State Space Model (SSM) with selective scanning mechanism (S6), exemplified by Mamba, has emerged as a superior alternative in various vision tasks compared to traditional CNN- and Transformer-based approaches, benefiting from its effective long-range sequence modeling capability and linear-time complexity. Therefore, integrating S6 into LFSR becomes compelling, especially considering the vast data volume of 4D light fields. However, the primary challenge lies in designing an appropriate scanning method for 4D light fields that effectively models light field features. To tackle this, we employ SSMs on the informative 2D slices of 4D LFs to fully explore spatial contextual information, complementary angular information, and structure information. To achieve this, we carefully devise a basic SSM block characterized by an efficient SS2D mechanism that facilitates more effective and efficient feature learning on these 2D slices. Based on the above two designs, we further introduce an SSM-based network for LFSR termed LFMamba. Experimental results on LF benchmarks demonstrate the superior performance of LFMamba. Furthermore, extensive ablation studies are conducted to validate the efficacy and generalization ability of our proposed method. We expect that LFMamba will shed light on effective representation learning of LFs with state space models.
Submitted 18 June, 2024;
originally announced June 2024.
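The linear-time scan that the LFMamba abstract relies on can be illustrated with a toy selective state-space recurrence. This is a minimal sketch, assuming a diagonal state matrix, a softplus input-dependent step size, zero-order-hold discretization of `A`, and a simplified Euler discretization of `B`, run along a single row-major scan of a hypothetical 2D slice. SS2D-style mechanisms typically scan in several directions, and none of the parameter values below come from the paper.

```python
import numpy as np

def selective_scan(x, A, B, C, w_delta):
    """Toy selective SSM scan over a 1-D sequence; one state update per element.

    x: input sequence, shape (L,)
    A: diagonal state-matrix entries (negative for stability), shape (N,)
    B, C: input and output projections, shape (N,)
    w_delta: scalar that makes the step size depend on the current input
    """
    h = np.zeros_like(A)
    y = np.empty_like(x)
    for t, x_t in enumerate(x):
        delta = np.log1p(np.exp(w_delta * x_t))  # softplus: input-dependent step ("selective")
        A_bar = np.exp(delta * A)                # zero-order-hold discretization of A
        B_bar = delta * B                        # simplified (Euler) discretization of B
        h = A_bar * h + B_bar * x_t              # linear recurrence on the hidden state
        y[t] = C @ h
    return y

rng = np.random.default_rng(0)
slice_2d = rng.standard_normal((5, 32))  # hypothetical 2-D slice of a 4-D light field
seq = slice_2d.reshape(-1)               # one row-major scan order (an assumption)
N = 4
A = -np.linspace(0.5, 2.0, N)
B = np.ones(N)
C = np.full(N, 0.25)
y = selective_scan(seq, A, B, C, w_delta=0.5)
print(y.shape)
```

Each element costs one constant-size state update, so the work grows linearly with sequence length, which is the property the abstract contrasts with the quadratic cost of Transformer attention.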
-
Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation
Authors:
Tianyu Huang,
Tao Zhou,
Weidi Xie,
Shuo Wang,
Qi Dou,
Yizhe Zhang
Abstract:
The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entail manual or semi-manual corrections employing state-of-the-art annotation tools. Motivated by this process, we introduce a novel approach that leverages the advantages of online machine learning to enhance Segment Anything (SA) during test time. We employ rectified annotations to perform online learning, with the aim of improving the segmentation quality of SA on medical images. To improve the effectiveness and efficiency of online learning when integrated with large-scale vision models like SAM, we propose a new method called Auxiliary Online Learning (AuxOL). AuxOL creates and applies a small auxiliary model (specialist) in conjunction with SAM (generalist), and employs adaptive online batching and adaptive segmentation fusion. Experiments conducted on eight datasets covering four medical imaging modalities validate the effectiveness of the proposed method. Our work proposes and validates a new, practical, and effective approach for enhancing SA on downstream segmentation tasks (e.g., medical image segmentation).
Submitted 2 June, 2024;
originally announced June 2024.
-
Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment
Authors:
Tianwei Zhou,
Songbai Tan,
Wei Zhou,
Yu Luo,
Yuan-Gen Wang,
Guanghui Yue
Abstract:
With the increasing maturity of text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertisement, entertainment, education, social media, etc. Although remarkable advancements have been achieved in generative models, little effort has been devoted to designing relevant quality assessment models. In this paper, we propose a novel blind image quality assessment (IQA) network, named AMFF-Net, for AGIs. AMFF-Net evaluates AGI quality from three dimensions, i.e., "visual quality", "authenticity", and "consistency". Specifically, inspired by the characteristics of the human visual system and motivated by the observation that "visual quality" and "authenticity" are characterized by both local and global aspects, AMFF-Net scales the image up and down and takes the scaled images and the original-sized image as inputs to obtain multi-scale features. After that, an Adaptive Feature Fusion (AFF) block is used to adaptively fuse the multi-scale features with learnable weights. In addition, considering the correlation between the image and the prompt, AMFF-Net compares the semantic features from the text encoder and the image encoder to evaluate text-to-image alignment. We carry out extensive experiments on three AGI quality assessment databases, and the experimental results show that AMFF-Net outperforms nine state-of-the-art blind IQA methods. The results of ablation experiments further demonstrate the effectiveness of the proposed multi-scale input strategy and AFF block.
Submitted 23 April, 2024;
originally announced April 2024.
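The Adaptive Feature Fusion (AFF) step can be pictured as a learnable weighted sum over scales. The sketch below is an assumption, not the authors' block: it softmax-normalizes one learnable logit per scale and fuses pooled feature vectors from hypothetical downscaled, original-size, and upscaled inputs.

```python
import numpy as np

def adaptive_feature_fusion(features, logits):
    """Fuse same-shaped multi-scale features using softmax-normalized learnable weights."""
    w = np.exp(logits - logits.max())
    w = w / w.sum()                            # positive weights that sum to 1
    stacked = np.stack(features)               # shape (S, D): one row per scale
    fused = np.tensordot(w, stacked, axes=1)   # weighted sum over the scale axis
    return fused, w

rng = np.random.default_rng(0)
D = 16  # hypothetical pooled feature dimension
# Hypothetical features from the downscaled, original-size, and upscaled inputs
feats = [rng.standard_normal(D) for _ in range(3)]
logits = np.array([0.2, 1.0, -0.5])  # learnable in practice; fixed here for illustration
fused, weights = adaptive_feature_fusion(feats, logits)
print(weights.round(3), fused.shape)
```

With equal logits the fusion reduces to a plain average; training the logits lets the network emphasize whichever scale is most informative.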
-
Federated Prompt-based Decision Transformer for Customized VR Services in Mobile Edge Computing System
Authors:
Tailin Zhou,
Jiadong Yu,
Jun Zhang,
Danny H. K. Tsang
Abstract:
This paper investigates resource allocation to provide heterogeneous users with customized virtual reality (VR) services in a mobile edge computing (MEC) system. We first introduce a quality of experience (QoE) metric to measure user experience, which considers the MEC system's latency, user attention levels, and preferred resolutions. Then, a QoE maximization problem is formulated for resource allocation to ensure the highest possible user experience; this problem is cast as a reinforcement learning problem, aiming to learn a generalized policy applicable across diverse user environments for all MEC servers. To learn the generalized policy, we propose a framework, named FedPromptDT, that employs federated learning (FL) and prompt-based sequence modeling to pre-train a common decision model across MEC servers. Using FL solves the problem of insufficient local MEC data while protecting user privacy during offline training. The design of prompts integrating user-environment cues and user-preferred allocation improves the model's adaptability to various user environments during online execution.
Submitted 15 February, 2024;
originally announced February 2024.
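The abstract states that federated learning pre-trains a common model across MEC servers without sharing local data, but it does not name the aggregation rule. The sketch below assumes standard FedAvg (data-size-weighted parameter averaging) and uses a hypothetical quadratic objective in place of the prompt-based decision transformer; the server targets and data sizes are illustrative.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """FedAvg aggregation: average client parameters weighted by local data size."""
    sizes = np.asarray(client_sizes, dtype=float)
    weights = sizes / sizes.sum()
    return sum(w * p for w, p in zip(weights, client_params))

def local_update(params, target, lr=0.1, steps=5):
    """Hypothetical local training: gradient steps on 0.5 * ||params - target||^2."""
    for _ in range(steps):
        params = params - lr * (params - target)
    return params

# Three hypothetical MEC servers with different amounts of local (offline) data
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
sizes = [100, 50, 250]
global_params = np.zeros(2)
for _ in range(20):  # communication rounds: only parameters leave each server
    local = [local_update(global_params, t) for t in targets]
    global_params = fedavg(local, sizes)
print(global_params.round(3))
```

In this toy problem the global model settles at the data-weighted mean of the servers' optima, showing how servers with more data pull the shared model harder without exposing raw samples.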
-
Identification of Secondary Resonances of Nonlinear Systems using Phase-Locked Loop Testing
Authors:
Tong Zhou,
Gaetan Kerschen
Abstract:
One unique feature of nonlinear dynamical systems is the existence of superharmonic and subharmonic resonances in addition to primary resonances. In this study, an effective vibration testing methodology is introduced for the experimental identification of these secondary resonances. The proposed method relies on phase-locked loop control combined with adaptive filters for online Fourier decomposition. To this end, the concept of a resonant phase lag is exploited to define the target phase lag to be followed during the experimental continuation process. The method is demonstrated using two systems featuring cubic nonlinearities, namely a numerical Duffing oscillator and a physical experiment comprising a clamped-clamped thin beam. The obtained results highlight that the control scheme can accurately characterize secondary resonances as well as track their backbone curves. A particularly salient feature of the developed algorithm is that, starting from the rest position, it facilitates an automatic and smooth dynamic state transfer toward one point of a subharmonic isolated branch, hence, inducing branch switching.
Submitted 2 January, 2024;
originally announced January 2024.
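The online Fourier decomposition that the abstract pairs with phase-locked loop control can be sketched with a least-mean-squares (LMS) adaptive filter that tracks the cosine and sine coefficients of each harmonic of the response. This is a minimal illustration under assumed values (excitation frequency, sampling rate, step size `mu`, and a synthetic response signal); the PLL controller that adjusts the excitation frequency toward a target phase lag is not implemented.

```python
import numpy as np

def lms_fourier(y, omega, dt, harmonics=(1, 2, 3), mu=0.02):
    """Track Fourier coefficients of y online with an LMS adaptive filter.

    Model: y(t) ~ sum_k a_k cos(k*omega*t) + b_k sin(k*omega*t).
    Returns the final cosine (a) and sine (b) coefficients per harmonic.
    """
    k = np.asarray(harmonics, dtype=float)
    a = np.zeros(len(k))
    b = np.zeros(len(k))
    for n, y_n in enumerate(y):
        t = n * dt
        c = np.cos(k * omega * t)
        s = np.sin(k * omega * t)
        e = y_n - (a @ c + b @ s)   # error of the current Fourier model
        a += mu * e * c             # LMS step: gradient descent on e**2 / 2
        b += mu * e * s
    return a, b

# Synthetic response (hypothetical): fundamental lagging by 0.5 rad plus a 3rd harmonic
f_exc = 5.0                      # excitation frequency in Hz (assumed)
omega = 2 * np.pi * f_exc
dt = 1.0 / (100 * f_exc)         # 100 samples per excitation period
t = np.arange(6000) * dt         # 60 periods
y = 2.0 * np.cos(omega * t - 0.5) + 0.3 * np.cos(3 * omega * t)

a, b = lms_fourier(y, omega, dt)
amp1 = np.hypot(a[0], b[0])      # amplitude of the fundamental
lag1 = np.arctan2(b[0], a[0])    # phase lag of the fundamental, in rad
print(round(amp1, 3), round(lag1, 3))
```

Because the coefficients are updated sample by sample, the harmonic phase lags are available online; in a PLL test, such estimates would feed the loop that steers the excitation frequency.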
-
SQA-SAM: Segmentation Quality Assessment for Medical Images Utilizing the Segment Anything Model
Authors:
Yizhe Zhang,
Shuo Wang,
Tao Zhou,
Qi Dou,
Danny Z. Chen
Abstract:
Segmentation quality assessment (SQA) plays a critical role in the deployment of a medical-image-based AI system. Users need to be informed/alerted whenever an AI system generates unreliable/incorrect predictions. With the introduction of the Segment Anything Model (SAM), a general foundation segmentation model, new research opportunities have emerged regarding how SAM can be utilized for medical image segmentation. In this paper, we propose a novel SQA method, called SQA-SAM, which exploits SAM to enhance the accuracy of quality assessment for medical image segmentation. When a medical image segmentation model (MedSeg) produces predictions for a test image, we generate visual prompts based on the predictions, and SAM is utilized to generate segmentation maps corresponding to the visual prompts. How well MedSeg's segmentation aligns with SAM's segmentation indicates how well MedSeg's segmentation aligns with the general perception of objectness and image region partition. We develop a score measure for such alignment. In experiments, we find that the generated scores exhibit moderate to strong positive correlation (in terms of both Pearson and Spearman correlation) with Dice coefficient scores reflecting the true segmentation quality.
Submitted 15 December, 2023;
originally announced December 2023.
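Two quantities in the SQA-SAM abstract are easy to compute directly: an overlap score between MedSeg's mask and SAM's mask, and the Pearson and Spearman correlations between predicted quality scores and true Dice. The abstract does not specify its alignment score, so Dice is used here as an assumed stand-in; the masks and score arrays are toy, illustrative values.

```python
import numpy as np

def dice(a, b, eps=1e-8):
    """Dice overlap between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    return (2.0 * np.logical_and(a, b).sum() + eps) / (a.sum() + b.sum() + eps)

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks (assumes no ties)."""
    def rank(v):
        return np.argsort(np.argsort(v))
    return pearson(rank(x), rank(y))

# Toy masks: MedSeg prediction vs. SAM's segmentation for the same prompt
medseg = np.zeros((4, 4), dtype=int)
medseg[0:2, 0:2] = 1
sam = np.zeros((4, 4), dtype=int)
sam[0:2, 1:3] = 1
alignment = dice(medseg, sam)  # 2 shared pixels out of 4 + 4, i.e. about 0.5

# Hypothetical per-image alignment scores vs. true Dice scores (illustrative numbers)
scores = np.array([0.20, 0.50, 0.60, 0.90])
true_dice = np.array([0.30, 0.40, 0.70, 0.80])
print(alignment, pearson(scores, true_dice), spearman(scores, true_dice))
```

Pearson measures linear agreement, while Spearman only checks whether the two scores rank images in the same order, which is often what matters for flagging the worst predictions.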
-
SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma
Authors:
Xiangde Luo,
Jia Fu,
Yunxin Zhong,
Shuolin Liu,
Bing Han,
Mehdi Astaraki,
Simone Bendazzoli,
Iuliana Toma-Dasu,
Yiwen Ye,
Ziyang Chen,
Yong Xia,
Yanzhou Su,
** Ye,
Junjun He,
Zhaohu Xing,
Hongqiu Wang,
Lei Zhu,
Kaixiang Yang,
Xin Fang,
Zhiwei Wang,
Chan Woong Lee,
Sang Joon Park,
Jaehee Chun,
Constantin Ulrich,
Klaus H. Maier-Hein
, et al. (17 additional authors not shown)
Abstract:
Radiation therapy is a primary and effective treatment strategy for NasoPharyngeal Carcinoma (NPC). The precise delineation of Gross Tumor Volumes (GTVs) and Organs-At-Risk (OARs) is crucial in radiation treatment, directly impacting patient prognosis. Previously, the delineation of GTVs and OARs was performed by experienced radiation oncologists. Recently, deep learning has achieved promising results in many medical image segmentation tasks. However, for NPC OAR and GTV segmentation, few public datasets are available for model development and evaluation. To alleviate this problem, the SegRap2023 challenge was organized in conjunction with MICCAI2023 and presented a large-scale benchmark for OAR and GTV segmentation with 400 Computed Tomography (CT) scans from 200 NPC patients, each with a pair of pre-aligned non-contrast and contrast-enhanced CT scans. The challenge's goal was to segment 45 OARs and 2 GTVs from the paired CT scans. In this paper, we detail the challenge and analyze the solutions of all participants. The average Dice similarity coefficient scores for all submissions ranged from 76.68% to 86.70% for OARs and from 70.42% to 73.44% for GTVs. We conclude that the segmentation of large-size OARs is well addressed, and more efforts are needed for GTVs and small-size or thin-structure OARs. The benchmark will remain publicly available here: https://segrap2023.grand-challenge.org
Submitted 15 December, 2023;
originally announced December 2023.
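The challenge summary reports average Dice similarity coefficients over many structures. A minimal sketch of per-structure Dice and its class average on integer label volumes follows; the toy volumes, the label IDs, and the convention that a structure absent from both volumes scores 1.0 are assumptions, not the challenge's official evaluation code.

```python
import numpy as np

def mean_dice(pred, gt, labels):
    """Per-structure Dice and its unweighted mean over the given label IDs."""
    scores = {}
    for lab in labels:
        p, g = pred == lab, gt == lab
        denom = p.sum() + g.sum()
        # Assumed convention: a structure absent from both volumes scores 1.0
        scores[lab] = 1.0 if denom == 0 else float(2.0 * np.logical_and(p, g).sum() / denom)
    return float(np.mean(list(scores.values()))), scores

# Toy 4x4x4 label volumes: 1 = a large structure, 2 = a smaller one (hypothetical IDs)
gt = np.zeros((4, 4, 4), dtype=int)
gt[0:2] = 1
gt[2:4, 0:2] = 2
pred = gt.copy()
pred[3, 0:2] = 0  # the prediction misses half of structure 2

mean_dsc, per_structure = mean_dice(pred, gt, labels=[1, 2])
print(per_structure, round(mean_dsc, 4))
```

Because every structure counts equally in the unweighted mean, a miss on a small structure lowers the average as much as a miss on a large one, consistent with the abstract's point that small and thin structures remain the harder cases.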
-
Generalized Damping Torque Analysis of Ultra-Low Frequency Oscillation in the Jerk Space
Authors:
Yichen Zhou,
Yang Yang,
Tao Zhou,
Yonggang Li
Abstract:
Ultra-low frequency oscillation (ULFO) poses a significant threat to power system stability. Its instability mechanism is mostly studied via the generalized damping torque analysis (GDTA) method. However, such analysis still adopts the framework established for low frequency oscillation. Hence, this letter proposes a GDTA approach in the jerk space for ULFO. A multi-information variable is constructed to transform the system into a new state space, where it is found that the jerk dynamics of the turbine-generator cascaded system are governed by a second-order differential equation. Benefiting from this characteristic, we propose a new form of GDTA using jerk dynamics, which is established in the frequency-frequency acceleration phase space. Then, analytical expressions for all damping torques are provided. Finally, test results verify the proposed theoretical results. The negative damping mechanism is revealed, and parameter adjustment measures are drawn.
Submitted 7 December, 2023;
originally announced December 2023.
-
Time Stretch with Continuous-Wave Lasers
Authors:
Tingyi Zhou,
Yuta Goto,
Takeshi Makino,
Callen MacPhee,
Yiming Zhou,
Asad M. Madni,
Hideaki Furukawa,
Naoya Wada,
Bahram Jalali
Abstract:
A single-shot measurement technique for ultrafast phenomena with high throughput enables the capture of rare events within a short time scale, facilitating the exploration of rare ultrafast processes. Photonic time stretch stands out as a highly effective method for both detecting rapid events and achieving remarkable speed in imaging and ranging applications. The current time stretch method relies on costly passive mode-locked lasers with continuous and fixed spectra to capture fast transients and dilate their time scale using dispersion. This hinders the broad application of time stretch technology and presents synchronization challenges with ultrafast events for measurement. Here we report the first implementation of time stretch using continuous wave (CW) diode lasers with discrete and tunable spectra that are common in WDM optical communication. This approach offers the potential for more cost-effective and compact time stretch systems and simplifies laser synchronization with the input signal. Two different embodiments in the United States and Japan demonstrate the technique's operation and limitations, and potential applications to time stretch imaging and angular light scattering.
Submitted 1 November, 2023; v1 submitted 19 September, 2023;
originally announced September 2023.
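Time stretch relies on group-velocity dispersion to map wavelength to arrival time. To first order, two wavelengths separated by Δλ acquire a relative delay Δt ≈ D·L·Δλ after a fiber of length L with dispersion parameter D. The sketch applies this relation to four CW lasers on a WDM grid; the fiber length, the dispersion value (typical of standard single-mode fiber near 1550 nm), and the 0.8 nm channel spacing are illustrative assumptions, not parameters from either experiment.

```python
import numpy as np

def dispersive_delay_ps(D_ps_per_nm_km, length_km, delta_lambda_nm):
    """First-order relative group delay between wavelengths: dt = D * L * dlambda."""
    return D_ps_per_nm_km * length_km * delta_lambda_nm

D = 17.0           # ps/(nm*km), typical of standard single-mode fiber near 1550 nm (assumed)
L = 10.0           # km of dispersive fiber (hypothetical)
spacing_nm = 0.8   # roughly a 100 GHz DWDM grid near 1550 nm (assumed)
channels = np.arange(4) * spacing_nm  # wavelength offsets of four CW lasers
delays = dispersive_delay_ps(D, L, channels)
print(delays)  # arrival-time offsets in ps
```

With a discrete set of CW lines, the dispersive element maps each channel to its own time slot, so channel spacing and total dispersion together set the time-domain sampling grid.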
-
CPU frequency scheduling of real-time applications on embedded devices with temporal encoding-based deep reinforcement learning
Authors:
Ti Zhou,
Man Lin
Abstract:
Small devices are frequently used in IoT and smart-city applications to perform periodic dedicated tasks with soft deadlines. This work focuses on developing efficient power-management methods for periodic tasks on small devices. We first study the limitations of the existing Linux built-in methods used in small devices. We illustrate three typical workload/system patterns that are challenging to manage with Linux's built-in solutions. We develop a reinforcement-learning-based technique with temporal encoding to derive an effective DVFS governor even in the presence of these three system patterns. The derived governor uses only one performance counter, the same as the built-in Linux mechanism, and does not require an explicit task model for the workload. We implemented a prototype system on the Nvidia Jetson Nano board and evaluated it with six applications, including two self-designed and four benchmark applications. Under different deadline constraints, our approach can quickly derive a DVFS governor that adapts to performance requirements and outperforms the built-in Linux approach in energy saving. On MiBench workloads, with performance slack ranging from 0.04 s to 0.4 s, the proposed method saves 3% to 11% more energy than Ondemand. The tested AudioReg and FaceReg applications show a 5% to 14% energy-saving improvement. We have open-sourced the implementation of our in-kernel quantized neural network engine. The codebase can be found at: https://github.com/coladog/tinyagent.
Submitted 7 September, 2023;
originally announced September 2023.
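For context on the Ondemand baseline the abstract compares against, here is a simplified model of an ondemand-style DVFS decision: jump to the highest frequency when utilization exceeds a threshold, otherwise pick the lowest available frequency at or above a load-proportional target. It is an approximation of the Linux ondemand policy, not kernel code, and the threshold and frequency table are hypothetical; the paper's reinforcement-learning governor is not reproduced.

```python
def ondemand_like(load_pct, freqs_khz, up_threshold=80):
    """Simplified ondemand-style frequency choice (illustrative, not kernel code).

    load_pct: CPU utilization over the last sampling window, 0..100
    freqs_khz: available frequencies in kHz (hypothetical table)
    """
    f_min, f_max = min(freqs_khz), max(freqs_khz)
    if load_pct > up_threshold:
        return f_max  # busy: jump straight to the top frequency
    target = f_min + load_pct * (f_max - f_min) / 100
    # Otherwise choose the lowest available frequency at or above the target
    return min(f for f in freqs_khz if f >= target)

freq_table_khz = [200_000, 400_000, 800_000, 1_200_000, 1_400_000]  # hypothetical
for load in (0, 10, 50, 90):
    print(load, ondemand_like(load, freq_table_khz))
```

Such a rule reacts only to recent utilization; it has no notion of task deadlines or slack.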
-
Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning
Authors:
Qiang He,
Tianyi Zhou,
Meng Fang,
Setareh Maghsudi
Abstract:
We propose a novel value approximation method, Eigensubspace Regularized Critic (ERC), for deep reinforcement learning (RL). ERC is motivated by an analysis of the dynamics of the Q-value approximation error in the Temporal-Difference (TD) method, which follows a path defined by the 1-eigensubspace of the transition kernel associated with the Markov Decision Process (MDP). This analysis reveals a fundamental property of TD learning that has remained unused in previous deep RL approaches. In ERC, we propose a regularizer that guides the approximation error toward the 1-eigensubspace, resulting in a more efficient and stable path of value approximation. Moreover, we theoretically prove the convergence of the ERC method. In addition, theoretical analysis and experiments demonstrate that ERC effectively reduces the variance of value functions. On 26 tasks in the DMControl benchmark, ERC outperforms state-of-the-art methods on 20. It also shows significant advantages in Q-value approximation and variance reduction. Our code is available at https://sites.google.com/view/erc-ecml23/.
Submitted 8 November, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
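The 1-eigensubspace property the ERC abstract builds on can be checked numerically in a simplified setting: a row-stochastic transition matrix maps every constant vector to itself, and under expected tabular TD updates the error component along that direction decays slowest, so the error vector aligns with it over time. This sketch assumes a small random MDP, a fixed step size, and exact expected updates; it illustrates the motivation only and does not implement the ERC regularizer.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma, alpha = 5, 0.9, 0.5   # hypothetical state count, discount, and step size

# Random row-stochastic transition matrix with strictly positive entries
P = rng.random((n, n)) + 0.1
P /= P.sum(axis=1, keepdims=True)

ones = np.ones(n)
# Any constant vector is mapped to itself: span{1} lies in the 1-eigensubspace of P
assert np.allclose(P @ ones, ones)

# Expected tabular TD update: Q <- Q + alpha * (r + gamma * P @ Q - Q).
# Since Q* = r + gamma * P @ Q*, the error e = Q - Q* evolves as e <- M @ e.
M = (1 - alpha) * np.eye(n) + alpha * gamma * P

e = rng.random(n) + 1.0   # positive initial error (hypothetical)
for _ in range(300):
    e = M @ e

cos_to_ones = e @ ones / (np.linalg.norm(e) * np.linalg.norm(ones))
print(round(cos_to_ones, 6))
```

The eigenvalues of `M` are `(1 - alpha) + alpha * gamma * lam` for each eigenvalue `lam` of `P`; the choice `lam = 1` gives the largest one, so every other error component shrinks faster and the residual error lines up with the all-ones direction.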
-
Adaptive Policy Learning to Additional Tasks
Authors:
Wenjian Hao,
Zehui Lu,
Zihao Liang,
Tianyu Zhou,
Shaoshuai Mou
Abstract:
This paper develops a policy learning method for tuning a pre-trained policy to adapt to additional tasks without altering the original task. A method named Adaptive Policy Gradient (APG) is proposed, which combines Bellman's principle of optimality with the policy gradient approach to improve the convergence rate. This paper provides a theoretical analysis that guarantees a convergence rate of $\mathcal{O}(1/T)$ and a sample complexity of $\mathcal{O}(1/\epsilon)$, where $T$ denotes the number of iterations and $\epsilon$ denotes the accuracy of the resulting stationary policy. Furthermore, several challenging numerical simulations, including cartpole, lunar lander, and robot arm, are provided to show that APG obtains performance similar to existing deterministic policy gradient methods while using much less data and converging at a faster rate.
Submitted 24 May, 2023;
originally announced May 2023.
-
Attention-based QoE-aware Digital Twin Empowered Edge Computing for Immersive Virtual Reality
Authors:
Jiadong Yu,
Ahmad Alhilal,
Tailin Zhou,
Pan Hui,
Danny H. K. Tsang
Abstract:
Metaverse applications, such as virtual reality (VR) content streaming, require optimal resource allocation strategies for mobile edge computing (MEC) to ensure a high-quality user experience. In contrast to online reinforcement learning (RL) algorithms, which can incur substantial communication overheads and longer delays, the majority of existing works employ offline-trained RL algorithms for resource allocation decisions in MEC systems. However, they neglect the impact of desynchronization between the physical and digital worlds on the effectiveness of the allocation strategy. In this paper, we tackle this desynchronization using a continual RL framework that enables dynamic resource allocation for MEC-enabled VR content streaming. We first design a digital twin-empowered edge computing (DTEC) system and formulate a quality of experience (QoE) maximization problem based on attention-based resolution perception. This problem optimizes the allocation of computing and bandwidth resources while adapting the attention-based resolution of the VR content. The continual RL framework in DTEC enables adaptive online execution in a time-varying environment. The reward function is defined based on the QoE and horizon-fairness QoE (hfQoE) constraints. Furthermore, we propose freshness prioritized experience replay - continual deep deterministic policy gradient (FPER-CDDPG) to enhance the performance of continual learning in the presence of time-varying DT updates. We evaluate FPER-CDDPG through extensive experiments. FPER-CDDPG outperforms the benchmarks in terms of average latency, QoE, and successful delivery rate, meets the hfQoE requirements, and maintains performance over long-term execution while ensuring system scalability as the number of users increases.
Submitted 23 May, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Prediction of brain tumor recurrence location based on multi-modal fusion and nonlinear correlation learning
Authors:
Tongxue Zhou,
Alexandra Noeuveglise,
Romain Modzelewski,
Fethi Ghazouani,
Sébastien Thureau,
Maxime Fontanilles,
Su Ruan
Abstract:
Brain tumor is one of the leading causes of cancer death. High-grade brain tumors are prone to recurrence even after standard treatment. Therefore, developing a method to predict brain tumor recurrence location plays an important role in treatment planning and can potentially prolong patients' survival time. Little work has addressed this issue so far. In this paper, we present a deep learning-based brain tumor recurrence location prediction network. Since the dataset is usually small, we propose to use transfer learning to improve the prediction. We first train a multi-modal brain tumor segmentation network on the public dataset BraTS 2021. Then, the pre-trained encoder is transferred to our private dataset for extracting rich semantic features. Following that, a multi-scale multi-channel feature fusion model and a nonlinear correlation learning module are developed to learn effective features. The correlation between multi-channel features is modeled by a nonlinear equation. To measure the similarity between the distributions of original features of one modality and the estimated correlated features of another modality, we propose to use the Kullback-Leibler divergence. Based on this divergence, a correlation loss function is designed to maximize the similarity between the two feature distributions. Finally, two decoders are constructed to jointly segment the present brain tumor and predict its future recurrence location. To the best of our knowledge, this is the first work that can segment the present tumor and at the same time predict future tumor recurrence location, making treatment planning more efficient and precise. The experimental results demonstrate the effectiveness of our proposed method in predicting the brain tumor recurrence location from a limited dataset.
Submitted 10 April, 2023;
originally announced April 2023.
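The correlation loss above compares two feature distributions with the Kullback-Leibler divergence. A minimal Python sketch follows, assuming each feature vector is turned into a discrete distribution with a softmax; the function names and the softmax normalization are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax: turns a feature vector into a distribution."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def correlation_loss(original, estimated):
    """Hypothetical correlation loss: KL divergence between the softmax-normalized
    original features of one modality and the features estimated from another.
    Minimizing it maximizes the similarity of the two distributions."""
    return kl_divergence(softmax(original), softmax(estimated))
```

Identical feature vectors give a loss of zero, and the loss grows as the two distributions drift apart.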
-
AI-assisted Automated Workflow for Real-time X-ray Ptychography Data Analysis via Federated Resources
Authors:
Anakha V Babu,
Tekin Bicer,
Saugat Kandel,
Tao Zhou,
Daniel J. Ching,
Steven Henke,
Siniša Veseli,
Ryan Chard,
Antonino Miceli,
Mathew Joseph Cherukara
Abstract:
We present an end-to-end automated workflow that uses large-scale remote compute resources and an embedded GPU platform at the edge to enable AI/ML-accelerated real-time analysis of data collected for x-ray ptychography. Ptychography is a lensless method used to image samples through a simultaneous numerical inversion of a large number of diffraction patterns from adjacent overlapping scan positions. This acquisition method can enable nanoscale imaging with x-rays and electrons, but it often requires very large experimental datasets and commensurately long turnaround times, which can limit experimental capabilities such as real-time experimental steering and low-latency monitoring. In this work, we introduce a software system that can automate ptychography data analysis tasks. We accelerate the data analysis pipeline by using a modified version of PtychoNN, an ML-based approach to the phase retrieval problem that shows a two-orders-of-magnitude speedup compared to traditional iterative methods. Further, our system coordinates and overlaps different data analysis tasks to minimize synchronization overhead between different stages of the workflow. We evaluate our workflow system with real-world experimental workloads from the 26ID beamline at the Advanced Photon Source and the ThetaGPU cluster at Argonne Leadership Computing Resources.
Submitted 9 April, 2023;
originally announced April 2023.
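The workflow overlaps analysis stages rather than running them in lock-step. The sketch below shows that general pattern with threads and bounded queues in plain Python; the stage functions used in the check are hypothetical stand-ins (e.g. for preprocessing and PtychoNN inference), and nothing here reflects the authors' actual system or its APIs.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def stage(func, inbox, outbox):
    """One pipeline stage: consume items, apply func, forward results downstream."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            break
        outbox.put(func(item))


def run_pipeline(frames, funcs, maxsize=8):
    """Chain single-threaded stages with bounded queues, so stage k can work on
    frame i while stage k+1 works on frame i-1. Output order matches input order."""
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]

    def feed():
        # a separate feeder thread avoids deadlock when the queues fill up
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(SENTINEL)

    feeder = threading.Thread(target=feed)
    for t in [feeder, *workers]:
        t.start()
    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in [feeder, *workers]:
        t.join()
    return results
```

Bounded queues provide back-pressure: a fast early stage cannot run arbitrarily far ahead of a slow later one.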
-
Low Latency Computing for Time Stretch Instruments
Authors:
Tingyi Zhou,
Bahram Jalali
Abstract:
Time stretch instruments have been exceptionally successful in discovering single-shot ultrafast phenomena such as optical rogue waves and have led to record-speed microscopy, spectroscopy, lidar, and more. These instruments encode ultrafast events into the spectrum of a femtosecond pulse and then dilate the time scale of the data using group velocity dispersion. Generating as much as a terabit per second of data, they are ideal partners for deep learning networks, which, by their inherent complexity, require large datasets for training. However, the millisecond-scale inference time of neural networks is orders of magnitude slower than the data acquisition rate of time stretch instruments. This underscores the need to explore ways in which some of the lower-level computational tasks can be performed while the data are still in the optical domain. Nonlinear Schrödinger Kernel computing addresses this predicament. It uses optical nonlinearities to map the data onto a new domain in which classification accuracy is enhanced, without increasing the data dimensions. One limitation of this technique is its fixed optical transfer function, which prevents training and limits generalizability. Here we show that the optical kernel can be effectively tuned and trained by digital phase encoding of the femtosecond laser pulse, reducing the error rate in data classification.
Submitted 5 April, 2023;
originally announced April 2023.
-
Active Simultaneously Transmitting and Reflecting (STAR)-RISs: Modelling and Analysis
Authors:
Jiaqi Xu,
Jiakuo Zuo,
Joey Tianyi Zhou,
Yuanwei Liu
Abstract:
A hardware model consisting of reflection-type amplifiers is proposed for active simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs). The amplitude gains of the STAR element are derived for both coupled and independent phase-shift scenarios. Based on the proposed hardware model, an active STAR-RIS-aided two-user downlink communication system is investigated. Closed-form expressions are obtained for the outage probabilities in both the coupled and independent phase-shift scenarios. To obtain further insights, scaling laws and diversity orders are derived for both users. Analytical results confirm that active STAR-RISs achieve the same diversity orders as passive ones, while their scaling laws differ. It is proved that the average received SNRs scale with M and M^2 for active and passive STAR-RISs, respectively. Numerical results show that active STAR-RISs outperform passive STAR-RISs in terms of outage probability, especially when the number of elements is small.
Submitted 8 February, 2023;
originally announced February 2023.
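The M versus M^2 contrast can be illustrated with a deliberately simplified toy model, which is not the paper's hardware model: m phase-aligned elements, a transmitter-surface amplitude `h`, a surface-user amplitude `g`, and, for the active case, an amplitude gain `gain` whose own amplifier noise `amp_noise` is amplified and forwarded. All parameter values are hypothetical and chosen only to show the qualitative trend.

```python
# Toy scaling model for phase-aligned surface elements (hypothetical parameters).


def passive_snr(m, p=1.0, h=1e-2, g=1e-2, noise=1e-10):
    """Passive surface: m phase-aligned paths add coherently, so the received
    amplitude is m*h*g and the SNR grows as m**2."""
    return p * (m * h * g) ** 2 / noise


def active_snr(m, p=1.0, h=1e-2, g=1e-2, gain=10.0, noise=1e-10, amp_noise=1e-10):
    """Active surface: each element amplifies the signal by `gain` (amplitude),
    but its own amplifier noise is amplified too. The m independent noise terms
    add in power and reach the user only over the surface-user hop g, so that
    term grows as m, and for large m the SNR grows only linearly in m."""
    signal = p * (gain * m * h * g) ** 2
    amplified_noise = gain ** 2 * m * g ** 2 * amp_noise
    return signal / (amplified_noise + noise)


if __name__ == "__main__":
    for m in (4, 64, 1024):
        print(m, passive_snr(m), active_snr(m))
```

With these made-up numbers the active surface wins at small m, because amplification dominates, while for very large m its SNR grows only linearly; this mirrors the qualitative trend in the abstract but says nothing about the paper's closed-form results.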
-
AMDET: Attention based Multiple Dimensions EEG Transformer for Emotion Recognition
Authors:
Yongling Xu,
Yang Du,
**g Zou,
Tianying Zhou,
Lushan Xiao,
Li Liu,
Pengcheng
Abstract:
Affective computing is an important branch of artificial intelligence, and with the rapid development of brain-computer interface technology, emotion recognition based on EEG signals has received broad attention. Despite the large number of deep learning methods, effectively exploiting the multi-dimensional information in EEG data remains a great challenge. In this paper, we propose a deep model called the Attention-based Multiple Dimensions EEG Transformer (AMDET), which exploits the complementarity among the spectral-spatial-temporal features of EEG data by employing a multi-dimensional global attention mechanism. We transform the original EEG data into 3D temporal-spectral-spatial representations; AMDET then uses a spectral-spatial transformer encoder layer to extract effective features from the EEG signal and concentrates on the critical time frames with a temporal attention layer. We conduct extensive experiments on the DEAP, SEED, and SEED-IV datasets to evaluate the performance of AMDET, and it outperforms the state-of-the-art baselines on all three datasets. Accuracy rates of 97.48%, 96.85%, 97.17%, and 87.32% were achieved on DEAP-Arousal, DEAP-Valence, SEED, and SEED-IV, respectively. We also conduct extensive experiments to explore the brain regions that may influence emotions and the coupling of EEG signals. AMDET performs well even with a few channels, which are identified by visualizing what the learned model focuses on. Accuracy exceeds 90% even with only eight channels, which is of great benefit for practical applications.
Submitted 22 December, 2022;
originally announced December 2022.
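AMDET's temporal attention layer is described as concentrating on critical time frames. The sketch below shows generic softmax attention pooling over per-frame feature vectors in plain Python; the scoring vector `w` is a hypothetical stand-in for learned weights, and this is not the AMDET architecture itself.

```python
import math


def temporal_attention(frames, w):
    """Softmax attention pooling over time.

    frames: list of T feature vectors (one per time frame)
    w: hypothetical scoring vector of the same dimension (learned in practice)
    Returns (pooled feature vector, attention weight per frame)."""
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in frames]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(a * x[d] for a, x in zip(weights, frames)) for d in range(dim)]
    return pooled, weights
```

Frames whose features score highly dominate the pooled vector, which is one simple way a model can "focus" on critical time frames.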
-
Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis
Authors:
Yonghao Li,
Tao Zhou,
Kelei He,
Yi Zhou,
Dinggang Shen
Abstract:
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones. Existing (supervised learning) methods often require a large amount of paired multi-modal data to train an effective synthesis model. However, it is often challenging to obtain sufficient paired data for supervised training. In practice, we often have a small amount of paired data and a large amount of unpaired data. To take advantage of both paired and unpaired data, in this paper, we propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis. Specifically, an Edge-preserving Masked AutoEncoder (Edge-MAE) is first pre-trained in a self-supervised manner to simultaneously perform 1) image imputation for randomly masked patches in each image and 2) whole edge map estimation, which effectively learns both contextual and structural information. In addition, a novel patch-wise loss is proposed to enhance the performance of Edge-MAE by treating different masked patches differently according to the difficulty of their respective imputations. Based on this pre-training, in the subsequent fine-tuning stage, a Dual-scale Selective Fusion (DSF) module is designed (in our MT-Net) to synthesize missing-modality images by integrating multi-scale features extracted from the encoder of the pre-trained Edge-MAE. Further, this pre-trained encoder is also employed to extract high-level features from the synthesized image and the corresponding ground-truth image, which are required to be similar (consistent) during training. Experimental results show that our MT-Net achieves performance comparable to the competing methods even when using only $70\%$ of all available paired data. Our code will be publicly available at https://github.com/lyhkevin/MT-Net.
Submitted 18 June, 2023; v1 submitted 2 December, 2022;
originally announced December 2022.
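Edge-MAE is pre-trained on images with randomly masked patches, and its patch-wise loss weights masked patches by imputation difficulty. The exact weighting is not given in the abstract, so the sketch below assumes a simple hypothetical rule (each patch weighted by its share of the total error); patches are plain lists of pixel values, and none of this is the authors' code.

```python
import random


def mask_patches(num_patches, mask_ratio, rng):
    """Randomly choose which patch indices are masked (MAE-style)."""
    k = int(round(num_patches * mask_ratio))
    return sorted(rng.sample(range(num_patches), k))


def patch_errors(pred, target, masked):
    """Mean squared error of each masked patch (patches are lists of pixels)."""
    return {
        i: sum((p - t) ** 2 for p, t in zip(pred[i], target[i])) / len(target[i])
        for i in masked
    }


def difficulty_weighted_loss(errors):
    """Hypothetical patch-wise loss: weight each masked patch by its share of the
    total error, so harder-to-impute patches contribute more than in a plain mean."""
    total = sum(errors.values())
    if total == 0:
        return 0.0
    return sum((e / total) * e for e in errors.values())
```

For two patches with errors 1.0 and 3.0, a plain mean gives 2.0, while this weighting gives 2.5, shifting emphasis toward the harder patch.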
-
A Structure-guided Effective and Temporal-lag Connectivity Network for Revealing Brain Disorder Mechanisms
Authors:
Zhengwang Xia,
Tao Zhou,
Saqib Mamoon,
Amani Alfakih,
Jianfeng Lu
Abstract:
Brain networks provide important insights for the diagnosis of many brain disorders, and effectively modeling brain structure has become one of the core issues in brain imaging analysis. Recently, various computational methods have been proposed to estimate the causal relationship (i.e., effective connectivity) between brain regions. Compared with traditional correlation-based methods, effective connectivity provides the direction of information flow, which may offer additional information for the diagnosis of brain diseases. However, existing methods either ignore the temporal lag in information transmission across brain regions or simply set the temporal-lag value between all brain regions to a fixed value. To overcome these issues, we design an effective temporal-lag neural network (termed ETLN) to simultaneously infer the causal relationships and the temporal-lag values between brain regions, which can be trained end-to-end. In addition, we introduce three mechanisms to better guide the modeling of brain networks. Evaluation results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database demonstrate the effectiveness of the proposed method.
Submitted 1 December, 2022;
originally announced December 2022.
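ETLN learns temporal lags with a neural network; a classical and much simpler way to estimate a single lag between two regional signals is to pick the shift that maximizes their correlation. The sketch below shows only that baseline technique, assuming evenly sampled signals and a non-negative lag (the source region leads the target region).

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def estimate_lag(source, target, max_lag):
    """Return the lag L in [0, max_lag] maximizing corr(source[t - L], target[t])."""
    scores = {}
    for lag in range(max_lag + 1):
        a = source[: len(source) - lag]  # source[t - lag] for t = lag .. n-1
        b = target[lag:]                 # target[t]       for t = lag .. n-1
        scores[lag] = pearson(a, b)
    return max(scores, key=scores.get)
```

Unlike a learned model, this estimates one lag per region pair and says nothing about causal direction beyond which signal leads.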
-
Channel Modeling for UAV-to-Ground Communications with Posture Variation and Fuselage Scattering Effect
Authors:
Boyu Hua,
Haoran Ni,
Qiuming Zhu,
Cheng-Xiang Wang,
Tongtong Zhou,
Kai Mao,
Junwei Bao,
Xiaofei Zhang
Abstract:
Unmanned aerial vehicle (UAV)-to-ground (U2G) channel models play a pivotal role in reliable communication between UAVs and ground terminals. This paper proposes a three-dimensional (3D) non-stationary hybrid model including both large-scale and small-scale fading for U2G multiple-input multiple-output (MIMO) channels. Distinctive channel characteristics of U2G scenarios, i.e., the 3D trajectory and posture of the UAV, the fuselage scattering effect (FSE), and posture variation fading (PVF), are incorporated into the proposed model. The channel parameters, i.e., path loss (PL), shadow fading (SF), path delay, and path angle, are generated using machine learning (ML) and ray tracing (RT) techniques to capture structure-related characteristics. To guarantee the physical continuity of channel parameters such as Doppler phase and path power, time evolution methods for inter- and intra-stationary intervals are proposed. Key statistical properties, i.e., the temporal autocorrelation function (ACF), power delay profile (PDP), level crossing rate (LCR), average fading duration (AFD), and stationary interval (SI), are given, and the impact of fuselage and posture variation is analyzed. It is demonstrated that both posture variation and fuselage scattering have crucial effects on channel characteristics. The validity and practicability of the proposed model are verified by comparing the simulation results with measured ones.
Submitted 13 October, 2022; v1 submitted 5 October, 2022;
originally announced October 2022.
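Two of the statistics listed above, the level crossing rate and the average fading duration, have simple empirical estimators for a sampled envelope. The sketch assumes uniformly sampled envelope values and counts upward crossings; it does not reproduce the paper's model-based expressions.

```python
def level_crossing_rate(envelope, threshold, sample_rate):
    """Upward crossings of `threshold` per second of record."""
    ups = sum(1 for a, b in zip(envelope, envelope[1:]) if a < threshold <= b)
    duration = len(envelope) / sample_rate
    return ups / duration


def average_fade_duration(envelope, threshold, sample_rate):
    """Mean time spent below `threshold` per crossing: P(r < R) / LCR."""
    lcr = level_crossing_rate(envelope, threshold, sample_rate)
    frac_below = sum(1 for r in envelope if r < threshold) / len(envelope)
    if lcr == 0:
        # no upward crossing observed, so no complete fade can be counted
        return float("inf") if frac_below > 0 else 0.0
    return frac_below / lcr
```

For example, an 8-sample, 1-second record `[1, 0, 0, 1, 1, 0, 1, 1]` with threshold 0.5 has two fades lasting 0.25 s and 0.125 s, so the estimators give an LCR of 2 crossings/s and an AFD of 0.1875 s.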
-
Deep learning at the edge enables real-time streaming ptychographic imaging
Authors:
Anakha V Babu,
Tao Zhou,
Saugat Kandel,
Tekin Bicer,
Zhengchun Liu,
William Judge,
Daniel J. Ching,
Yi Jiang,
Sinisa Veseli,
Steven Henke,
Ryan Chard,
Yudong Yao,
Ekaterina Sirazitdinova,
Geetika Gupta,
Martin V. Holt,
Ian T. Foster,
Antonino Miceli,
Mathew J. Cherukara
Abstract:
Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices and from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are poised to revolutionize nanoscale materials characterization. However, the associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion of X-ray ptychography data streamed directly from a detector at up to 2 kHz. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low-dose imaging using orders of magnitude less data than required by traditional methods.
Submitted 19 September, 2022;
originally announced September 2022.
-
A Realistic 3D Non-Stationary Channel Model for UAV-to-Vehicle Communications Incorporating Fuselage Posture
Authors:
Boyu Hua,
Tongtong Zhou,
Qiuming Zhu,
Kai Mao,
Junwei Bao,
Weizhi Zhong,
Naeem Ahmed
Abstract:
Considering the three-dimensional (3D) posture of the unmanned aerial vehicle (UAV), a novel 3D non-stationary geometry-based stochastic model (GBSM) is proposed for multiple-input multiple-output (MIMO) UAV-to-vehicle (U2V) channels. It consists of line-of-sight (LoS) and non-line-of-sight (NLoS) components. The effect of fuselage posture is considered by introducing a time-variant 3D posture matrix. Some important statistical properties, i.e., the temporal autocorrelation function (ACF) and the spatial cross-correlation function (CCF), are derived and investigated. Simulation results show that fuselage posture has a significant impact on U2V channel characteristics and aggravates non-stationarity. The agreement between analytical, simulated, and measured results verifies the correctness of the proposed model and derivations. Moreover, it is demonstrated that the proposed model is also compatible with existing GBSMs that do not consider fuselage posture.
Submitted 19 September, 2022;
originally announced September 2022.
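The time-variant 3D posture matrix can be pictured as a rotation built from yaw, pitch, and roll angles. The sketch below assumes the common Z-Y-X (yaw-pitch-roll) convention and a hypothetical linear yaw drift in `posture_at`; the paper's own convention and motion model may differ.

```python
import math


def posture_matrix(yaw, pitch, roll):
    """3x3 rotation R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians
    (assumed Z-Y-X convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def posture_at(t, yaw0=0.0, yaw_rate=0.0, pitch=0.0, roll=0.0):
    """Hypothetical time-variant posture: constant pitch and roll, yaw drifting linearly."""
    return posture_matrix(yaw0 + yaw_rate * t, pitch, roll)


def rotate(R, v):
    """Apply the posture matrix to a body-frame vector, e.g. an antenna element offset."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]
```

Evaluating `posture_at` at each time instant and rotating the antenna element offsets gives time-varying element positions, which is one way fuselage motion can feed into a geometry-based channel model.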
-
Simultaneously Transmitting and Reflecting (STAR)-RISs: Are they Applicable to Dual-Sided Incidence?
Authors:
Jiaqi Xu,
Xidong Mu,
Joey Tianyi Zhou,
Yuanwei Liu
Abstract:
A hardware model and a signal model are proposed for dual-sided simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs), where signals are incident on both sides of the surface simultaneously. Based on the proposed hardware model, signal models for dual-sided STAR-RISs are developed. For elements with scalar surface impedance, it is proved that their transmission and reflection coefficients on both sides are identical. Based on the resulting symmetrical dual-sided STAR model, a STAR-RIS-aided two-user uplink communication system is investigated for both non-orthogonal multiple access (NOMA) and orthogonal multiple access (OMA) schemes. Analytical expressions for the users' outage probabilities are derived in the high transmit signal-to-noise ratio (SNR) regime. Numerical results demonstrate the performance gain of NOMA over OMA and reveal that the outage probability error floor can be lowered by adjusting the ratio between the amplitudes of the transmitted and reflected signals.
Submitted 12 September, 2022;
originally announced September 2022.
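As a point of reference for the outage analysis, the outage probability of a single Rayleigh-fading link has the closed form $1 - e^{-(2^R - 1)/\bar{\rho}}$, where $R$ is the target rate and $\bar{\rho}$ the average SNR. The sketch below checks it by Monte Carlo; it is a single-user baseline under these assumptions, not the dual-sided STAR-RIS NOMA/OMA expressions derived in the paper.

```python
import math
import random


def outage_closed_form(snr, rate):
    """P(log2(1 + snr * |h|^2) < rate) for |h|^2 ~ Exp(1), i.e. Rayleigh fading."""
    threshold = 2 ** rate - 1
    return 1 - math.exp(-threshold / snr)


def outage_monte_carlo(snr, rate, trials=100_000, seed=0):
    """Empirical outage probability with h ~ CN(0, 1)."""
    rng = random.Random(seed)
    threshold = 2 ** rate - 1
    sigma = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2
    outages = 0
    for _ in range(trials):
        h2 = rng.gauss(0, sigma) ** 2 + rng.gauss(0, sigma) ** 2
        if snr * h2 < threshold:
            outages += 1
    return outages / trials
```

At high SNR the closed form behaves like $(2^R - 1)/\bar{\rho}$, so the outage probability falls with slope one on a log-log plot, i.e. diversity order one for this single link.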
-
Integrating Satellites and Mobile Edge Computing for 6G Wide-Area Edge Intelligence: Minimal Structures and Systematic Thinking
Authors:
Yueshan Lin,
Wei Feng,
Ting Zhou,
Yanmin Wang,
Yunfei Chen,
Ning Ge,
Cheng-Xiang Wang
Abstract:
The sixth-generation (6G) network will shift its focus to supporting everything, including various machine-type devices (MTDs), in an everyone-centric manner. To ubiquitously cover MTDs working in rural and disaster-stricken areas, satellite communications become indispensable, while mobile edge computing (MEC) also plays an increasingly crucial role. Their sophisticated integration enables wide-area edge intelligence, which promises to facilitate globally distributed customized services. In this article, we present typical use cases of integrated satellite-MEC networks and discuss the main challenges therein. Inspired by protein structure and systematic engineering methodology, we propose three minimal integrating structures, such that a complex integrated satellite-MEC network can be treated as their extension and combination. We discuss the unique characteristics and key problems of each minimal structure. Accordingly, we establish an on-demand network orchestration framework to enrich the hierarchy of network management, which further leads to a process-oriented network optimization method. On that basis, a case study showcases the benefits of on-demand network orchestration and process-oriented network optimization. Finally, we outline potential research issues toward a more intelligent, more secure, and greener integrated network.
Submitted 16 August, 2022;
originally announced August 2022.
-
Frequency Domain Identifiability and Sloppiness of Descriptor Systems with an LFT Structure
Authors:
Tong Zhou
Abstract:
Identifiability and sloppiness are investigated in this paper for the parameters of a descriptor system based on its frequency response samples. Two metrics are suggested for measuring, respectively, the absolute and relative sloppiness of the parameter vector at a prescribed value. In this descriptor system, the system matrices are assumed to depend on the parameters through a linear fractional transformation (LFT). When an associated transfer function matrix (TFM) has full normal row rank, a necessary and sufficient matrix-rank-based condition is derived for parameter identifiability from finitely many frequency responses. This condition can be verified recursively, which is computationally appealing, especially for large-scale systems. From this condition, an algorithm is suggested to find a set of frequencies at which the frequency responses of the system uniquely determine its parameters. An ellipsoid approximation is given for the set of all parameter values for which the associated descriptor system has a frequency response within a prescribed distance of that corresponding to a globally identifiable parameter vector value. Explicit formulas are also derived for the suggested absolute and relative sloppiness metrics.
Submitted 21 June, 2022;
originally announced June 2022.
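Identifiability from finitely many frequency responses can be probed numerically with a generic local test: the parameters are locally identifiable at a given value if the Jacobian of the stacked real and imaginary frequency-response samples has full column rank. The sketch below applies that test to two hypothetical first-order models; it is not the paper's recursive rank condition for LFT-structured descriptor systems.

```python
def first_order(theta, s):
    """G(s) = b / (s + a): the two parameters affect the response differently."""
    a, b = theta
    return b / (s + a)


def product_gain(theta, s):
    """G(s) = a * b / (s + 1): only the product a * b is visible in the response."""
    a, b = theta
    return a * b / (s + 1)


def jacobian(model, theta, freqs, h=1e-5):
    """Real Jacobian of stacked [Re G(jw), Im G(jw)] w.r.t. theta via central
    differences. Rows are measurements, columns are parameters."""
    columns = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        col = []
        for w in freqs:
            d = (model(up, 1j * w) - model(down, 1j * w)) / (2 * h)
            col.extend([d.real, d.imag])
        columns.append(col)
    return [list(row) for row in zip(*columns)]


def numerical_rank(matrix, tol=1e-7):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(n_rows):
            if r != rank:
                f = m[r][c] / m[rank][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank


def locally_identifiable(model, theta, freqs):
    """Full column rank means theta is locally identifiable from these samples."""
    return numerical_rank(jacobian(model, theta, freqs)) == len(theta)
```

A single frequency already yields two real equations, enough for `first_order`, whereas no choice of frequencies can separate `a` from `b` in `product_gain`. Small but nonzero singular values of the same Jacobian are what "sloppy" parameter directions look like in practice.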
-
Multi-Scale Adaptive Network for Single Image Denoising
Authors:
Yuanbiao Gou,
Peng Hu,
Jiancheng Lv,
Joey Tianyi Zhou,
Xi Peng
Abstract:
Multi-scale architectures have shown effectiveness in a variety of tasks thanks to appealing cross-scale complementarity. However, existing architectures treat features at different scales equally without considering scale-specific characteristics, i.e., the within-scale characteristics are ignored in the architecture design. In this paper, we reveal this missing piece of multi-scale architecture design and accordingly propose a novel Multi-Scale Adaptive Network (MSANet) for single image denoising. Specifically, MSANet simultaneously embraces within-scale characteristics and cross-scale complementarity thanks to three novel neural blocks, i.e., the adaptive feature block (AFeB), the adaptive multi-scale block (AMB), and the adaptive fusion block (AFuB). In brief, AFeB is designed to adaptively preserve image details and filter noise, which is highly desirable for features with mixed details and noise. AMB enlarges the receptive field and aggregates multi-scale information, which meets the need for contextually informative features. AFuB adaptively samples and transfers features from one scale to another, fusing multi-scale features with varying characteristics from coarse to fine. Extensive experiments on three real and six synthetic noisy image datasets show the superiority of MSANet compared with 12 methods. The code can be accessed at https://github.com/XLearning-SCU/2022-NeurIPS-MSANet.
Submitted 29 October, 2022; v1 submitted 8 March, 2022;
originally announced March 2022.
-
Mobile Device Association and Resource Allocation in Small-Cell IoT Networks with Mobile Edge Computing and Caching
Authors:
Tianqing Zhou,
Yali Yue,
Dong Qin,
Xuefang Nie,
Xuan Li,
Chunguo Li
Abstract:
To meet the needs of computation-sensitive (CS) and high-rate (HR) communications, the framework of mobile edge computing and caching has been widely regarded as a promising solution. When such a framework is implemented in small-cell IoT (Internet of Things) networks, how to assign mobile edge computing and caching servers to mobile devices (MDs) with CS and HR communications is a key open problem. Since these servers are integrated into small base stations (BSs), their assignment involves not only BS selection (i.e., MD association) but also the selection of computing and caching modes. To mitigate network interference and thus enhance system performance, highly effective resource partitioning mechanisms are first introduced for access and backhaul links. A problem of minimizing the sum of MDs' weighted delays is then formulated to achieve joint MD association and resource allocation under limited resources. Since the MD association and resource allocation parameters are coupled in this problem, we develop an alternating optimization algorithm based on coalitional game and convex optimization theory. To ensure that the designed algorithm starts from a feasible initial solution, we develop an initiation algorithm based on conventional best channel association, which is used both for comparison and as the input of the coalitional game in the simulation. Simulation results show that the algorithm designed to minimize the sum of MDs' weighted delays generally achieves better performance than the initiation (best channel association) algorithm.
Submitted 26 February, 2022;
originally announced February 2022.
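The initiation algorithm is based on conventional best channel association, in which each mobile device attaches to the base station with the strongest channel. A minimal sketch, assuming a hypothetical gain matrix `gains[i][j]` from device i to base station j; it ignores load, interference, and computing/caching modes, which the subsequent optimization is meant to handle.

```python
def best_channel_association(gains):
    """gains[i][j]: channel gain from mobile device i to small base station j.
    Each device picks the base station with the largest gain (ties -> lowest index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in gains]


def load_per_bs(association, num_bs):
    """Number of devices attached to each base station under an association."""
    load = [0] * num_bs
    for bs in association:
        load[bs] += 1
    return load
```

Because every device decides independently, popular base stations can become overloaded, which is one reason such an association serves only as a feasible starting point.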
-
Transformers in Time Series: A Survey
Authors:
Qingsong Wen,
Tian Zhou,
Chaoli Zhang,
Weiqi Chen,
Ziqing Ma,
Junchi Yan,
Liang Sun
Abstract:
Transformers have achieved superior performance in many tasks in natural language processing and computer vision, which has also triggered great interest in the time series community. Among the multiple advantages of Transformers, the ability to capture long-range dependencies and interactions is especially attractive for time series modeling, leading to exciting progress in various time series applications. In this paper, we systematically review Transformer schemes for time series modeling, highlighting their strengths as well as their limitations. In particular, we examine the development of time series Transformers from two perspectives. From the perspective of network structure, we summarize the adaptations and modifications that have been made to Transformers to accommodate the challenges of time series analysis. From the perspective of applications, we categorize time series Transformers by common tasks, including forecasting, anomaly detection, and classification. Empirically, we perform robustness analysis, model size analysis, and seasonal-trend decomposition analysis to study how Transformers perform on time series. Finally, we discuss and suggest future directions to provide useful research guidance. To the best of our knowledge, this paper is the first work to comprehensively and systematically summarize the recent advances of Transformers for modeling time series data. We hope this survey will ignite further research interest in time series Transformers.
Submitted 11 May, 2023; v1 submitted 14 February, 2022;
originally announced February 2022.
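The survey includes a seasonal-trend decomposition analysis. For readers unfamiliar with the idea, the sketch below implements a simplified classical additive decomposition (centered moving-average trend plus per-phase seasonal means); it is an illustrative assumption, not the specific decomposition used in the survey or inside any Transformer model.

```python
def moving_average_trend(series, window):
    """Centered moving average; the window shrinks near the edges. `window` is odd."""
    half = window // 2
    trend = []
    for t in range(len(series)):
        lo, hi = max(0, t - half), min(len(series), t + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend


def seasonal_trend_decompose(series, period):
    """Additive decomposition y = trend + seasonal + residual (simplified classical)."""
    window = period if period % 2 == 1 else period + 1
    trend = moving_average_trend(series, window)
    detrended = [y - tr for y, tr in zip(series, trend)]
    # average the detrended values at each phase of the period
    seasonal_means = []
    for phase in range(period):
        vals = detrended[phase::period]
        seasonal_means.append(sum(vals) / len(vals))
    offset = sum(seasonal_means) / period  # center the seasonal component at zero
    seasonal = [seasonal_means[t % period] - offset for t in range(len(series))]
    residual = [y - tr - s for y, tr, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual
```

Away from the edges, a window of exactly one period averages the seasonal pattern out, so the trend of a linear-plus-periodic series is recovered exactly; the shrinking edge windows are the main source of error.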
-
Federated Dynamic Neural Network for Deep MIMO Detection
Authors:
Yuwen Yang,
Feifei Gao,
Jiang Xue,
Ting Zhou,
Zongben Xu
Abstract:
In this paper, we develop a dynamic detection network (DDNet)-based detector for multiple-input multiple-output (MIMO) systems. By constructing an improved DetNet (IDetNet) detector and the OAMPNet detector as two independent network branches, the DDNet detector performs sample-wise dynamic routing to adaptively select the better of the IDetNet and OAMPNet detectors for every sample under different system conditions. To avoid the prohibitive transmission overhead of dataset collection in centralized learning (CL), we propose the federated averaging (FedAve)-DDNet detector, where all raw data are kept at local clients and only locally trained model parameters are transmitted to the central server for aggregation. To further reduce the transmission overhead, we develop the federated gradient sparsification (FedGS)-DDNet detector, which randomly samples gradients with carefully calculated probabilities when uploading them to the central server. Simulation results show that the proposed DDNet detector consistently outperforms other detectors under all system conditions thanks to the sample-wise dynamic routing. Moreover, the federated DDNet detectors, especially the FedGS-DDNet detector, can reduce the transmission overhead by at least 25.7\% while maintaining satisfactory detection accuracy.
Submitted 23 November, 2021;
originally announced November 2021.
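The FedAve aggregation step in the DDNet abstract (clients upload locally trained parameters, the server aggregates them) can be sketched with the usual FedAvg rule: a parameter average weighted by each client's sample count. The sparsification function is a generic unbiased random-sparsification scheme with keep-probabilities proportional to gradient magnitude. The abstract does not say how the authors compute their probabilities, so that rule and the names `sparsify` and `budget` are assumptions for illustration, not the FedGS-DDNet method.

```python
import random
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           num_samples: Sequence[int]) -> List[float]:
    """Average flattened client parameter vectors, weighted by local dataset size."""
    total = sum(num_samples)
    dim = len(client_params[0])
    return [sum(n * params[i] for params, n in zip(client_params, num_samples)) / total
            for i in range(dim)]


def sparsify(grad: Sequence[float], budget: float, rng: random.Random) -> List[float]:
    """Hypothetical unbiased sparsifier: keep entry i with probability p_i and send g_i / p_i.

    p_i is proportional to |g_i| and capped at 1; `budget` is roughly the expected
    number of kept entries (before capping). E[output_i] = g_i.
    """
    norm1 = sum(abs(g) for g in grad)
    out = []
    for g in grad:
        p = min(1.0, budget * abs(g) / norm1) if norm1 > 0 else 0.0
        out.append(g / p if p > 0 and rng.random() < p else 0.0)
    return out
```

For example, `fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])` gives `[2.5, 3.5]`. Rescaling kept entries by 1/p keeps the sparse upload unbiased, so the server-side average is not systematically shrunk by the dropped entries.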
-
Feature-enhanced Generation and Multi-modality Fusion based Deep Neural Network for Brain Tumor Segmentation with Missing MR Modalities
Authors:
Tongxue Zhou,
Stéphane Canu,
Pierre Vera,
Su Ruan
Abstract:
Using multimodal Magnetic Resonance Imaging (MRI) is necessary for accurate brain tumor segmentation. The main problem is that not all types of MRI are always available in clinical exams. Based on the fact that there is a strong correlation between MR modalities of the same patient, in this work we propose a novel brain tumor segmentation network for the case where one or more modalities are missing. The proposed network consists of three sub-networks: a feature-enhanced generator, a correlation constraint block and a segmentation network. The feature-enhanced generator utilizes the available modalities to generate a 3D feature-enhanced image representing the missing modality. The correlation constraint block exploits the multi-source correlation between the modalities and also constrains the generator to synthesize a feature-enhanced modality that has a coherent correlation with the available modalities. The segmentation network is a multi-encoder based U-Net that produces the final brain tumor segmentation. The proposed method is evaluated on the BraTS 2018 dataset. Experimental results demonstrate the effectiveness of the proposed method, which achieves average Dice scores of 82.9, 74.9 and 59.1 on whole tumor, tumor core and enhancing tumor, respectively, across all situations, and outperforms the best competing method by 3.5%, 17% and 18.2%.
Submitted 8 November, 2021;
originally announced November 2021.
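The Dice scores quoted above compare predicted and reference masks for three nested tumor regions. A minimal sketch, assuming the standard BraTS label convention (1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor) and voxels flattened into Python lists; `REGIONS`, `dice` and `region_dice` are illustrative names. Dice here is a fraction in [0, 1]; the abstract reports it multiplied by 100.

```python
from typing import Dict, Sequence, Set

# BraTS evaluation regions as sets of label values
REGIONS: Dict[str, Set[int]] = {
    "whole_tumor": {1, 2, 4},
    "tumor_core": {1, 4},
    "enhancing_tumor": {4},
}


def dice(pred: Sequence[int], truth: Sequence[int], labels: Set[int]) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for the binary masks P, T of voxels in `labels`."""
    p = [v in labels for v in pred]
    t = [v in labels for v in truth]
    inter = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    # convention chosen here: two empty masks count as a perfect match
    return 1.0 if denom == 0 else 2.0 * inter / denom


def region_dice(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Dice for each BraTS region."""
    return {name: dice(pred, truth, labels) for name, labels in REGIONS.items()}
```

Because the regions are nested, one voxel mislabeled as edema instead of enhancing tumor still counts as correct for whole tumor but wrong for the two inner regions, which is one reason enhancing-tumor Dice is typically the lowest.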
-
A Tri-attention Fusion Guided Multi-modal Segmentation Network
Authors:
Tongxue Zhou,
Su Ruan,
Pierre Vera,
Stéphane Canu
Abstract:
In the field of multimodal segmentation, the correlation between different modalities can be considered for improving the segmentation results. Considering the correlation between different MR modalities, in this paper we propose a multi-modality segmentation network guided by a novel tri-attention fusion. Our network includes N model-independent encoding paths with N image sources, a tri-attention fusion block, a dual-attention fusion block, and a decoding path. The model-independent encoding paths capture modality-specific features from the N modalities. Considering that not all the features extracted from the encoders are useful for segmentation, we propose to use dual-attention-based fusion to re-weight the features along the modality and spatial paths, which suppresses less informative features and emphasizes the useful ones for each modality at different positions. Since there is a strong correlation between different modalities, we build on the dual-attention fusion block and add a correlation attention module to form the tri-attention fusion block. In the correlation attention module, a correlation description block first learns the correlation between modalities, and then a constraint based on this correlation guides the network to learn the latent correlated features that are more relevant for segmentation. Finally, the decoder projects the fused feature representation to obtain the segmentation results. Experimental results on the BraTS 2018 dataset for brain tumor segmentation demonstrate the effectiveness of our proposed method.
Submitted 2 November, 2021;
originally announced November 2021.
-
Deep multi-modal aggregation network for MR image reconstruction with auxiliary modality
Authors:
Chun-Mei Feng,
Huazhu Fu,
Tianfei Zhou,
Yong Xu,
Ling Shao,
David Zhang
Abstract:
Magnetic resonance (MR) imaging produces detailed images of organs and tissues with better contrast, but it suffers from long acquisition times, which make the image quality vulnerable to artifacts such as those caused by motion. Recently, many approaches have been developed to reconstruct fully sampled images from partially observed measurements to accelerate MR imaging. However, most approaches focus on reconstruction of a single modality, neglecting the correlation knowledge between different modalities. Here we propose a Multi-modal Aggregation network for mR Image recOnstruction with auxiliary modality (MARIO), which discovers complementary representations from a fully sampled auxiliary modality and uses them to hierarchically guide the reconstruction of a given target modality. As a result, our method can selectively aggregate multi-modal representations for better reconstruction, yielding comprehensive, multi-scale, multi-modal feature fusion. Extensive experiments on the IXI and fastMRI datasets demonstrate the superiority of the proposed approach over state-of-the-art MR image reconstruction methods in removing artifacts.
Submitted 21 February, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
Simultaneously Transmitting and Reflecting (STAR) Intelligent Omni-Surfaces, Their Modeling and Implementation
Authors:
Jiaqi Xu,
Yuanwei Liu,
Xidong Mu,
Joey Tianyi Zhou,
Lingyang Song,
H. Vincent Poor,
Lajos Hanzo
Abstract:
With the rapid development of advanced electromagnetic manipulation technologies, researchers and engineers are starting to study smart surfaces that offer enhanced coverage and high reconfigurability and are easy to deploy. Among these efforts, the simultaneously transmitting and reflecting intelligent omni-surface (STAR-IOS) is one of the most promising categories. Although pioneering works have demonstrated the wireless communication performance gains of STAR-IOSs, several important issues remain unclear, including practical hardware implementations and physics-compliant models for STAR-IOSs. In this paper, we address these pressing questions by discussing four practical hardware implementations of STAR-IOSs, as well as three hardware modeling methods and five channel modeling methods. These discussions not only categorize existing smart surface technologies but also serve as a physics-compliant pipeline for further investigating STAR-IOSs.
Submitted 3 September, 2021; v1 submitted 13 August, 2021;
originally announced August 2021.
-
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio
Authors:
Naoyuki Kanda,
Xiong Xiao,
Jian Wu,
Tianyan Zhou,
Yashesh Gaur,
Xiaofei Wang,
Zhong Meng,
Zhuo Chen,
Takuya Yoshioka
Abstract:
Speaker-attributed automatic speech recognition (SA-ASR) is the task of recognizing "who spoke what" from multi-talker recordings. An SA-ASR system usually consists of multiple modules such as speech separation, speaker diarization and ASR. Alternatively, to enable joint optimization, an end-to-end (E2E) SA-ASR model has recently been proposed, with promising results on simulated data. In this paper, we present our recent study comparing such modular and joint approaches to SA-ASR on real monaural recordings. We develop state-of-the-art SA-ASR systems for both the modular and joint approaches by leveraging large-scale training data, including 75 thousand hours of ASR training data and the VoxCeleb corpus for speaker representation learning. We also propose a new pipeline that applies the E2E SA-ASR model after speaker clustering. Our evaluation on the AMI meeting corpus reveals that, after fine-tuning with a small amount of real data, the joint system performs 8.9% to 29.9% better in accuracy than the best modular system, while the modular system performs better before such fine-tuning. We also conduct various error analyses to show the remaining issues for monaural SA-ASR.
Submitted 17 September, 2021; v1 submitted 6 July, 2021;
originally announced July 2021.
-
Global Structure Identifiability and Reconstructibility of an NDS with Descriptor Subsystems
Authors:
Tong Zhou,
Kailin Yin
Abstract:
This paper investigates requirements on a networked dynamic system (NDS) such that its subsystem interactions can be determined solely from experimental data or reconstructed from its overall model. The NDS is composed of several subsystems whose dynamics are described in descriptor form. Apart from regularity of each subsystem and of the whole NDS, no other restrictions are placed on either the subsystem dynamics or the subsystem interactions. A matrix-rank-based necessary and sufficient condition is derived for the global identifiability of subsystem interactions, which leads to several conclusions about NDS structure identifiability when some a priori information is available. This matrix also gives an explicit description of the set of subsystem interactions that cannot be distinguished from experimental data alone. In addition, under a well-posedness assumption, a necessary and sufficient condition is obtained for the reconstructibility of subsystem interactions from an NDS descriptor-form model. This condition can be verified for each subsystem separately and is therefore attractive in the analysis and synthesis of large-scale NDSs. Simulation results show that, rather than increasing monotonically with the distance of the subsystem interactions to the undifferentiable set, the magnitude of the external output difference between two NDSs with distinct subsystem interactions increases much more rapidly when one of them is close to being unstable. In addition, the directions of the probing signals are also very important in distinguishing the external outputs of distinct NDSs. These findings are expected to be helpful in designing identification experiments.
Submitted 2 July, 2021;
originally announced July 2021.
-
Joint Channel Estimation and Mixed-ADCs Allocation for Massive MIMO via Deep Learning
Authors:
Liangyuan Xu,
Feifei Gao,
Ting Zhou,
Shaodan Ma,
Wei Zhang
Abstract:
Millimeter wave (mmWave) multi-user massive multi-input multi-output (MIMO) is a promising technique for next-generation communication systems. However, the hardware cost and power consumption grow significantly as the number of radio frequency (RF) components increases, which hampers the deployment of practical massive MIMO systems. To address this issue and further facilitate the commercialization of massive MIMO, a mixed analog-to-digital converter (ADC) architecture has been considered, in which some of the conventionally assumed full-resolution ADCs are replaced by one-bit ADCs. In this paper, we first propose a deep learning (DL)-based joint pilot design and channel estimation method for mixed-ADC mmWave massive MIMO. Specifically, we devise a pilot design neural network whose weights directly represent the optimized pilots, and develop a Runge-Kutta model-driven densely connected network as the channel estimator. Instead of randomly assigning the mixed ADCs, we then design a novel antenna selection network for mixed-ADC allocation to further improve the channel estimation accuracy. Moreover, we adopt an autoencoder-inspired end-to-end architecture to jointly optimize the pilot design, channel estimation and mixed-ADC allocation networks. Simulation results show that the proposed DL-based methods have advantages over traditional channel estimators as well as state-of-the-art networks.
Submitted 7 June, 2021;
originally announced June 2021.
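In the mixed-ADC architecture described above, some antennas keep full-resolution ADCs while the rest use one-bit ADCs, which retain only the signs of the in-phase (I) and quadrature (Q) components. A minimal sketch of that per-antenna quantization, assuming the one-bit/full-resolution assignment is given as a boolean mask (in the paper this allocation is what the antenna selection network learns) and assuming a 1/sqrt(2) normalization; noise, pilots and the channel model are omitted, and the function names are illustrative.

```python
import math
from typing import List, Sequence


def _sign(v: float) -> float:
    """Map a real value to +1 / -1 (zero is sent to +1 by convention)."""
    return 1.0 if v >= 0 else -1.0


def one_bit(x: complex) -> complex:
    """One-bit ADCs on I and Q: keep only the signs, normalized to unit magnitude."""
    return complex(_sign(x.real), _sign(x.imag)) / math.sqrt(2)


def mixed_adc(y: Sequence[complex], one_bit_mask: Sequence[bool]) -> List[complex]:
    """Per-antenna quantization: one-bit where the mask is True, full resolution otherwise."""
    return [one_bit(v) if low_res else v for v, low_res in zip(y, one_bit_mask)]
```

The one-bit outputs lose all amplitude information, so which antennas keep full-resolution ADCs directly affects how much channel information reaches the estimator; this is why the allocation is worth optimizing rather than assigning at random.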
-
Deep Unsupervised Learning for Joint Antenna Selection and Hybrid Beamforming
Authors:
Zhiyan Liu,
Yuwen Yang,
Feifei Gao,
Ting Zhou,
Hongbing Ma
Abstract:
In this paper, we propose a novel deep unsupervised learning-based approach that jointly optimizes antenna selection and hybrid beamforming to improve the hardware and spectral efficiencies of massive multiple-input-multiple-output (MIMO) downlink systems. By employing ResNet to extract features from the channel matrices, two neural networks, i.e., the antenna selection network (ASNet) and the hybrid beamforming network (BFNet), are respectively proposed for dynamic antenna selection and hybrid beamformer design. Furthermore, a deep probabilistic subsampling trick and a specially designed quantization function are respectively developed for ASNet and BFNet to preserve the differentiability while embedding discrete constraints into the network structures. With the aid of a flexibly designed loss function, ASNet and BFNet are jointly trained in a phased unsupervised way, which avoids the prohibitive computational cost of acquiring training labels in supervised learning. Simulation results demonstrate the advantage of the proposed approach over conventional optimization-based algorithms in terms of both the achieved rate and the computational complexity.
Submitted 21 January, 2022; v1 submitted 6 June, 2021;
originally announced June 2021.
-
Conditional generator and multi-source correlation guided brain tumor segmentation with missing MR modalities
Authors:
Tongxue Zhou,
Stéphane Canu,
Pierre Vera,
Su Ruan
Abstract:
Brain tumor is one of the highest-risk cancers, with a 5-year survival rate of only about 36%. Accurate diagnosis of brain tumor is critical for treatment planning. However, complete data are not always available in clinical scenarios. In this paper, we propose a novel brain tumor segmentation network to deal with the missing data issue. To compensate for missing data, we propose to use a conditional generator to generate the missing modality conditioned on the available modalities. As the modalities are strongly correlated in the tumor region, we design a correlation constraint network to leverage the multi-source information. On the one hand, the correlation constraint network helps the conditional generator generate a missing modality that keeps the multi-source correlation with the available modalities. On the other hand, it guides the segmentation network to learn correlated feature representations that improve segmentation performance. The proposed network consists of a conditional generator, a correlation constraint network and a segmentation network. We carried out extensive experiments on the BraTS 2018 dataset to evaluate the proposed method. The experimental results demonstrate the importance of the proposed components and the superior performance of the proposed method compared with state-of-the-art methods.
Submitted 27 May, 2021;
originally announced May 2021.
-
Fast Camera Image Denoising on Mobile GPUs with Deep Learning, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Kim Byeoung-su,
Radu Timofte,
Angeline Pouget,
Fenglong Song,
Cheng Li,
Shuai Xiao,
Zhongqian Fu,
Matteo Maggioni,
Yibin Huang,
Shen Cheng,
Xin Lu,
Yifeng Zhou,
Liangyu Chen,
Donghao Liu,
Xiangyu Zhang,
Haoqiang Fan,
Jian Sun,
Shuaicheng Liu,
Minsu Kwon,
Myungje Lee,
Jaeyoon Yoo,
Changbeom Kang,
Shinjo Wang,
Bin Huang
, et al. (7 additional authors not shown)
Abstract:
Image denoising is one of the most critical problems in mobile photo processing. While many solutions have been proposed for this task, they usually work with synthetic data and are too computationally expensive to run on mobile devices. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based image denoising solution that can demonstrate high efficiency on smartphone GPUs. For this, the participants were provided with a novel large-scale dataset consisting of noisy-clean image pairs captured in the wild. The runtime of all models was evaluated on the Samsung Exynos 2100 chipset with a powerful Mali GPU capable of accelerating floating-point and quantized neural networks. The proposed solutions are fully compatible with any mobile GPU and can process 480p images in 40-80 ms while achieving high-fidelity results. A detailed description of all models developed in the challenge is provided in this paper.
Submitted 17 May, 2021;
originally announced May 2021.
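Fidelity in image denoising is commonly measured with peak signal-to-noise ratio (PSNR), comparing the denoised output against the clean reference. A minimal sketch for 8-bit images flattened to lists, assuming a peak value of 255; `psnr` is an illustrative name, and the challenge's exact scoring procedure is described in the report itself and is not reproduced here.

```python
import math
from typing import Sequence


def psnr(clean: Sequence[float], denoised: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(clean, denoised)) / len(clean)
    # identical images have zero error and unbounded PSNR
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, halving the MSE gains only about 3 dB, which is why small PSNR differences between efficient mobile models can still reflect meaningful quality gaps.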
-
Adaptive Partitioning Strategy for High-Dimensional Discrete Simulation-based Optimization Problems
Authors:
**g Lu,
Tianli Zhou,
Carolina Osorio
Abstract:
In this paper, we introduce a technique to enhance the computational efficiency of solution algorithms for high-dimensional discrete simulation-based optimization problems. The technique is based on innovative adaptive partitioning strategies that partition the feasible region using solutions that have already been simulated as well as prior knowledge of the problem of interest. We integrate the proposed strategies with the Empirical Stochastic Branch-and-Bound framework proposed by Xu and Nelson (2013). This combination leads to a general-purpose discrete simulation-based optimization algorithm that is both globally convergent and has good small-sample (finite-time) performance. The proposed algorithm is validated on a synthetic discrete simulation-based optimization problem and is then used to address a real-world car-sharing fleet assignment problem. Experimental results show that the proposed strategy can significantly increase the algorithm's efficiency.
Submitted 29 April, 2021;
originally announced April 2021.
-
Latent Correlation Representation Learning for Brain Tumor Segmentation with Missing MRI Modalities
Authors:
Tongxue Zhou,
Stéphane Canu,
Pierre Vera,
Su Ruan
Abstract:
Magnetic Resonance Imaging (MRI) is a widely used imaging technique to assess brain tumors. Accurately segmenting brain tumors from MR images is key to clinical diagnosis and treatment planning. In addition, multi-modal MR images can provide complementary information for accurate brain tumor segmentation. However, it is common for some imaging modalities to be missing in clinical practice. In this paper, we present a novel brain tumor segmentation algorithm that handles missing modalities. Since there is a strong correlation between modalities, a correlation model is proposed to specifically represent the latent multi-source correlation. Thanks to the obtained correlation representation, the segmentation becomes more robust when a modality is missing. First, the individual representation produced by each encoder is used to estimate the modality-independent parameters. Then, the correlation model transforms all the individual representations into latent multi-source correlation representations. Finally, the correlation representations across modalities are fused via an attention mechanism into a shared representation that emphasizes the most important features for segmentation. We evaluate our model on the BraTS 2018 and BraTS 2019 datasets; it outperforms the current state-of-the-art methods and produces robust results when one or more modalities are missing.
Submitted 20 April, 2021; v1 submitted 13 April, 2021;
originally announced April 2021.
-
Continuous Speech Separation with Ad Hoc Microphone Arrays
Authors:
Dongmei Wang,
Takuya Yoshioka,
Zhuo Chen,
Xiaofei Wang,
Tianyan Zhou,
Zhong Meng
Abstract:
Speech separation has been shown to be effective for multi-talker speech recognition. In the ad hoc microphone array setup, where the array consists of spatially distributed asynchronous microphones, additional challenges must be overcome because the geometry and number of microphones are unknown beforehand. Prior studies show that, with a spatial-temporal interleaving structure, neural networks can efficiently utilize the multi-channel signals of the ad hoc array. In this paper, we further extend this approach to continuous speech separation. Several techniques are introduced to enable speech separation for real continuous recordings. First, we apply a transformer-based network for spatio-temporal modeling of the ad hoc array signals. In addition, two methods are proposed to mitigate a speech duplication problem during single-talker segments, which seems more severe in ad hoc array scenarios. One method is device distortion simulation, which reduces the acoustic mismatch between simulated training data and real recordings. The other is speaker counting, which detects single-speaker segments and merges the output signal channels. Experimental results for AdHoc-LibiCSS, a new dataset consisting of continuous recordings of concatenated LibriSpeech utterances captured by multiple different devices, show that the proposed separation method can significantly improve ASR accuracy for overlapped speech with little performance degradation for single-talker segments.
Submitted 3 March, 2021;
originally announced March 2021.
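ASR accuracy in these separation studies is scored as word error rate (WER): the word-level edit distance between the reference and recognized transcripts, divided by the number of reference words. A minimal sketch of that standard metric, plus an illustrative helper for the "relative WER reduction" figures quoted in such abstracts; the function names are hypothetical, and evaluations of overlapped multi-talker speech add alignment or permutation steps that are not shown here.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete reference word r
                         cur[j - 1] + 1,          # insert hypothesis word h
                         prev[j - 1] + (r != h))  # substitute, or match at no cost
        prev = cur
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by the reference length."""
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction, e.g. 0.20 -> 0.18 is a 10% relative reduction."""
    return (baseline - new) / baseline
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` is 2/6 (one substitution, one deletion). Duplicated speech in single-talker segments shows up as insertions, which is why the duplication problem discussed above directly inflates WER.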
-
Dual-Path Modeling for Long Recording Speech Separation in Meetings
Authors:
Chenda Li,
Zhuo Chen,
Yi Luo,
Cong Han,
Tianyan Zhou,
Keisuke Kinoshita,
Marc Delcroix,
Shinji Watanabe,
Yanmin Qian
Abstract:
Continuous speech separation (CSS) is the task of separating the speech sources from a long, partially overlapped recording that involves a varying number of speakers. A straightforward extension of conventional utterance-level speech separation to the CSS task is to segment the long recording with a fixed-size window and process each window separately. Though effective, this extension fails to model long-range dependencies in speech and thus leads to sub-optimal performance. The recently proposed dual-path modeling could remedy this problem, thanks to its ability to jointly model cross-window dependencies and local-window processing. In this work, we further extend the dual-path modeling framework to the CSS task. A transformer-based dual-path system is proposed, which integrates transformer layers for global modeling. The proposed models are applied to LibriCSS, a real recorded multi-talker dataset, and consistent WER reductions are observed in the ASR evaluation of the separated speech. A dual-path transformer equipped with convolutional layers is also proposed; it reduces the amount of computation by 30% while achieving better WER. Furthermore, online dual-path models are investigated, which show a 10% relative WER reduction compared to the baseline.
Submitted 23 February, 2021;
originally announced February 2021.
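Dual-path models first cut the long feature sequence into overlapping fixed-size chunks, so that one block can model within each chunk (local path) and another across chunks at the same intra-chunk position (global path). The segmentation and merging steps can be sketched as follows, assuming 50% overlap (a common choice), zero padding, and a 1-D sequence of floats standing in for frame features; `segment`, `inter_view` and `overlap_add` are hypothetical names, and the transformer layers themselves are omitted.

```python
from typing import List, Sequence


def segment(x: Sequence[float], chunk: int) -> List[List[float]]:
    """Cut x into chunks of length `chunk` with hop chunk // 2 (50% overlap)."""
    assert chunk >= 2 and chunk % 2 == 0, "sketch assumes an even chunk size"
    hop = chunk // 2
    padded = [0.0] * hop + list(x) + [0.0] * hop
    remainder = (len(padded) - chunk) % hop
    if remainder:  # zero-pad the tail so the last chunk is full
        padded += [0.0] * (hop - remainder)
    return [padded[s:s + chunk] for s in range(0, len(padded) - chunk + 1, hop)]


def inter_view(chunks: List[List[float]]) -> List[List[float]]:
    """Transpose: row j holds intra-chunk position j from every chunk (global path)."""
    return [list(col) for col in zip(*chunks)]


def overlap_add(chunks: List[List[float]], length: int) -> List[float]:
    """Merge chunks back into a sequence, averaging positions covered by two chunks."""
    chunk = len(chunks[0])
    hop = chunk // 2
    total = hop * (len(chunks) - 1) + chunk
    acc, count = [0.0] * total, [0] * total
    for i, c in enumerate(chunks):
        for j, v in enumerate(c):
            acc[i * hop + j] += v
            count[i * hop + j] += 1
    merged = [a / n for a, n in zip(acc, count)]
    return merged[hop:hop + length]  # drop the front padding
```

Here the rows of `segment(x, chunk)` feed the local (intra-chunk) path and the rows of `inter_view` feed the global path. The averaging in `overlap_add` is chosen so that an identity processing step returns the input exactly; actual dual-path systems may simply sum overlapping outputs.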
-
3D Medical Multi-modal Segmentation Network Guided by Multi-source Correlation Constraint
Authors:
Tongxue Zhou,
Stéphane Canu,
Pierre Vera,
Su Ruan
Abstract:
In the field of multimodal segmentation, the correlation between different modalities can be considered for improving the segmentation results. In this paper, we propose a multi-modality segmentation network with a correlation constraint. Our network includes N model-independent encoding paths with N image sources, a correlation constraint block, a feature fusion block, and a decoding path. The model-independent encoding paths capture modality-specific features from the N modalities. Since there is a strong correlation between different modalities, we first propose a linear correlation block to learn the correlation between modalities; a loss function based on this block then guides the network to learn the correlated features. This block forces the network to learn latent correlated features that are more relevant for segmentation. Considering that not all the features extracted from the encoders are useful for segmentation, we propose a dual-attention-based fusion block to recalibrate the features along the modality and spatial paths, which suppresses less informative features and emphasizes the useful ones. The decoder finally projects the fused feature representation to obtain the segmentation result. Experimental results on the BraTS 2018 dataset for brain tumor segmentation demonstrate the effectiveness of our proposed method.
Submitted 5 February, 2021;
originally announced February 2021.
-
Deep Learning based Antenna Selection and CSI Extrapolation in Massive MIMO Systems
Authors:
Bo Lin,
Feifei Gao,
Shun Zhang,
Ting Zhou,
Ahmed Alkhateeb
Abstract:
A critical bottleneck of massive multiple-input multiple-output (MIMO) systems is the huge training overhead of downlink transmission, such as channel estimation, downlink beamforming and covariance observation. In this paper, we propose to use the channel state information (CSI) of a small number of antennas to extrapolate the CSI of the other antennas and thereby reduce the training overhead. Specifically, we design a deep neural network, called the antenna domain extrapolation network (ADEN), that can exploit the correlation function among antennas. We then propose a deep learning (DL) based antenna selection network (ASN) that selects a limited number of antennas to optimize the extrapolation, a problem that is conventionally a combinatorial optimization and is difficult to solve. We design a constrained degradation algorithm to generate a differentiable approximation of the discrete antenna selection vector, so that back-propagation through the neural network is guaranteed. Numerical results show that the proposed ADEN outperforms a traditional fully connected network, and the antenna selection scheme learned by ASN is much better than the simple uniform selection.
Submitted 18 January, 2021;
originally announced January 2021.
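Antenna selection is a discrete choice, so gradients cannot flow through a hard selection vector. The abstract does not detail its constrained degradation algorithm; as a stand-in, the sketch below shows the generic idea of a temperature-controlled softmax relaxation, which yields a differentiable weight vector that approaches a one-hot selection as the temperature `tau` decreases. `relaxed_select`, `hard_select` and `tau` are illustrative names, and this is not the ASN method.

```python
import math
from typing import List, Sequence


def relaxed_select(logits: Sequence[float], tau: float) -> List[float]:
    """softmax(logits / tau): a smooth surrogate for selecting argmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_select(logits: Sequence[float]) -> List[float]:
    """The discrete one-hot vector that the relaxation approximates."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(logits))]
```

This sketch picks a single antenna; choosing a limited set of several antennas needs a further construction (for example repeating the relaxation per selected slot), and annealing `tau` during training trades gradient smoothness against how closely the weights match a hard selection.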
-
Deep Learning for Latent Events Forecasting in Twitter Aided Caching Networks
Authors:
Zhong Yang,
Yuanwei Liu,
Yue Chen,
Joey Tianyi Zhou
Abstract:
A novel Twitter context aided content caching (TAC) framework is proposed to enhance caching efficiency by taking advantage of the legibility and massive volume of Twitter data. To promote caching efficiency, three machine learning models are proposed to predict latent events and event popularity, using collected Twitter data with geo-tags and the geographic information of the adjacent base stations (BSs). First, we propose a latent Dirichlet allocation (LDA) model for latent event forecasting, exploiting the strength of LDA in natural language processing (NLP). Then, we design a long short-term memory (LSTM) network with skip-gram embedding and an LSTM with continuous skip-gram-Geo-aware embedding for event popularity forecasting. Lastly, we associate the predicted latent events and their popularity with the caching strategy. Extensive practical experiments demonstrate that: (1) the proposed TAC framework outperforms the conventional caching framework and can be employed in practical applications thanks to its ability to associate content with public interests; (2) the proposed LDA approach retains its advantage for NLP on Twitter data; (3) the perplexity of the proposed skip-gram-based LSTM is lower than that of the conventional LDA approach; and (4) the tweet hit rates of the model range from 50% to 65%, and the hit rate of the cached content reaches approximately 75% with a smaller caching space than conventional algorithms.
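The abstract links predicted event popularity to a caching strategy and reports results as hit rates. As a minimal sketch under stated assumptions, the code below shows only the simplest form of that link: a static cache at one BS that keeps the `capacity` items with the highest predicted popularity, plus the hit-rate metric (fraction of requests served from the cache). The function names, item labels, and scores are hypothetical, and this is not the TAC framework's actual caching policy or evaluation code.

```python
def cache_by_predicted_popularity(predicted_scores, capacity):
    """Return the `capacity` items with the highest predicted popularity."""
    ranked = sorted(predicted_scores, key=predicted_scores.get, reverse=True)
    return set(ranked[:capacity])


def hit_rate(cache, requests):
    """Fraction of requests that can be served directly from the cache."""
    if not requests:
        return 0.0
    hits = sum(1 for item in requests if item in cache)
    return hits / len(requests)


# Hypothetical predicted popularity scores for event-related content at one BS.
predicted = {"concert": 0.9, "match": 0.7, "traffic": 0.4, "weather": 0.1}
cache = cache_by_predicted_popularity(predicted, capacity=2)
requests = ["concert", "match", "concert", "weather",
            "traffic", "concert", "match", "news"]
print(sorted(cache), hit_rate(cache, requests))  # ['concert', 'match'] 0.625
```

Here 5 of the 8 requests hit the cache; a better popularity forecast would raise this fraction for the same `capacity`.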
Submitted 4 January, 2021;
originally announced January 2021.
-
Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording
Authors:
Cong Han,
Yi Luo,
Chenda Li,
Tianyan Zhou,
Keisuke Kinoshita,
Shinji Watanabe,
Marc Delcroix,
Hakan Erdogan,
John R. Hershey,
Nima Mesgarani,
Zhuo Chen
Abstract:
Leveraging additional speaker information to facilitate speech separation has received increasing attention in recent years. Recent research includes extracting target speech using the target speaker's voice snippet and jointly separating all participating speakers using a pool of additional speaker signals, which is known as speech separation using speaker inventory (SSUSI). However, all these systems assume the ideal case in which pre-enrolled speaker signals are available, and they are evaluated only on simple data configurations. In realistic multi-talker conversations, the speech signal contains a large proportion of non-overlapped regions, from which robust speaker embeddings of individual talkers can be derived. In this work, we apply the SSUSI model to long recordings and propose a self-informed, clustering-based inventory forming scheme, in which the speaker inventory is built entirely from the input signal without the need for external speaker signals. Experimental results on simulated noisy, reverberant long-recording datasets show that the proposed method can significantly improve separation performance across various conditions.
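The abstract states that the inventory is formed by clustering speaker embeddings taken from non-overlapped regions of the input, but it does not name the clustering algorithm, the embedding extractor, or how the number of speakers is determined. The sketch below is therefore an illustration under assumptions: the speaker count is known, the embeddings are already extracted, and a small spherical k-means (cosine similarity, farthest-point initialisation) stands in for whatever clustering the authors use. The function name `form_speaker_inventory` and the toy 3-D embeddings are hypothetical.

```python
import numpy as np


def form_speaker_inventory(embeddings, num_speakers, iters=20):
    """Cluster segment embeddings and return one unit-norm profile per cluster.

    embeddings: (num_segments, dim) array, one embedding per non-overlapped
    segment. num_speakers is assumed to be known in this sketch.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # dot product = cosine similarity
    # Farthest-point initialisation: start from segment 0, then add the segment
    # least similar to every centroid chosen so far.
    centroids = [x[0]]
    for _ in range(1, num_speakers):
        best_sim = np.max(x @ np.array(centroids).T, axis=1)
        centroids.append(x[np.argmin(best_sim)])
    centroids = np.array(centroids)
    for _ in range(iters):
        labels = np.argmax(x @ centroids.T, axis=1)  # assign to most similar centroid
        for c in range(num_speakers):
            members = x[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                mean = members.mean(axis=0)
                centroids[c] = mean / np.linalg.norm(mean)
    labels = np.argmax(x @ centroids.T, axis=1)
    return centroids, labels


# Toy 3-D "embeddings": three segments per hypothetical speaker.
segments = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [1.0, 0.05, 0.05],
            [0.1, 1.0, 0.0], [0.0, 0.9, 0.1], [0.05, 1.0, 0.0]]
inventory, labels = form_speaker_inventory(segments, num_speakers=2)
print(labels)  # [0 0 0 1 1 1]
```

Each row of `inventory` would then play the role of one speaker profile in the inventory, built without any external enrollment signal.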
Submitted 18 December, 2020; v1 submitted 17 December, 2020;
originally announced December 2020.