
Iterative Regularization with k-support Norm: An Important Complement to Sparse Recovery

Authors: William de Vazelhes, Bhaskar Mukhoty, Xiao-Tong Yuan, Bin Gu

Abstract: Sparse recovery is ubiquitous in machine learning and signal processing. Due to the NP-hard nature of sparse recovery, existing methods are known to suffer either from restrictive (or even unknown) applicability conditions, or high computational cost. Recently, iterative regularization methods have emerged as a promising fast approach because they can achieve sparse recovery in one pass through early stopping, rather than the tedious grid-search used in the traditional methods. However, most of those iterative methods are based on the $\ell_1$ norm which requires restrictive applicability conditions and could fail in many cases. Therefore, achieving sparse recovery with iterative regularization methods under a wider range of conditions has yet to be further explored. To address this issue, we propose a novel iterative regularization algorithm, IRKSN, based on the $k$-support norm regularizer rather than the $\ell_1$ norm. We provide conditions for sparse recovery with IRKSN, and compare them with traditional conditions for recovery with $\ell_1$ norm regularizers. Additionally, we give an early stopping bound on the model error of IRKSN with explicit constants, achieving the standard linear rate for sparse recovery. Finally, we illustrate the applicability of our algorithm on several experiments, including a support recovery experiment with a correlated design matrix.

Submitted 19 March, 2024; v1 submitted 19 December, 2023; originally announced January 2024.

Comments: Accepted at AAAI 2024. Code at https://github.com/wdevazelhes/IRKSN_AAAI2024

arXiv:2309.05908 [pdf, other]

Reset Controller Synthesis by Reach-avoid Analysis for Delay Hybrid Systems

Authors: Han Su, Jiyu Zhu, Shenghua Feng, Yunjun Bai, Bin Gu, Jiang Liu, Mengfei Yang, Naijun Zhan

Abstract: A reset controller plays a crucial role in designing hybrid systems. It restricts the initial set and redefines the reset map associated with discrete transitions, in order to guarantee that the system achieves its objective. Reset controller synthesis, together with feedback controller synthesis and switching logic controller synthesis, provides a correct-by-construction approach to designing hybrid systems. However, time-delay is an inevitable factor in hybrid systems, which can degrade control performance and render verification certificates obtained by abstracting away time-delay invalid in practice. In this paper, we investigate this issue in a practical manner by taking time-delay into account. We propose an approach that reduces the synthesis of reset controllers to the generation of reach-avoid sets for the hybrid system under consideration, which can be efficiently solved using off-the-shelf convex optimization solvers.

Submitted 27 May, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: 15 pages, 10 figures

arXiv:2309.05906 [pdf, other]

Correct-by-Construction for Hybrid Systems by Synthesizing Reset Controller

Authors: Jiang Liu, Han Su, Yunjun Bai, Bin Gu, Bai Xue, Mengfei Yang, Naijun Zhan

Abstract: Controller synthesis, including reset controller, feedback controller, and switching logic controller synthesis, provides an essential mechanism to guarantee the correctness and reliability of hybrid systems in a correct-by-construction manner. Unfortunately, reset controller synthesis is still in its infancy in the literature, although it is of theoretical and practical significance. In this paper, we propose a convex programming based method to synthesize reset controllers for polynomial hybrid systems subject to safety, possibly together with liveness. Such a problem essentially corresponds to computing an initial set of continuous states in each mode and a reset map associated with each discrete jump such that, through the computed reset maps, any trajectory starting from any computed initial state remains safe if only safety constraints are given, or eventually reaches the target set and remains safe before that if both safety and liveness are given. Both cases can be reduced to reach-avoid and/or differential invariant generation problems, further encoded as convex optimization problems. Finally, several examples are provided to demonstrate the efficiency and effectiveness of our method.

Submitted 11 September, 2023; originally announced September 2023.

Comments: 26 pages, 8 figures

arXiv:2306.13874 [pdf, other]

Enhancing Spectrum Sensing via Reconfigurable Intelligent Surfaces: Passive or Active Sensing and How Many Reflecting Elements are Needed?

Authors: Hao Xie, Dong Li, Bowen Gu

Abstract: Cognitive radio has been proposed to alleviate the scarcity of available spectrum caused by the significant demand for wideband services and the fragmentation of spectrum resources. However, sensing performance is quite poor due to the low sensing signal-to-noise ratio, especially in complex environments with severe channel fading. Fortunately, reconfigurable intelligent surface (RIS)-aided spectrum sensing can effectively tackle the above challenge due to its high array gain. Nevertheless, the traditional passive RIS may suffer from the "double fading" effect, which severely limits the performance of passive RIS-aided spectrum sensing. Thus, a crucial challenge is how to fully exploit the potential advantages of the RIS and further improve the sensing performance. To this end, we introduce the active RIS into spectrum sensing and respectively formulate two optimization problems for the passive RIS and the active RIS to maximize the detection probability. In light of the intractability of the formulated problems, we develop a one-stage optimization algorithm with inner approximation and a two-stage optimization algorithm with a bisection method to obtain sub-optimal solutions, and apply the Rayleigh quotient to obtain the upper and lower bounds of the detection probability. Furthermore, in order to gain more insight into the impact of the RIS on spectrum sensing, we respectively investigate the number configuration for passive RIS and active RIS and analyze how many reflecting elements are needed to achieve the detection probability close to 1. Simulation results verify that the proposed algorithms outperform existing algorithms under the same parameter configuration, and achieve a detection probability close to 1 with even fewer reflecting elements or antennas than existing schemes.

Submitted 21 October, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

arXiv:2305.09946 [pdf]

AdaMSS: Adaptive Multi-Modality Segmentation-to-Survival Learning for Survival Outcome Prediction from PET/CT Images

Authors: Mingyuan Meng, Bingxin Gu, Michael Fulham, Shaoli Song, Dagan Feng, Lei Bi, Jinman Kim

Abstract: Survival prediction is a major concern for cancer management. Deep survival models based on deep learning have been widely adopted to perform end-to-end survival prediction from medical images. Recent deep survival models achieved promising performance by jointly performing tumor segmentation with survival prediction, where the models were guided to extract tumor-related information through Multi-Task Learning (MTL). However, these deep survival models have difficulties in exploring out-of-tumor prognostic information. In addition, existing deep survival models are unable to effectively leverage multi-modality images. Empirically-designed fusion strategies were commonly adopted to fuse multi-modality information via task-specific manually-designed networks, thus limiting the adaptability to different scenarios. In this study, we propose an Adaptive Multi-modality Segmentation-to-Survival model (AdaMSS) for survival prediction from PET/CT images. Instead of adopting MTL, we propose a novel Segmentation-to-Survival Learning (SSL) strategy, where our AdaMSS is trained for tumor segmentation and survival prediction sequentially in two stages. This strategy enables the AdaMSS to focus on tumor regions in the first stage and gradually expand its focus to include other prognosis-related regions in the second stage. We also propose a data-driven strategy to fuse multi-modality information, which realizes adaptive optimization of fusion strategies based on training data during training. With the SSL and data-driven fusion strategies, our AdaMSS is designed as an adaptive model that can self-adapt its focus regions and fusion strategy for different training stages. Extensive experiments with two large clinical datasets show that our AdaMSS outperforms state-of-the-art survival prediction methods.

Submitted 19 July, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: Under Review

arXiv:2305.06094 [pdf, other]

doi 10.1109/TVT.2023.3348200

Computation-Efficient Backscatter-Blessed MEC with User Reciprocity

Authors: Bowen Gu, Hao Xie, Dong Li

Abstract: This letter proposes a new user cooperative offloading protocol called user reciprocity in backscatter communication (BackCom)-aided mobile edge computing systems with efficient computation, whose quintessence is that each user can switch alternately between the active and the BackCom modes in different slots, and one user works in the active mode and the other user works in the BackCom mode in each time slot. In particular, the user in the BackCom mode can always use the signal transmitted by the user in the active mode for more data transmission in a spectrum-sharing manner. To evaluate the proposed protocol, a computation efficiency (CE) maximization-based optimization problem is formulated by jointly optimizing power control, time scheduling, reflection coefficient (RC) adjustment, and computing frequency allocation, while satisfying various physical constraints on the maximum energy budget, the computing frequency threshold, the minimum computed bits, and the harvested energy threshold. To solve this non-convex problem, Dinkelbach's method and quadratic transform are first employed to transform the complex fractional forms into linear ones. Then, an iterative algorithm is designed by decomposing the resulting problem to obtain the suboptimal solution. The closed-form solutions for the transmit power, the RC, and the local computing frequency are provided for more insights. Besides, the analytical performance gain with the reciprocal mode is also derived. Simulation results demonstrate that the proposed scheme outperforms benchmark schemes regarding the CE.

Submitted 10 May, 2023; originally announced May 2023.

arXiv:2303.01249 [pdf, ps, other]

Language-Universal Adapter Learning with Knowledge Distillation for End-to-End Multilingual Speech Recognition

Authors: Zhijie Shen, Wu Guo, Bin Gu

Abstract: In this paper, we propose a language-universal adapter learning framework based on a pre-trained model for end-to-end multilingual automatic speech recognition (ASR). For acoustic modeling, the wav2vec 2.0 pre-trained model is fine-tuned by inserting language-specific and language-universal adapters. An online knowledge distillation is then used to enable the language-universal adapters to learn both language-specific and universal features. The linguistic information confusion is also reduced by leveraging language identifiers (LIDs). With LIDs we perform a position-wise modification on the multi-head attention outputs. In the inference procedure, the language-specific adapters are removed while the language-universal adapters are kept activated. The proposed method improves the recognition accuracy and addresses the linear increase of the number of adapters' parameters with the number of languages in common multilingual ASR systems. Experiments on the BABEL dataset confirm the effectiveness of the proposed framework. Compared to the conventional multilingual model, a 3.3% absolute error rate reduction is achieved. The code is available at: https://github.com/shen9712/UniversalAdapterLearning.

Submitted 28 February, 2023; originally announced March 2023.

arXiv:2212.13396 [pdf, other]

Bayesian Optimization Enhanced Deep Reinforcement Learning for Trajectory Planning and Network Formation in Multi-UAV Networks

Authors: Shimin Gong, Meng Wang, Bo Gu, Wenjie Zhang, Dinh Thai Hoang, Dusit Niyato

Abstract: In this paper, we employ multiple UAVs coordinated by a base station (BS) to help the ground users (GUs) to offload their sensing data. Different UAVs can adapt their trajectories and network formation to expedite data transmissions via multi-hop relaying. The trajectory planning aims to collect all GUs' data, while the UAVs' network formation optimizes the multi-hop UAV network topology to minimize the energy consumption and transmission delay. The joint network formation and trajectory optimization is solved by a two-step iterative approach. Firstly, we devise the adaptive network formation scheme by using a heuristic algorithm to balance the UAVs' energy consumption and data queue size. Then, with the fixed network formation, the UAVs' trajectories are further optimized by using multi-agent deep reinforcement learning without knowing the GUs' traffic demands and spatial distribution. To improve the learning efficiency, we further employ Bayesian optimization to estimate the UAVs' flying decisions based on historical trajectory points. This helps avoid inefficient action explorations and improves the convergence rate in the model training. The simulation results reveal close spatial-temporal couplings between the UAVs' trajectory planning and network formation. Compared with several baselines, our solution can better exploit the UAVs' cooperation in data offloading, thus improving energy efficiency and delay performance.

Submitted 27 December, 2022; originally announced December 2022.

Comments: 15 pages, 10 figures, 2 algorithms

arXiv:2212.13390 [pdf, other]

Hierarchical Deep Reinforcement Learning for Age-of-Information Minimization in IRS-aided and Wireless-powered Wireless Networks

Authors: Shimin Gong, Leiyang Cui, Bo Gu, Bin Lyu, Dinh Thai Hoang, Dusit Niyato

Abstract: In this paper, we focus on a wireless-powered sensor network coordinated by a multi-antenna access point (AP). Each node can generate sensing information and report the latest information to the AP using the energy harvested from the AP's signal beamforming. We aim to minimize the average age-of-information (AoI) by adapting the nodes' transmission scheduling and the transmission control strategies jointly. To reduce the transmission delay, an intelligent reflecting surface (IRS) is used to enhance the channel conditions by controlling the AP's beamforming vector and the IRS's phase shifting matrix. Considering dynamic data arrivals at different sensing nodes, we propose a hierarchical deep reinforcement learning (DRL) framework for AoI minimization in two steps. The users' transmission scheduling is first determined by the outer-loop DRL approach, e.g., the DQN or PPO algorithm, and then the inner-loop optimization is used to adapt either the uplink information transmission or downlink energy transfer to all nodes. A simple and efficient approximation is also proposed to reduce the inner-loop run-time overhead. Numerical results verify that the hierarchical learning framework outperforms typical baselines in terms of the average AoI and proportional fairness among different nodes.

Submitted 27 December, 2022; originally announced December 2022.

Comments: 31 pages, 6 figures, 2 tables, 3 algorithms

arXiv:2212.08298 [pdf, other]

Exploring Hybrid Active-Passive RIS-Aided MEC Systems: From the Mode-Switching Perspective

Authors: Hao Xie, Dong Li, Bowen Gu

Abstract: Mobile edge computing (MEC) has been regarded as a promising technique to support latency-sensitive and computation-intensive services. However, the low offloading rate caused by the random channel fading characteristic becomes a major bottleneck in restricting the performance of the MEC. Fortunately, reconfigurable intelligent surface (RIS) can alleviate this problem since it can boost both the spectrum and energy efficiency. Different from the existing works adopting either fully active or fully passive RIS, we propose a novel hybrid RIS in which reflecting units can flexibly switch between active and passive modes. To achieve a tradeoff between the latency and energy consumption, an optimization problem is formulated by minimizing the total cost. In light of the intractability of the problem, we develop an alternating optimization-based iterative algorithm by combining the successive convex approximation method, the variable substitution, and the singular value decomposition (SVD) to obtain sub-optimal solutions. Furthermore, in order to gain more insight into the problem, we consider two special cases involving a latency minimization problem and an energy consumption minimization problem, and respectively analyze the tradeoff between the number of active and passive units. Simulation results verify that the proposed algorithm can achieve flexible mode switching and significantly outperforms existing algorithms.

Submitted 21 March, 2024; v1 submitted 16 December, 2022; originally announced December 2022.

arXiv:2210.03674 [pdf, other]

Reinforcement Learning Approach for Multi-Agent Flexible Scheduling Problems

Authors: Hongjian Zhou, Boyang Gu, Chenghao Jin

Abstract: Scheduling plays an important role in automated production. Its impact can be found in various fields such as the manufacturing industry, the service industry and the technology industry. A scheduling problem (NP-hard) is a task of finding a sequence of job assignments on a given set of machines with the goal of optimizing the objective defined. Methods such as Operations Research, Dispatching Rules, and Combinatorial Optimization have been applied to scheduling problems, but none is guaranteed to find the optimal solution. The recent development of Reinforcement Learning has shown success in sequential decision-making problems. This research presents a Reinforcement Learning approach for scheduling problems. In particular, this study delivers an OpenAI gym environment with search-space reduction for Job Shop Scheduling Problems and provides a heuristic-guided Q-Learning solution with state-of-the-art performance for Multi-agent Flexible Job Shop Problems.

Submitted 7 October, 2022; originally announced October 2022.

arXiv:2209.13100 [pdf, other]

Gain without Pain: Recycling Reflected Energy from Wireless Powered RIS-aided Communications

Authors: Hao Xie, Bowen Gu, Dong Li, Zhi Lin, Yongjun Xu

Abstract: In this paper, we investigate and analyze energy recycling for a reconfigurable intelligent surface (RIS)-aided wireless-powered communication network. As opposed to the existing works where the energy harvested by Internet of things (IoT) devices only comes from the power station, IoT devices are also allowed to recycle energy from other IoT devices. In particular, we propose group switching- and user switching-based protocols with time-division multiple access to evaluate the impact of energy recycling on system performance. Two different optimization problems are respectively formulated for maximizing the sum throughput by jointly optimizing the energy beamforming vectors, the transmit power, the transmission time, the receive beamforming vectors, the grouping factors, and the phase-shift matrices, where the constraints of the minimum throughput, the harvested energy, the maximum transmit power, the phase shift, the grouping, and the time allocation are taken into account. In light of the intractability of the above problems, we respectively develop two alternating optimization-based iterative algorithms by combining the successive convex approximation method and the penalty-based method to obtain corresponding sub-optimal solutions. Simulation results verify that the energy recycling-based mechanism can assist in enhancing the performance of IoT devices in terms of energy harvesting and information transmission. Besides, we also verify that the group switching-based algorithm yields a larger improvement in the sum throughput of IoT devices, while the user switching-based algorithm can harvest more energy.

Submitted 26 September, 2022; originally announced September 2022.

arXiv:2206.02507 [pdf, other]

Learning to Control under Time-Varying Environment

Authors: Yuzhen Han, Ruben Solozabal, Jing Dong, Xingyu Zhou, Martin Takac, Bin Gu

Abstract: This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems. Due to the simultaneous presence of uncertainty and non-stationarity, designing online control algorithms for unknown LTV systems remains a challenging task. At a cost of NP-hard offline planning, prior works have introduced online convex optimization algorithms, although they suffer from nonparametric rate of regret. In this paper, we propose the first computationally tractable online algorithm with regret guarantees that avoids offline planning over the state linear feedback policies. Our algorithm is based on the optimism in the face of uncertainty (OFU) principle in which we optimistically select the best model in a high confidence region. Our algorithm is then more explorative when compared to previous approaches. To overcome non-stationarity, we propose either a restarting strategy (R-OFU) or a sliding window (SW-OFU) strategy. With proper configuration, our algorithm attains sublinear regret $O(T^{2/3})$. These algorithms utilize data from the current phase for tracking variations on the system dynamics. We corroborate our theoretical findings with numerical experiments, which highlight the effectiveness of our methods. To the best of our knowledge, our study establishes the first model-based online algorithm with regret guarantees under LTV dynamical systems.

Submitted 6 June, 2022; originally announced June 2022.

arXiv:2205.10741 [pdf, other]

doi 10.1109/TCOMM.2023.3277519

Exploiting Constructive Interference for Backscatter Communication Systems

Authors: Bowen Gu, Dong Li, Ye Liu, Yongjun Xu

Abstract: Backscatter communication (BackCom), one of the core technologies to realize zero-power communication, is expected to be a pivotal paradigm for the next generation of the Internet of Things (IoT). However, the "strong" direct link (DL) interference (DLI) is traditionally assumed to be harmful, and generally drowns out the "weak" backscattered signals accordingly, thus deteriorating the performance of BackCom. In contrast to the previous efforts to eliminate the DLI, in this paper, we exploit the constructive interference (CI), in which the DLI contributes to the backscattered signal. To be specific, our objective is to maximize the received signal power by jointly optimizing the receive beamforming vectors and tag selection factors, which is, however, non-convex and difficult to solve due to constraints on the Kullback-Leibler (KL) divergence. In order to solve this problem, we first decompose the original problem, and then propose two algorithms to solve the sub-problem with beamforming design via a change of variables and semi-definite programming (SDP) and a greedy algorithm to solve the sub-problem with tag selection. In order to gain insight into the CI, we consider a special case with the single-antenna reader to reveal the channel angle between the backscattering link (BL) and the DL, in which the DLI will become constructive. Simulation results show that significant performance gain can always be achieved in the proposed algorithms compared with the traditional algorithms without the DL in terms of the strength of the received signal. The derived constructive channel angle for the BackCom system with the single-antenna reader is also confirmed by simulation results.

Submitted 12 May, 2023; v1 submitted 22 May, 2022; originally announced May 2022.

arXiv:2109.07711 [pdf]

doi 10.1109/JBHI.2022.3181791

DeepMTS: Deep Multi-task Learning for Survival Prediction in Patients with Advanced Nasopharyngeal Carcinoma using Pretreatment PET/CT

Authors: Mingyuan Meng, Bingxin Gu, Lei Bi, Shaoli Song, David Dagan Feng, Jinman Kim

Abstract: Nasopharyngeal Carcinoma (NPC) is a malignant epithelial cancer arising from the nasopharynx. Survival prediction is a major concern for NPC patients, as it provides early prognostic information to plan treatments. Recently, deep survival models based on deep learning have demonstrated the potential to outperform traditional radiomics-based survival prediction models. Deep survival models usually use image patches covering the whole target regions (e.g., nasopharynx for NPC) or containing only segmented tumor regions as the input. However, the models using the whole target regions will also include non-relevant background information, while the models using segmented tumor regions will disregard potentially prognostic information existing out of primary tumors (e.g., local lymph node metastasis and adjacent tissue invasion). In this study, we propose a 3D end-to-end Deep Multi-Task Survival model (DeepMTS) for joint survival prediction and tumor segmentation in advanced NPC from pretreatment PET/CT. Our novelty is the introduction of a hard-sharing segmentation backbone to guide the extraction of local features related to the primary tumors, which reduces the interference from non-relevant background information. In addition, we also introduce a cascaded survival network to capture the prognostic information existing out of primary tumors and further leverage the global tumor information (e.g., tumor size, shape, and locations) derived from the segmentation backbone. Our experiments with two clinical datasets demonstrate that our DeepMTS can consistently outperform traditional radiomics-based survival prediction models and existing deep survival models.

Submitted 7 June, 2022; v1 submitted 16 September, 2021; originally announced September 2021.

Comments: Accepted at IEEE Journal of Biomedical and Health Informatics (JBHI)

Journal ref: IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 9, pp. 4497-4507, 2022

arXiv:2106.16153 [pdf, other]

Multi-Modal Chorus Recognition for Improving Song Search

Authors: Jiaan Wang, Zhixu Li, Binbin Gu, Tingyi Zhang, Qingsheng Liu, Zhigang Chen

Abstract: We discuss a novel task, Chorus Recognition, which could potentially benefit downstream tasks such as song search and music summarization. Different from the existing tasks such as music summarization or lyrics summarization relying on single-modal information, this paper models chorus recognition as a multi-modal one by utilizing both the lyrics and the tune information of songs. We propose a multi-modal Chorus Recognition model that considers diverse features. Besides, we also create and publish the first Chorus Recognition dataset containing 627 songs for public use. Our empirical study performed on the dataset demonstrates that our approach outperforms several baselines in chorus recognition. In addition, our approach also helps to improve the accuracy of its downstream task - song search by more than 10.6%.

Submitted 27 June, 2021; originally announced June 2021.

Comments: Accepted at the 30th International Conference on Artificial Neural Networks (ICANN 2021)

arXiv:2106.08637 [pdf]

Topic Classification on Spoken Documents Using Deep Acoustic and Linguistic Features

Authors: Tan Liu, Wu Guo, Bin Gu

Abstract: Topic classification systems on spoken documents usually consist of two modules: an automatic speech recognition (ASR) module to convert speech into text and a text topic classification (TTC) module to predict the topic class from the decoded text. In this paper, instead of using the ASR transcripts, the fusion of deep acoustic and linguistic features is used for topic classification on spoken documents. More specifically, a conventional CTC-based acoustic model (AM) using phonemes as output units is first trained, and the outputs of the layer before the linear phoneme classifier in the trained AM are used as the deep acoustic features of spoken documents. Furthermore, these deep acoustic features are fed to a phoneme-to-word (P2W) module to obtain deep linguistic features. Finally, a local multi-head attention module is proposed to fuse these two types of deep features for topic classification. Experiments conducted on a subset selected from Switchboard corpus show that our proposed framework outperforms the conventional ASR+TTC systems and achieves a 3.13% improvement in ACC.

Submitted 16 June, 2021; originally announced June 2021.

arXiv:2104.00230 [pdf, other]

Bidirectional Multiscale Feature Aggregation for Speaker Verification

Authors: Jiajun Qi, Wu Guo, Bin Gu

Abstract: In this paper, we propose a novel bidirectional multiscale feature aggregation (BMFA) network with attentional fusion modules for text-independent speaker verification. The feature maps from different stages of the backbone network are iteratively combined and refined in both a bottom-up and top-down manner. Furthermore, instead of simple concatenation or element-wise addition of feature maps from different stages, an attentional fusion module is designed to compute the fusion weights. Experiments are conducted on the NIST SRE16 and VoxCeleb1 datasets. The experimental results demonstrate the effectiveness of the bidirectional aggregation strategy and show that the proposed attentional fusion module can further improve the performance.

Submitted 31 March, 2021; originally announced April 2021.

arXiv:2103.15421 [pdf, other]

Improved Meta-Learning Training for Speaker Verification

Authors: Yafeng Chen, Wu Guo, Bin Gu

Abstract: Meta-learning has recently become a research hotspot in speaker verification (SV). We introduce two methods to improve the meta-learning training for SV in this paper. For the first method, a backbone embedding network is first jointly trained with the conventional cross entropy loss and prototypical networks (PN) loss. Then, inspired by speaker adaptive training in speech recognition, additional transformation coefficients are trained with only the PN loss. The transformation coefficients are used to modify the original backbone embedding network in the x-vector extraction process. Furthermore, the random erasing data augmentation technique is applied to all support samples in each episode to construct positive pairs, and a contrastive loss between the augmented and the original support samples is added to the objective in model training. Experiments are carried out on the SITW and VOiCES databases. Both of the methods can obtain consistent improvements over existing meta-learning training frameworks. By combining these two methods, we can observe further improvements on these two databases.

Submitted 2 August, 2023; v1 submitted 29 March, 2021; originally announced March 2021.

arXiv:2103.05220 [pdf]

doi 10.3389/fonc.2022.899351

Prediction of 5-year Progression-Free Survival in Advanced Nasopharyngeal Carcinoma with Pretreatment PET/CT using Multi-Modality Deep Learning-based Radiomics

Authors: Bingxin Gu, Mingyuan Meng, Lei Bi, Jinman Kim, David Dagan Feng, Shaoli Song

Abstract: Objective: Deep Learning-based Radiomics (DLR) has achieved great success in medical image analysis and has been considered a replacement for conventional radiomics that relies on handcrafted features. In this study, we aimed to explore the capability of DLR for the prediction of 5-year Progression-Free Survival (PFS) in Nasopharyngeal Carcinoma (NPC) using pretreatment PET/CT. Methods: A total of 257 patients (170/87 in internal/external cohorts) with advanced NPC (TNM stage III or IVa) were enrolled. We developed an end-to-end multi-modality DLR model, in which a 3D convolutional neural network was optimized to extract deep features from pretreatment PET/CT images and predict the probability of 5-year PFS. TNM stage, as a high-level clinical feature, could be integrated into our DLR model to further improve the prognostic performance. To compare conventional radiomics and DLR, 1456 handcrafted features were extracted, and optimal conventional radiomics methods were selected from 54 cross-combinations of 6 feature selection methods and 9 classification methods. In addition, risk group stratification was performed with clinical signature, conventional radiomics signature, and DLR signature. Results: Our multi-modality DLR model using both PET and CT achieved higher prognostic performance than the optimal conventional radiomics method. Furthermore, the multi-modality DLR model outperformed single-modality DLR models using only PET or only CT. For risk group stratification, the conventional radiomics signature and DLR signature enabled significant differences between the high- and low-risk patient groups in both internal and external cohorts, while the clinical signature failed in the external cohort. Conclusion: Our study identified potential prognostic tools for survival prediction in advanced NPC, suggesting that DLR could provide complementary values to the current TNM staging.

Submitted 4 July, 2022; v1 submitted 8 March, 2021; originally announced March 2021.

Comments: Accepted at Frontiers in Oncology

Journal ref: Frontiers in Oncology, vol. 12, pp. 899351, 2022

arXiv:2002.06049 [pdf]

An Adaptive X-vector Model for Text-independent Speaker Verification

Authors: Bin Gu, Wu Guo, Lirong Dai, Jun Du

Abstract: In this paper, adaptive mechanisms are applied in deep neural network (DNN) training for x-vector-based text-independent speaker verification. First, adaptive convolutional neural networks (ACNNs) are employed in frame-level embedding layers, where the parameters of the convolution filters are adjusted based on the input features. Compared with conventional CNNs, ACNNs have more flexibility in capturing speaker information. Moreover, we replace conventional batch normalization (BN) with adaptive batch normalization (ABN). By dynamically generating the scaling and shifting parameters in BN, ABN adapts models to the acoustic variability arising from various factors such as channel and environmental noises. Finally, we incorporate these two methods to further improve performance. Experiments are carried out on the speaker in the wild (SITW) and VOiCES databases. The results demonstrate that the proposed methods significantly outperform the original x-vector approach.

Submitted 14 February, 2020; originally announced February 2020.

Comments: 6 pages, 3 figures

arXiv:2001.04585 [pdf]

Gaussian speaker embedding learning for text-independent speaker verification

Authors: Bin Gu, Wu Guo

Abstract: The x-vector maps segments of arbitrary duration to vectors of fixed dimension using a deep neural network. Combined with the probabilistic linear discriminant analysis (PLDA) backend, the x-vector/PLDA has become the dominant framework in text-independent speaker verification. Nevertheless, how to extract an x-vector appropriate for the PLDA backend is a key problem. In this paper, we propose a Gaussian noise constrained network (GNCN) to extract the x-vector, which adopts a multi-task learning strategy with the primary task classifying the speakers and the auxiliary task just fitting the Gaussian noises. Experiments are carried out using the SITW database. The results demonstrate the effectiveness of our proposed method.

Submitted 13 January, 2020; originally announced January 2020.

Comments: 5 pages, 3 figures

arXiv:2001.04584 [pdf]

An Improved Deep Neural Network for Modeling Speaker Characteristics at Different Temporal Scales

Authors: Bin Gu, Wu Guo

Abstract: This paper presents an improved deep embedding learning method based on a convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) Multi-scale convolution (MSCNN) is adopted in frame-level layers to capture complementary speaker information in different receptive fields. (2) A Baum-Welch statistics attention (BWSA) mechanism is applied in the temporal pooling layer, which can integrate more useful long-term speaker characteristics. Experiments are carried out on the NIST SRE16 evaluation set. The results demonstrate the effectiveness of MSCNN and show that the proposed BWSA can further improve the performance of the DNN embedding system.

Submitted 13 January, 2020; originally announced January 2020.

Comments: 5 pages, 2 figures

arXiv:1903.12428 [pdf]

USTCSpeech System for VOiCES from a Distance Challenge 2019

Authors: Lanhua You, Bin Gu, Wu Guo

Abstract: This document describes the speaker verification systems developed in the Speech lab at the University of Science and Technology of China (USTC) for the VOiCES from a Distance Challenge 2019. We develop the system for the Fixed Condition on two public corpora, VoxCeleb and SITW. The frameworks of our systems are based on the mainstream i-vector/PLDA and x-vector/PLDA algorithms.

Submitted 29 March, 2019; originally announced March 2019.

arXiv:1207.0922 [pdf, ps, other]

MDM: A Mode Diagram Modeling Framework for Periodic Control Systems

Authors: Zheng Wang, Geguang Pu, Shenchao Qin, Jianwen Li, Kim G. Larsen, Jan Madsen, Bin Gu, Jifeng He

Abstract: Periodic control systems used in spacecraft and automotive applications are usually period-driven and can be decomposed into different modes, with each mode representing a system state observed from outside. Such systems may also involve intensive computing in their modes. Despite the fact that such control systems are widely used in the above-mentioned safety-critical embedded domains, there is a lack of domain-specific formal modelling languages for such systems in the relevant industry. To address this problem, we propose a formal visual modeling framework called MDM as a concise and precise way to specify and analyze such systems. To capture the temporal properties of periodic control systems, we provide, along with MDM, a property specification language based on interval logic for the description of concrete temporal requirements the engineers are concerned with. The statistical model checking technique can then be used to verify the MDM models against desired properties. To demonstrate the viability of our approach, we have applied our modelling framework to some real-life case studies from industry and helped detect two design defects in some spacecraft control systems.

Submitted 4 July, 2012; originally announced July 2012.

Showing 1–25 of 25 results for author: Gu, B