A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge

Authors: Xiaopeng Wang, Yi Lu, Xin Qi, Zhiyong Wang, Yuankun Xie, Shuchen Shi, Ruibo Fu

Abstract: This paper presents the development of a speech synthesis system for the LIMMITS'24 Challenge, focusing primarily on Track 2. The objective of the challenge is to establish a multi-speaker, multi-lingual Indic Text-to-Speech system with voice cloning capabilities, covering seven Indian languages with both male and female speakers. The system was trained using challenge data and fine-tuned for few-shot voice cloning on target speakers. Evaluation included both mono-lingual and cross-lingual synthesis across all seven languages, with subjective tests assessing naturalness and speaker similarity. Our system uses the VITS2 architecture, augmented with a multi-lingual ID and a BERT model to enhance contextual language comprehension. In Track 1, where no additional data usage was permitted, our model achieved a Speaker Similarity score of 4.02. In Track 2, which allowed the use of extra data, it attained a Speaker Similarity score of 4.17.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.14931 [pdf, other]

Multi-beam Training for Near-field Communications in High-frequency Bands

Authors: Cong Zhou, Changsheng You, Zixuan Huang, Shuo Shi, Yi Gong, Chan-Byoung Chae, Kaibin Huang

Abstract: In this paper, we study efficient multi-beam training design for near-field communications to reduce the beam training overhead of conventional single-beam training methods. In particular, the array-division based multi-beam training method, which is widely used in far-field communications, cannot be directly applied to the near-field scenario, since different sub-arrays may observe different user angles and there exist coverage holes in the angular domain. To address these issues, we first devise a new near-field multi-beam codebook by sparsely activating a portion of antennas to form a sparse linear array (SLA), hence generating multiple beams simultaneously by effectively exploiting the near-field grating-lobes. Next, a two-stage near-field beam training method is proposed, in which several candidate user locations are first identified based on multi-beam sweeping over time, followed by a second stage that determines the true user location with a small number of single-beam sweeping steps. Finally, numerical results show that our proposed multi-beam training method significantly reduces the beam training overhead of conventional single-beam training methods while achieving comparable rate performance in data transmission.

Submitted 21 June, 2024; originally announced June 2024.

Comments: In this paper, a novel near-field multi-beam training scheme is proposed by sparsely activating a portion of antennas to form a sparse linear array

arXiv:2406.10591 [pdf, other]

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Authors: Ruibo Fu, Shuchen Shi, Hongming Guo, Tao Wang, Chunyu Qiang, Zhengqi Wen, Jianhua Tao, Xin Qi, Yi Lu, Xiaopeng Wang, Zhiyong Wang, Yukun Liu, Xuefei Liu, Shuai Zhang, Guanjun Li

Abstract: Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio (TTA) technology, which relies on detailed and acoustically relevant textual descriptions, falls short in practical video dubbing applications. Existing datasets like AudioSet, AudioCaps, Clotho, Sound-of-Story, and WavCaps do not fully meet the requirements of real-world foley audio dubbing tasks. To address this, we introduce the Multi-modal Image and Narrative Text Dubbing Dataset (MINT), designed to enhance mainstream dubbing tasks such as literary story audiobook dubbing and image/silent video dubbing. In addition, to address the limitations of existing TTA technology in understanding and planning complex prompts, a Foley Audio Content Planning, Generation, and Alignment (CPGA) framework is proposed, which includes a content planning module leveraging large language models for complex multi-modal prompt comprehension. Additionally, the training process is optimized using Proximal Policy Optimization based reinforcement learning, significantly improving the alignment and auditory realism of generated foley audio. Experimental results demonstrate that our approach significantly advances the field of foley audio dubbing, providing robust solutions for the challenges of multi-modal dubbing. Even when utilizing the relatively lightweight GPT-2 model, our framework outperforms open-source multimodal large models such as LLaVA, DeepSeek-VL, and Moondream2. The dataset is available at https://github.com/borisfrb/MINT .

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.08112 [pdf, other]

Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

Authors: Yi Lu, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Zhiyong Wang, Xin Qi, Xuefei Liu, Yongwei Li, Yukun Liu, Xiaopeng Wang, Shuchen Shi

Abstract: With the proliferation of Large Language Model (LLM) based deepfake audio, there is an urgent need for effective detection methods. Previous deepfake audio generation methods typically involve a multi-step generation process, with the final step using a vocoder to predict the waveform from handcrafted features. However, LLM-based audio is directly generated from discrete neural codecs in an end-to-end generation process, skipping the final step of vocoder processing. This poses a significant challenge for current audio deepfake detection (ADD) models based on vocoder artifacts. To effectively detect LLM-based deepfake audio, we focus on the core of the generation process, the conversion from neural codec to waveform. We propose the Codecfake dataset, which is generated by seven representative neural codec methods. Experimental results show that codec-trained ADD models exhibit a 41.406% reduction in average equal error rate compared to vocoder-trained ADD models on the Codecfake test set.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024. arXiv admin note: substantial text overlap with arXiv:2405.04880

arXiv:2406.04683 [pdf, other]

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

Authors: Shuchen Shi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Tao Wang, Chunyu Qiang, Yi Lu, Xin Qi, Xuefei Liu, Yukun Liu, Yongwei Li, Zhiyong Wang, Xiaopeng Wang

Abstract: Text-to-Audio (TTA) aims to generate audio that corresponds to the given text description, playing a crucial role in media production. The text descriptions in TTA datasets lack rich variations and diversity, resulting in a drop in TTA model performance when faced with complex text. To address this issue, we propose a method called Portable Plug-in Prompt Refiner, which utilizes rich knowledge about textual descriptions inherent in large language models to effectively enhance the robustness of TTA acoustic models without altering the acoustic training set. Furthermore, a Chain-of-Thought that mimics human verification is introduced to enhance the accuracy of audio descriptions, thereby improving the accuracy of generated content in practical applications. The experiments show that our method achieves a state-of-the-art Inception Score (IS) of 8.72, surpassing AudioGen, AudioLDM and Tango.

Submitted 7 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024

arXiv:2406.04262 [pdf, other]

Near-field Beam Training with Sparse DFT Codebook

Authors: Cong Zhou, Chenyu Wu, Changsheng You, Shuo Shi

Abstract: Extremely large-scale array (XL-array) has emerged as one promising technology to improve the spectral efficiency and spatial resolution of future sixth generation (6G) wireless systems. The upsurge in the number of antennas renders communication users more likely to be located in the near-field region, which requires a more accurate spherical (instead of planar) wavefront propagation modeling. This inevitably incurs unaffordable beam training overhead when performing a two-dimensional (2D) beam-search in both the angular and range domains. To address this issue, we first introduce a new sparse discrete Fourier transform (DFT) codebook, which exhibits angular periodicity in the received beam pattern at the user; this motivates us to propose a three-phase beam training scheme. Specifically, in the first phase, we utilize the sparse DFT codebook for beam sweeping in an angular subspace and estimate candidate user angles according to the received beam pattern. Then, a central sub-array is activated to scan specific candidate angles to resolve the angular ambiguity and identify the user angle. In the third phase, the polar-domain codebook is applied in the estimated angle to search the best effective range of the user. Finally, numerical results show that the proposed beam training scheme enabled by the sparse DFT codebook achieves a 98.67% reduction in beam training overhead compared with the exhaustive-search scheme, yet without compromising rate performance in the high signal-to-noise ratio (SNR) regime.

Submitted 18 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: In this paper, we propose a novel sparse DFT codebook to reduce near-field beam training overhead, which is equivalent to sparsely activating the dense array

arXiv:2406.03247 [pdf, other]

Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection

Authors: Xiaopeng Wang, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Yuankun Xie, Yukun Liu, Jianhua Tao, Xuefei Liu, Yongwei Li, Xin Qi, Yi Lu, Shuchen Shi

Abstract: The generalization of Fake Audio Detection (FAD) is critical due to the emergence of new spoofing techniques. Traditional FAD methods often focus solely on distinguishing between genuine and known spoofed audio. We propose a Genuine-Focused Learning (GFL) framework, called GFL-FAD, aiming for highly generalized FAD. This method incorporates a Counterfactual Reasoning Enhanced Representation (CRER) based on audio reconstruction using the Mask AutoEncoder (MAE) architecture to accurately model genuine audio features. To reduce the influence of spoofed audio during training, we introduce a genuine audio reconstruction loss, maintaining the focus on learning genuine data features. In addition, content-related bottleneck (BN) features are extracted from the MAE to supplement the knowledge of the original audio. These BN features are adaptively fused with CRER to further improve robustness. Our method achieves state-of-the-art performance with an EER of 0.25% on ASVspoof2019 LA.

Submitted 9 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024

arXiv:2406.03237 [pdf, other]

Generalized Fake Audio Detection via Deep Stable Learning

Authors: Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Yuankun Xie, Yukun Liu, Xiaopeng Wang, Xuefei Liu, Yongwei Li, Jianhua Tao, Yi Lu, Xin Qi, Shuchen Shi

Abstract: Although current fake audio detection approaches have achieved remarkable success on specific datasets, they often fail when evaluated with datasets from different distributions. Previous studies typically address distribution shift by focusing on using extra data or applying extra loss restrictions during training. However, these methods either require a substantial amount of data or complicate the training process. In this work, we propose a stable learning-based training scheme that involves a Sample Weight Learning (SWL) module, addressing distribution shift by decorrelating all selected features via learning weights from training samples. The proposed portable plug-in-like SWL is easy to apply to multiple base models and generalizes them without using extra data during training. Experiments conducted on the ASVspoof datasets clearly demonstrate the effectiveness of SWL in generalizing different models across three evaluation datasets from different distributions.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH 2024

arXiv:2405.04066 [pdf, other]

Characterizing Regional Importance in Cities with Human Mobility Motifs in Metro Networks

Authors: Shuyang Shi, Ding Lyu, Lin Wang, Xiaofan Wang, Guanrong Chen

Abstract: Uncovering higher-order spatiotemporal dependencies within human mobility networks offers valuable insights into the analysis of urban structures. In most existing studies, human mobility networks are typically constructed by aggregating all trips without distinguishing who takes which specific trip. Instead, we claim individual mobility motifs, higher-order structures generated by daily trips of people, as fundamental units of human mobility networks. In this paper, we propose two network construction frameworks at the level of mobility motifs in characterizing regional importance in cities. Firstly, we enhance the structural dependencies within mobility motifs and proceed to construct mobility networks based on the enhanced mobility motifs. Secondly, taking inspiration from PageRank, we speculate that people would allocate values of importance to destinations according to their trip intentions. A motif-wise network construction framework is proposed based on the established mechanism. Leveraging large-scale metro data across cities, we construct three types of human mobility networks and characterize the regional importance by node importance indicators. Our comparison results suggest that the motif-based mobility network outperforms the classic mobility network, thus highlighting the efficacy of the introduced human mobility motifs. Finally, we demonstrate that the performance in characterizing the regional importance is significantly improved by our motif-wise framework.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.13786 [pdf, other]

Soar: Design and Deployment of A Smart Roadside Infrastructure System for Autonomous Driving

Authors: Shuyao Shi, Neiwen Ling, Zhehao Jiang, Xuan Huang, Yuze He, Xiaoguang Zhao, Bufang Yang, Chen Bian, Jingfei Xia, Zhenyu Yan, Raymond Yeung, Guoliang Xing

Abstract: Recently, smart roadside infrastructure (SRI) has demonstrated the potential of achieving fully autonomous driving systems. To explore the potential of infrastructure-assisted autonomous driving, this paper presents the design and deployment of Soar, the first end-to-end SRI system specifically designed to support autonomous driving systems. Soar consists of both software and hardware components carefully designed to overcome various system and physical challenges. Soar can leverage the existing operational infrastructure like street lampposts for a lower barrier of adoption. Soar adopts a new communication architecture that comprises a bi-directional multi-hop I2I network and a downlink I2V broadcast service, which are designed based on off-the-shelf 802.11ac interfaces in an integrated manner. Soar also features a hierarchical DL task management framework to achieve desirable load balancing among nodes and enable them to collaborate efficiently to run multiple data-intensive autonomous driving applications. We deployed a total of 18 Soar nodes on existing lampposts on campus, which have been operational for over two years. Our real-world evaluation shows that Soar can support a diverse set of autonomous driving applications and achieve desirable real-time performance and high communication reliability. Our findings and experiences in this work offer key insights into the development and deployment of next-generation smart roadside infrastructure and autonomous driving systems.

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.07620 [pdf, other]

Diffusion Probabilistic Multi-cue Level Set for Reducing Edge Uncertainty in Pancreas Segmentation

Authors: Yue Gou, Yuming Xing, Shengzhu Shi, Zhichang Guo

Abstract: Accurately segmenting the pancreas remains a huge challenge. Traditional methods encounter difficulties in semantic localization due to the small volume and distorted structure of the pancreas, while deep learning methods encounter challenges in obtaining accurate edges because of low contrast and organ overlapping. To overcome these issues, we propose a multi-cue level set method based on the diffusion probabilistic model, namely Diff-mcs. Our method adopts a coarse-to-fine segmentation strategy. We use the diffusion probabilistic model in the coarse segmentation stage, with the obtained probability distribution serving as both the initial localization and prior cues for the level set method. In the fine segmentation stage, we combine the prior cues with grayscale cues and texture cues to refine the edge by maximizing the difference between probability distributions of the cues inside and outside the level set curve. The method is validated on three public datasets and achieves state-of-the-art performance, obtaining more accurate segmentation results with lower-uncertainty segmentation edges. In addition, we conduct ablation studies and uncertainty analysis to verify that the diffusion probability model provides a more appropriate initialization for the level set method. Furthermore, when combined with multiple cues, the level set method can better obtain edges and improve the overall accuracy. Our code is available at https://github.com/GOUYUEE/Diff-mcs.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2401.05690 [pdf, other]

Sparse Array Enabled Near-Field Communications: Beam Pattern Analysis and Hybrid Beamforming Design

Authors: Cong Zhou, Changsheng You, Haodong Zhang, Li Chen, Shuo Shi

Abstract: Extremely large-scale array (XL-array) has emerged as a promising technology to enable near-field communications for achieving enhanced spectrum efficiency and spatial resolution, by drastically increasing the number of antennas. However, this also inevitably incurs higher hardware and energy cost, which may not be affordable in future wireless systems. To address this issue, we propose in this paper to exploit two types of sparse arrays (SAs) for enabling near-field communications. Specifically, we first consider the linear sparse array (LSA) and characterize its near-field beam pattern. It is shown that despite the achieved beam-focusing gain, the LSA introduces several undesired grating-lobes, which have comparable beam power with the main-lobe and are focused on specific regions. An efficient hybrid beamforming design is then proposed for the LSA to deal with the potential strong inter-user interference (IUI). Next, we consider another form of SA, called extended coprime array (ECA), which is composed of two LSA subarrays with different (coprime) inter-antenna spacing. By characterizing the ECA near-field beam pattern, we show that compared with the LSA with the same array sparsity, the ECA can greatly suppress the beam power of near-field grating-lobes thanks to the offset effect of the two subarrays, albeit with a larger number of grating-lobes. This thus motivates us to propose a customized two-phase hybrid beamforming design for the ECA. Finally, numerical results are presented to demonstrate the rate performance gain of the proposed two SAs over the conventional uniform linear array (ULA).

Submitted 11 January, 2024; originally announced January 2024.

Comments: In this paper, we propose to exploit sparse arrays for enabling near-field communications and characterize its unique beam pattern for facilitating its hybrid beamforming design

arXiv:2312.11255 [pdf, other]

State-action control barrier functions: Imposing safety on learning-based control with low online computational costs

Authors: Kanghui He, Shengling Shi, Ton van den Boom, Bart De Schutter

Abstract: Learning-based control with safety guarantees usually requires real-time safety certification and modifications of possibly unsafe learning-based policies. The control barrier function (CBF) method uses a safety filter containing a constrained optimization problem to produce safe policies. However, finding a valid CBF for a general nonlinear system requires a complex function parameterization, which, in general, makes the policy optimization problem difficult to solve in real time. For nonlinear systems with nonlinear state constraints, this paper proposes the novel concept of state-action CBFs, which not only characterize the safety at each state but also evaluate the control inputs taken at each state. State-action CBFs, in contrast to CBFs, enable a flexible parameterization, resulting in a safety filter that involves a convex quadratic optimization problem. This, in turn, significantly alleviates the online computational burden. To synthesize state-action CBFs, we propose a learning-based approach exploiting Hamilton-Jacobi reachability. The effect of learning errors on the effectiveness of state-action CBFs is addressed by constraint tightening and introducing a new concept called contractive CBFs. These contributions ensure formal safety guarantees for learned CBFs and control policies, enhancing the applicability of learning-based control in real-time scenarios. Simulation results on an inverted pendulum with elastic walls validate the proposed CBFs in terms of constraint satisfaction and CPU time.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2311.03974 [pdf, ps, other]

NOMA Enabled Multi-Access Edge Computing: A Joint MU-MIMO Precoding and Computation Offloading Design

Authors: Deyou Zhang, Meng Wang, Shuo Shi, Ming Xiao

Abstract: This letter investigates computation offloading and transmit precoding co-design for multi-access edge computing (MEC), where multiple MEC users (MUs) equipped with multiple antennas access the MEC server in a non-orthogonal multiple access manner. We aim to minimize the total energy consumption of all MUs while satisfying the latency constraints by jointly optimizing the computational frequency, offloading ratio, and precoding matrix of each MU. For tractability, we first decompose the original problem into three subproblems and then solve these subproblems iteratively until convergence. Simulation results validate the convergence of the proposed method and demonstrate its superiority over baseline algorithms.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.02679 [pdf, other]

Regret Analysis of Learning-Based Linear Quadratic Gaussian Control with Additive Exploration

Authors: Archith Athrey, Othmane Mazhar, Meichen Guo, Bart De Schutter, Shengling Shi

Abstract: In this paper, we analyze the regret incurred by a computationally efficient exploration strategy, known as naive exploration, for controlling unknown partially observable systems within the Linear Quadratic Gaussian (LQG) framework. We introduce a two-phase control algorithm called LQG-NAIVE, which involves an initial phase of injecting Gaussian input signals to obtain a system model, followed by a second phase of an interplay between naive exploration and control in an episodic fashion. We show that LQG-NAIVE achieves a regret growth rate of $\tilde{\mathcal{O}}(\sqrt{T})$, i.e., $\mathcal{O}(\sqrt{T})$ up to logarithmic factors after $T$ time steps, and we validate its performance through numerical simulations. Additionally, we propose LQG-IF2E, which extends the exploration signal to a "closed-loop" setting by incorporating the Fisher Information Matrix (FIM). We provide compelling numerical evidence of the competitive performance of LQG-IF2E compared to LQG-NAIVE.

Submitted 24 November, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

arXiv:2310.15937 [pdf, other]

A Behavioral Perspective on Models of Linear Dynamical Networks with Manifest Variables

Authors: Shengling Shi, Zhiyong Sun, Bart De Schutter

Abstract: Networks of dynamical systems play an important role in various domains and have motivated many studies on the control and analysis of linear dynamical networks. For linear network models considered in these studies, it is typically pre-determined what signal channels are inputs and what are outputs. These models do not capture the practical need to incorporate different experimental situations, where different selections of input and output channels are applied to the same network. Moreover, a unified view of different network models is lacking. This work makes an initial step towards addressing the above issues by taking a behavioral perspective, where input and output channels are not pre-determined. The focus of this work is on behavioral network models with only external variables. By exploiting the concept of hypergraphs, novel dual graphical representations, called system graphs and signal graphs, are introduced for behavioral networks. Moreover, connections between behavioral network models and structural vector autoregressive models are established. In addition to their connections in graphical representations, it is shown that the regularity of interconnections is an essential assumption when choosing a structural vector autoregressive model.

Submitted 5 May, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2309.00223 [pdf, other]

The FruitShell French synthesis system at the Blizzard 2023 Challenge

Authors: Xin Qi, Xiaopeng Wang, Zhiyong Wang, Wang Liu, Mingming Ding, Shuchen Shi

Abstract: This paper presents a French text-to-speech synthesis system for the Blizzard Challenge 2023. The challenge consists of two tasks: generating high-quality speech from female speakers and generating speech that closely resembles specific individuals. Regarding the competition data, we conducted a screening process to remove missing or erroneous text data. We organized all symbols except for phonemes and eliminated symbols that had no pronunciation or zero duration. Additionally, we added word boundary and start/end symbols to the text, which we have found to improve speech quality based on our previous experience. For the Spoke task, we performed data augmentation according to the competition rules. We used an open-source G2P model to transcribe the French texts into phonemes. As the G2P model uses the International Phonetic Alphabet (IPA), we applied the same transcription process to the provided competition data for standardization. However, due to compiler limitations in recognizing special symbols from the IPA chart, we followed the rules to convert all phonemes into the phonetic scheme used in the competition data. Finally, we resampled all competition audio to a uniform sampling rate of 16 kHz. We employed a VITS-based acoustic model with the hifigan vocoder. For the Spoke task, we trained a multi-speaker model and incorporated speaker information into the duration predictor, vocoder, and flow layers of the model. The evaluation results of our system showed a quality MOS score of 3.6 for the Hub task and 3.4 for the Spoke task, placing our system at an average level among all participating teams.

Submitted 31 August, 2023; originally announced September 2023.

arXiv:2307.03423 [pdf, other]

Hyperspectral and Multispectral Image Fusion Using the Conditional Denoising Diffusion Probabilistic Model

Authors: Shuaikai Shi, Lijun Zhang, Jie Chen

Abstract: Hyperspectral images (HSI) have a large amount of spectral information reflecting the characteristics of matter, while their spatial resolution is low due to the limitations of imaging technology. Complementary to this are multispectral images (MSI), e.g., RGB images, with high spatial resolution but insufficient spectral bands. Hyperspectral and multispectral image fusion is a technique for acquiring ideal images that have both high spatial and high spectral resolution cost-effectively. Many existing HSI and MSI fusion algorithms rely on known imaging degradation models, which are often not available in practice. In this paper, we propose a deep fusion method based on the conditional denoising diffusion probabilistic model, called DDPM-Fus. Specifically, the DDPM-Fus contains the forward diffusion process which gradually adds Gaussian noise to the high spatial resolution HSI (HrHSI) and another reverse denoising process which learns to predict the desired HrHSI from its noisy version conditioning on the corresponding high spatial resolution MSI (HrMSI) and low spatial resolution HSI (LrHSI). Once the training is complete, the proposed DDPM-Fus implements the reverse process on the test HrMSI and LrHSI to generate the fused HrHSI. Experiments conducted on one indoor and two remote sensing datasets show the superiority of the proposed model when compared with other advanced deep learning-based fusion methods. The codes of this work will be open-sourced at this address: https://github.com/shuaikaishi/DDPMFus for reproducibility.

Submitted 7 July, 2023; originally announced July 2023.

arXiv:2307.03413 [pdf, other]

Unsupervised Hyperspectral and Multispectral Images Fusion Based on the Cycle Consistency

Authors: Shuaikai Shi, Lijun Zhang, Yoann Altmann, Jie Chen

Abstract: Hyperspectral images (HSI) with abundant spectral information reflected materials property usually perform low spatial resolution due to the hardware limits. Meanwhile, multispectral images (MSI), e.g., RGB images, have a high spatial resolution but deficient spectral signatures. Hyperspectral and multispectral image fusion can be cost-effective and efficient for acquiring both high spatial resolution and high spectral resolution images. Many of the conventional HSI and MSI fusion algorithms rely on known spatial degradation parameters, i.e., point spread function, spectral degradation parameters, spectral response function, or both of them. Another class of deep learning-based models relies on the ground truth of high spatial resolution HSI and needs large amounts of paired training images when working in a supervised manner. Both of these models are limited in practical fusion scenarios. In this paper, we propose an unsupervised HSI and MSI fusion model based on the cycle consistency, called CycFusion. The CycFusion learns the domain transformation between low spatial resolution HSI (LrHSI) and high spatial resolution MSI (HrMSI), and the desired high spatial resolution HSI (HrHSI) are considered to be intermediate feature maps in the transformation networks. The CycFusion can be trained with the objective functions of marginal matching in single transform and cycle consistency in double transforms. Moreover, the estimated PSF and SRF are embedded in the model as the pre-training weights, which further enhances the practicality of our proposed model. Experiments conducted on several datasets show that our proposed model outperforms all compared unsupervised fusion methods. The codes of this paper will be available at this address: https://github.com/shuaikaishi/CycFusion for reproducibility.

Submitted 7 July, 2023; originally announced July 2023.

arXiv:2306.15723 [pdf, other]

Approximate Dynamic Programming for Constrained Piecewise Affine Systems with Stability and Safety Guarantees

Authors: Kanghui He, Shengling Shi, Ton van den Boom, Bart De Schutter

Abstract: Infinite-horizon optimal control of constrained piecewise affine (PWA) systems has been approximately addressed by hybrid model predictive control (MPC), which, however, has computational limitations, both in offline design and online implementation. In this paper, we consider an alternative approach based on approximate dynamic programming (ADP), an important class of methods in reinforcement learning. We accommodate non-convex union-of-polyhedra state constraints and linear input constraints into ADP by designing PWA penalty functions. PWA function approximation is used, which allows for a mixed-integer encoding to implement ADP. The main advantage of the proposed ADP method is its online computational efficiency. Particularly, we propose two control policies, which lead to solving a smaller-scale mixed-integer linear program than conventional hybrid MPC, or a single convex quadratic program, depending on whether the policy is implicitly determined online or explicitly computed offline. We characterize the stability and safety properties of the closed-loop systems, as well as the sub-optimality of the proposed policies, by quantifying the approximation errors of value functions and policies. We also develop an offline mixed-integer linear programming-based method to certify the reliability of the proposed method. Simulation results on an inverted pendulum with elastic walls and on an adaptive cruise control problem validate the control performance in terms of constraint satisfaction and CPU time.

Submitted 6 January, 2024; v1 submitted 27 June, 2023; originally announced June 2023.

arXiv:2305.19069 [pdf, other]

doi 10.1016/j.asoc.2023.110675

Multi-source adversarial transfer learning for ultrasound image segmentation with limited similarity

Authors: Yifu Zhang, Hongru Li, Tao Yang, Rui Tao, Zhengyuan Liu, Shimeng Shi, Jiansong Zhang, Ning Ma, Wu** Feng, Zhanhu Zhang, Xinyu Zhang

Abstract: Lesion segmentation of ultrasound medical images based on deep learning techniques is a widely used method for diagnosing diseases. Although there is a large amount of ultrasound image data in medical centers and other places, labeled ultrasound datasets are a scarce resource, and it is likely that no datasets are available for new tissues/organs. Transfer learning provides the possibility to solve this problem, but there are too many features in natural images that are not related to the target domain. As a source domain, redundant features that are not conducive to the task will be extracted. Migration between ultrasound images can avoid this problem, but there are few types of public datasets, and it is difficult to find sufficiently similar source domains. Compared with natural images, ultrasound images have less information, and there are fewer transferable features between different ultrasound images, which may cause negative transfer. To this end, a multi-source adversarial transfer learning network for ultrasound image segmentation is proposed. Specifically, to address the lack of annotations, the idea of adversarial transfer learning is used to adaptively extract common features between a certain pair of source and target domains, which provides the possibility to utilize unlabeled ultrasound data. To alleviate the lack of knowledge in a single source domain, multi-source transfer learning is adopted to fuse knowledge from multiple source domains. In order to ensure the effectiveness of the fusion and maximize the use of precious data, a multi-source domain independent strategy is also proposed to improve the estimation of the target domain data distribution, which further increases the learning ability of the multi-source adversarial migration learning network in multiple domains.

Submitted 30 May, 2023; originally announced May 2023.

Comments: Submitted to Applied Soft Computing Journal

arXiv:2305.11438 [pdf, other]

Phonetic and Prosody-aware Self-supervised Learning Approach for Non-native Fluency Scoring

Authors: Kaiqi Fu, Shaojun Gao, Shuju Shi, Xiaohai Tian, Wei Li, Zejun Ma

Abstract: Speech fluency/disfluency can be evaluated by analyzing a range of phonetic and prosodic features. Deep neural networks are commonly trained to map fluency-related features into the human scores. However, the effectiveness of deep learning-based models is constrained by the limited amount of labeled training samples. To address this, we introduce a self-supervised learning (SSL) approach that takes into account phonetic and prosody awareness for fluency scoring. Specifically, we first pre-train the model using a reconstruction loss function, by masking phones and their durations jointly on a large amount of unlabeled speech and text prompts. We then fine-tune the pre-trained model using human-annotated scoring data. Our experimental results, conducted on datasets such as Speechocean762 and our non-native datasets, show that our proposed method outperforms the baseline systems in terms of Pearson correlation coefficients (PCC). Moreover, we also conduct an ablation study to better understand the contribution of phonetic and prosody factors during the pre-training stage.

Submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.10983 [pdf, other]

Assessor360: Multi-sequence Network for Blind Omnidirectional Image Quality Assessment

Authors: Tianhe Wu, Shuwei Shi, Haoming Cai, Mingdeng Cao, Jing Xiao, Yinqiang Zheng, Yujiu Yang

Abstract: Blind Omnidirectional Image Quality Assessment (BOIQA) aims to objectively assess the human perceptual quality of omnidirectional images (ODIs) without relying on pristine-quality image information. It is becoming more significant with the increasing advancement of virtual reality (VR) technology. However, the quality assessment of ODIs is severely hampered by the fact that the existing BOIQA pipeline lacks the modeling of the observer's browsing process. To tackle this issue, we propose a novel multi-sequence network for BOIQA called Assessor360, which is derived from the realistic multi-assessor ODI quality assessment procedure. Specifically, we propose a generalized Recursive Probability Sampling (RPS) method for the BOIQA task, combining content and details information to generate multiple pseudo-viewport sequences from a given starting point. Additionally, we design a Multi-scale Feature Aggregation (MFA) module with a Distortion-aware Block (DAB) to fuse distorted and semantic features of each viewport. We also devise Temporal Modeling Module (TMM) to learn the viewport transition in the temporal domain. Extensive experimental results demonstrate that Assessor360 outperforms state-of-the-art methods on multiple OIQA datasets. The code and models are available at https://github.com/TianheWu/Assessor360.

Submitted 10 October, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

arXiv:2305.01871 [pdf]

Convolutional neural network-based single-shot speckle tracking for x-ray phase-contrast imaging

Authors: Serena Qinyun Z. Shi, Nadav Shapira, Peter B. Noël, Sebastian Meyer

Abstract: X-ray phase-contrast imaging offers enhanced sensitivity for weakly-attenuating materials, such as breast and brain tissue, but has yet to be widely implemented clinically due to high coherence requirements and expensive x-ray optics. Speckle-based phase contrast imaging has been proposed as an affordable and simple alternative; however, obtaining high-quality phase-contrast images requires accurate tracking of sample-induced speckle pattern modulations. This study introduced a convolutional neural network to accurately retrieve sub-pixel displacement fields from pairs of reference (i.e., without sample) and sample images for speckle tracking. Speckle patterns were generated utilizing an in-house wave-optical simulation tool. These images were then randomly deformed and attenuated to generate training and testing datasets. The performance of the model was evaluated and compared against conventional speckle tracking algorithms: zero-normalized cross-correlation and unified modulated pattern analysis. We demonstrate improved accuracy (1.7 times better than conventional speckle tracking), bias (2.6 times), and spatial resolution (2.3 times), as well as noise robustness, window size independence, and computational efficiency. In addition, the model was validated with a simulated geometric phantom. Thus, in this study, we propose a novel convolutional-neural-network-based speckle-tracking method with enhanced performance and robustness that offers improved alternative tracking while further expanding the potential applications of speckle-based phase contrast imaging.

Submitted 2 May, 2023; originally announced May 2023.

arXiv:2302.12511 [pdf, ps, other]

Two-Stage Hierarchical Beam Training for Near-Field Communications

Authors: Chenyu Wu, Changsheng You, Yuanwei Liu, Li Chen, Shuo Shi

Abstract: Extremely large-scale array (XL-array) has emerged as a promising technology to improve the spectrum efficiency and spatial resolution of future wireless systems. However, the huge number of antennas renders the users more likely to locate in the near-field (instead of the far-field) region of the XL-array with spherical wavefront propagation. This inevitably incurs prohibitively high beam training overhead since it requires a two-dimensional (2D) beam search over both the angular and distance domains. To address this issue, we propose in this paper an efficient two-stage hierarchical beam training method for near-field communications. Specifically, in the first stage, we employ the central sub-array of the XL-array to search for a coarse user direction in the angular domain with conventional far-field hierarchical codebook. Then, in the second stage, given the coarse user direction, we progressively search for the fine-grained user direction-and-distance in the polar domain with a dedicatedly designed codebook. Numerical results show that our proposed two-stage hierarchical beam training method can achieve over 99% training overhead reduction as compared to the 2D exhaustive search, yet achieving comparable rate performance.

Submitted 24 February, 2023; originally announced February 2023.

Comments: We proposed a novel two-stage hierarchical beam training method for near-field communication systems. This paper has been submitted to IEEE for possible publication

arXiv:2302.10444 [pdf, other]

Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring

Authors: Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee

Abstract: Recent studies on pronunciation scoring have explored the effect of introducing phone embeddings as reference pronunciation, but mostly in an implicit manner, i.e., addition or concatenation of reference phone embedding and actual pronunciation of the target phone as the phone-level pronunciation quality representation. In this paper, we propose to use linguistic-acoustic similarity to explicitly measure the deviation of non-native production from its native reference for pronunciation assessment. Specifically, the deviation is first estimated by the cosine similarity between reference phone embedding and corresponding acoustic embedding. Next, a phone-level Goodness of pronunciation (GOP) pre-training stage is introduced to guide this similarity-based learning for better initialization of the aforementioned two embeddings. Finally, a transformer-based hierarchical pronunciation scorer is used to map a sequence of phone embeddings, acoustic embeddings along with their similarity measures to predict the final utterance-level score. Experimental results on the non-native databases suggest that the proposed system significantly outperforms the baselines, where the acoustic and phone embeddings are simply added or concatenated. A further examination shows that the phone embeddings learned in the proposed approach are able to capture linguistic-acoustic attributes of native pronunciation as reference.

Submitted 13 March, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP 2023
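
The abstract above estimates deviation from the native reference with the cosine similarity between a reference phone embedding and the corresponding acoustic embedding. A minimal sketch of that single step follows, assuming plain Python lists as stand-ins for the learned embeddings; the function name `cosine_similarity` and the example vectors are hypothetical and are not taken from the authors' system.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional embeddings, for illustration only.
reference_phone = [0.2, 0.7, -0.1, 0.4]
acoustic_close = [0.25, 0.65, -0.05, 0.35]  # resembles the reference
acoustic_far = [-0.6, 0.1, 0.8, -0.2]       # points in a different direction

print(cosine_similarity(reference_phone, acoustic_close))
print(cosine_similarity(reference_phone, acoustic_far))
```

Under this reading, a value near 1 would indicate an acoustic embedding close to the reference phone, and lower values a larger deviation; how the similarity enters the final score is described only at the level of the abstract.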

arXiv:2302.09928 [pdf, other]

An ASR-free Fluency Scoring Approach with Self-Supervised Learning

Authors: Wei Liu, Kaiqi Fu, Xiaohai Tian, Shuju Shi, Wei Li, Zejun Ma, Tan Lee

Abstract: A typical fluency scoring system generally relies on an automatic speech recognition (ASR) system to obtain time stamps in input speech for either the subsequent calculation of fluency-related features or directly modeling speech fluency with an end-to-end approach. This paper describes a novel ASR-free approach for automatic fluency assessment using self-supervised learning (SSL). Specifically, wav2vec2.0 is used to extract frame-level speech features, followed by K-means clustering to assign a pseudo label (cluster index) to each frame. A BLSTM-based model is trained to predict an utterance-level fluency score from frame-level SSL features and the corresponding cluster indexes. Neither speech transcription nor time stamp information is required in the proposed system. It is ASR-free and can potentially avoid the ASR errors effect in practice. Experimental results carried out on non-native English databases show that the proposed approach significantly improves the performance in the "open response" scenario as compared to previous methods and matches the recently reported performance in the "read aloud" scenario.

Submitted 13 March, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP 2023
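
The abstract above assigns each frame a pseudo label equal to its K-means cluster index. The sketch below illustrates only the assignment step, assuming the centroids have already been fitted; the 2-dimensional `frames` and `centroids` are hypothetical stand-ins for wav2vec2.0 features and learned cluster centres, and the function names are illustrative.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def assign_pseudo_labels(frames, centroids):
    """Return, for each frame, the index of its nearest centroid."""
    labels = []
    for frame in frames:
        distances = [squared_distance(frame, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical 2-dimensional frame features and two fixed centroids.
frames = [[0.1, 0.0], [0.9, 1.1], [0.2, -0.1], [1.0, 0.8]]
centroids = [[0.0, 0.0], [1.0, 1.0]]

print(assign_pseudo_labels(frames, centroids))  # [0, 1, 0, 1]
```

In a full pipeline the centroids would come from running K-means on training features; only the nearest-centroid lookup is shown here.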

arXiv:2301.07876 [pdf, other]

Suboptimality analysis of receding horizon quadratic control with unknown linear systems and its applications in learning-based control

Authors: Shengling Shi, Anastasios Tsiamis, Bart De Schutter

Abstract: In this work, we aim to analyze how the trade-off between the modeling error, the terminal value function error, and the prediction horizon affects the performance of a nominal receding-horizon linear quadratic (LQ) controller. By developing a novel perturbation result of the Riccati difference equation, a novel performance upper bound is obtained and suggests that for many cases, the prediction horizon can be either one or infinity to improve the control performance, depending on the relative difference between the modeling error and the terminal value function error. The result also shows that when an infinite horizon is desired, a finite prediction horizon that is larger than the controllability index can be sufficient for achieving a near-optimal performance, revealing a close relation between the prediction horizon and controllability. The obtained suboptimality performance bound is also applied to provide novel sample complexity and regret guarantees for nominal receding-horizon LQ controllers in a learning-based setting.

Submitted 8 April, 2024; v1 submitted 18 January, 2023; originally announced January 2023.

arXiv:2209.08209 [pdf, other]

RISE-Based Adaptive Control with Mass-Inertia Parameter Estimation for Aerial Transportation of Multi-Rotor UAVs

Authors: Shuyang Shi, Yuzhu Li, Wei Dong

Abstract: This paper proposes an adaptive tracking strategy with mass-inertia estimation for aerial transportation problems of multi-rotor UAVs. The dynamic model of multi-rotor UAVs with disturbances is firstly developed with a linearly parameterized form. Subsequently, a cascade controller with the robust integral of the sign of the error (RISE) terms is applied to smooth the control inputs and address bounded disturbances. Then, adaptive estimation laws for mass-inertia parameters are designed based on a filter operation. Such operation is introduced to extract estimation errors exploited to theoretically guarantee the finite-time (FT) convergence of estimation errors. Finally, simulations are conducted to verify the effectiveness of the designed controller. The results show that the proposed method provides better tracking and estimation performance than traditional adaptive controllers based on sliding mode control algorithms and gradient-based estimation strategies.

Submitted 16 September, 2022; originally announced September 2022.

arXiv:2208.01283 [pdf, ps, other]

doi 10.23919/cje.2022.00.093

Towards V2I Age-aware Fairness Access: A DQN Based Intelligent Vehicular Node Training and Test Method

Authors: Qiong Wu, Shuai Shi, Ziyang Wan, Qiang Fan, Pingyi Fan, Cui Zhang

Abstract: Vehicles on the road exchange data with base station (BS) frequently through vehicle to infrastructure (V2I) communications to ensure the normal use of vehicular applications, where the IEEE 802.11 distributed coordination function (DCF) is employed to allocate a minimum contention window (MCW) for channel access. Each vehicle may change its MCW to achieve more access opportunities at the expense of others, which results in unfair communication performance. Moreover, the key access parameters MCW is the privacy information and each vehicle are not willing to share it with other vehicles. In this uncertain setting, age of information (AoI) is an important communication metric to measure the freshness of data, we design an intelligent vehicular node to learn the dynamic environment and predict the optimal MCW which can make it achieve age fairness. In order to allocate the optimal MCW for the vehicular node, we employ a learning algorithm to make a desirable decision by learning from replay history data. In particular, the algorithm is proposed by extending the traditional DQN training and testing method. Finally, by comparing with other methods, it is proved that the proposed DQN method can significantly improve the age fairness of the intelligent node.

Submitted 3 March, 2023; v1 submitted 2 August, 2022; originally announced August 2022.

Comments: This paper has been accepted by Chinese Journal of Electronics. Simulation codes have been provided at: https://github.com/qiongwu86/Age-Fairness

arXiv:2207.00792 [pdf, ps, other]

Two-Timescale Design for STAR-RIS Aided NOMA Systems

Authors: Chenyu Wu, Changsheng You, Yuanwei Liu, Shuo Shi, Marco Di Renzo

Abstract: Simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) have emerged as a promising technology for achieving full-space coverage. Prior works on STAR-RISs mostly assumed the full and instantaneous channel state information (CSI) is available, which, however, is practically difficult to obtain due to the large number of elements. To address it, we investigate STAR-RIS aided NOMA systems, where two efficient two-timescale transmission protocols are proposed for different channel setups to maximize the average sum-rate. Specifically, 1) for line-of-sight (LoS) dominant channels, we propose the beamforming-then-estimate (BTE) Protocol, where the long-term STAR-RIS coefficients are optimized based on the statistical CSI, while the short-term power allocation at the base station (BS) is designed based on the effective channels; 2) for the rich scattering environment, we propose an alternative partition-then-estimate (PTE) Protocol, where the BS determines the long-term STAR-RIS surface-partition strategy; then the BS estimates the instantaneous subsurface channels and designs its power allocation and STAR-RIS phase-shifts accordingly. Simulation results validate the superiority of our proposed transmission protocols as compared to various benchmarks. It is shown that the BTE Protocol outperforms the PTE Protocol when the number of STAR-RIS elements is large and/or the LoS channel components are dominant, and vice versa.

Submitted 2 July, 2022; originally announced July 2022.

Comments: 30 pages, 10 figures

arXiv:2205.10065 [pdf, ps, other]

Approximate Dynamic Programming for Constrained Linear Systems: A Piecewise Quadratic Approximation Approach

Authors: Kanghui He, Shengling Shi, Ton van den Boom, Bart De Schutter

Abstract: Approximate dynamic programming (ADP) faces challenges in dealing with constraints in control problems. Model predictive control (MPC) is, in comparison, well-known for its accommodation of constraints and stability guarantees, although its computation is sometimes prohibitive. This paper introduces an approach combining the two methodologies to overcome their individual limitations. The predictive control law for constrained linear quadratic regulation (CLQR) problems has been proven to be piecewise affine (PWA) while the value function is piecewise quadratic. We exploit these formal results from MPC to design an ADP method for CLQR problems. A novel convex and piecewise quadratic neural network with a local-global architecture is proposed to provide an accurate approximation of the value function, which is used as the cost-to-go function in the online dynamic programming problem. An efficient decomposition algorithm is developed to speed up the online computation. Rigorous stability analysis of the closed-loop system is conducted for the proposed control scheme under the condition that a good approximation of the value function is achieved. Comparative simulations are carried out to demonstrate the potential of the proposed method in terms of online computation and optimality.

Submitted 6 April, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

arXiv:2204.08958 [pdf, other]

MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment

Authors: Sidi Yang, Tianhe Wu, Shuwei Shi, Shanshan Lao, Yuan Gong, Mingdeng Cao, Jiahao Wang, Yujiu Yang

Abstract: No-Reference Image Quality Assessment (NR-IQA) aims to assess the perceptual quality of images in accordance with human subjective perception. Unfortunately, existing NR-IQA methods are far from meeting the needs of predicting accurate quality scores on GAN-based distortion images. To this end, we propose Multi-dimension Attention Network for no-reference Image Quality Assessment (MANIQA) to improve the performance on GAN-based distortion. We firstly extract features via ViT, then to strengthen global and local interactions, we propose the Transposed Attention Block (TAB) and the Scale Swin Transformer Block (SSTB). These two modules apply attention mechanisms across the channel and spatial dimension, respectively. In this multi-dimensional manner, the modules cooperatively increase the interaction among different regions of images globally and locally. Finally, a dual branch structure for patch-weighted quality prediction is applied to predict the final score depending on the weight of each patch's score. Experimental results demonstrate that MANIQA outperforms state-of-the-art methods on four standard datasets (LIVE, TID2013, CSIQ, and KADID-10K) by a large margin. Besides, our method ranked first place in the final testing phase of the NTIRE 2022 Perceptual Image Quality Assessment Challenge Track 2: No-Reference. Codes and models are available at https://github.com/IIGROUP/MANIQA.
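
The patch-weighted prediction mentioned in this abstract can be illustrated with a small sketch. It assumes the final score is a weight-normalized average of per-patch scores; the abstract does not state the exact formula, and the function name `patch_weighted_score` is hypothetical rather than taken from the MANIQA code.

```python
def patch_weighted_score(scores, weights):
    """Combine per-patch quality scores into one image-level score.

    Assumption (not stated in the abstract): the image score is the
    weighted mean sum(w_i * s_i) / sum(w_i) over all patches.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(s * w for s, w in zip(scores, weights))
    return weighted_sum / total_weight


# Two patches: the first patch is weighted three times as heavily.
example_score = patch_weighted_score([1.0, 3.0], [3.0, 1.0])
```

In a trained network both the scores and the weights would come from the two prediction branches; here they are plain numbers so the arithmetic is easy to check by hand.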

Submitted 20 April, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

arXiv:2204.07876 [pdf, other]

Lodestar: Supporting Independent Learning and Rapid Experimentation Through Data-Driven Analysis Recommendations

Authors: Deepthi Raghunandan, Zhe Cui, Kartik Krishnan, Segen Tirfe, Shenzhi Shi, Tejaswi Darshan Shrestha, Leilani Battle, Niklas Elmqvist

Abstract: Keeping abreast of current trends, technologies, and best practices in visualization and data analysis is becoming increasingly difficult, especially for fledgling data scientists. In this paper, we propose Lodestar, an interactive computational notebook that allows users to quickly explore and construct new data science workflows by selecting from a list of automated analysis recommendations. We derive our recommendations from directed graphs of known analysis states, with two input sources: one manually curated from online data science tutorials, and another extracted through semi-automatic analysis of a corpus of over 6,000 Jupyter notebooks. We evaluate Lodestar in a formative study guiding our next set of improvements to the tool. Our results suggest that users find Lodestar useful for rapidly creating data science workflows.

Submitted 16 April, 2022; originally announced April 2022.

Comments: This paper was presented as part of the workshop called Visualization in Data Science (at ACM KDD and IEEE VIS)

arXiv:2203.09862 [pdf, other]

doi 10.1109/LCSYS.2022.3187511

Finite-sample analysis of identification of switched linear systems with arbitrary or restricted switching

Authors: Shengling Shi, Othmane Mazhar, Bart De Schutter

Abstract: For the identification of switched systems with a measured switching signal, this work aims to analyze the effect of switching strategies on the estimation error. The data for identification is assumed to be collected from globally asymptotically or marginally stable switched systems under switches that are arbitrary or subject to an average dwell time constraint. Then the switched system is estimated by the least-squares (LS) estimator. To capture the effect of the parameters of the switching strategies on the LS estimation error, finite-sample error bounds are developed in this work. The obtained error bounds show that the estimation error is logarithmic of the switching parameters when there are only stable modes; however, when there are unstable modes, the estimation error bound can increase linearly as the switching parameter changes. This suggests that in the presence of unstable modes, the switching strategy should be properly designed to avoid the significant increase of the estimation error.
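
As a rough illustration of least-squares identification with a measured switching signal, the sketch below treats a scalar, noise-free switched system x[t+1] = a[mode[t]] * x[t]. The scalar setting, the function names `simulate` and `ls_per_mode`, and the example parameters are all assumptions made for illustration; the paper's estimator and its error bounds are not reproduced here.

```python
from collections import defaultdict


def simulate(a, switching, x0=1.0):
    """Simulate x[t+1] = a[mode] * x[t] for a given mode sequence (no noise)."""
    xs = [x0]
    for mode in switching:
        xs.append(a[mode] * xs[-1])
    return xs


def ls_per_mode(xs, switching):
    """Least-squares estimate of each mode's scalar parameter.

    Because the switching signal is measured, samples can be grouped by
    mode and each mode fitted separately:
    a_hat[m] = sum(x[t+1] * x[t]) / sum(x[t]^2) over steps t in mode m.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for t, mode in enumerate(switching):
        num[mode] += xs[t + 1] * xs[t]
        den[mode] += xs[t] * xs[t]
    return {m: num[m] / den[m] for m in den if den[m] > 0.0}


true_a = {0: 0.5, 1: -0.8}      # hypothetical stable modes
switching = [0, 1] * 10         # hypothetical periodic switching signal
estimates = ls_per_mode(simulate(true_a, switching), switching)
```

With noise added to the simulation, the estimates would deviate from the true parameters, and that deviation is the quantity the finite-sample bounds in the abstract describe.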

Submitted 28 June, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

arXiv:2201.09095 [pdf, other]

doi 10.1109/LCSYS.2022.3171172

Excitation allocation for generic identifiability of linear dynamic networks with fixed modules

Authors: H. J. Dreef, S. Shi, X. Cheng, M. C. F. Donkers, P. M. J. Van den Hof

Abstract: Identifiability of linear dynamic networks requires the presence of a sufficient number of external excitation signals. The problem of allocating a minimal number of external signals for guaranteeing generic network identifiability has been recently addressed in the literature. Here we will extend that work by explicitly incorporating the situation that some network modules are known, and thus are fixed in the parametrized model set. The graphical approach introduced earlier is extended to this situation, showing that the presence of fixed modules reduces the required number of external signals. An algorithm is presented that allocates the external signals in a systematic fashion.

Submitted 13 May, 2022; v1 submitted 22 January, 2022; originally announced January 2022.

Journal ref: IEEE Control Systems Letters, Vol. 6, pp. 2587-2592, 2022

arXiv:2201.08477 [pdf, ps, other]

doi 10.1109/TSP.2022.3207269

DDPG-Driven Deep-Unfolding with Adaptive Depth for Channel Estimation with Sparse Bayesian Learning

Authors: Qiyu Hu, Shuhan Shi, Yunlong Cai, Guanding Yu

Abstract: Deep-unfolding neural networks (NNs) have received great attention since they achieve satisfactory performance with relatively low complexity. Typically, these deep-unfolding NNs are restricted to a fixed-depth for all inputs. However, the optimal number of layers required for convergence changes with different inputs. In this paper, we first develop a framework of deep deterministic policy gradient (DDPG)-driven deep-unfolding with adaptive depth for different inputs, where the trainable parameters of deep-unfolding NN are learned by DDPG, rather than updated by the stochastic gradient descent algorithm directly. Specifically, the optimization variables, trainable parameters, and architecture of deep-unfolding NN are designed as the state, action, and state transition of DDPG, respectively. Then, this framework is employed to deal with the channel estimation problem in massive multiple-input multiple-output systems. Specifically, first of all we formulate the channel estimation problem with an off-grid basis and develop a sparse Bayesian learning (SBL)-based algorithm to solve it. Secondly, the SBL-based algorithm is unfolded into a layer-wise structure with a set of introduced trainable parameters. Thirdly, the proposed DDPG-driven deep-unfolding framework is employed to solve this channel estimation problem based on the unfolded structure of the SBL-based algorithm. To realize adaptive depth, we design the halting score to indicate when to stop, which is a function of the channel reconstruction error. Furthermore, the proposed framework is extended to realize the adaptive depth of the general deep neural networks (DNNs). Simulation results show that the proposed algorithm outperforms the conventional optimization algorithms and DNNs with fixed depth with much reduced number of layers.

Submitted 18 April, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: 16 pages, 14 figures

arXiv:2112.05612 [pdf, other]

doi 10.1109/MWC.101.2100354

Decentralized Spectrum Access System: Vision, Challenges, and a Blockchain Solution

Authors: Yang Xiao, Shanghao Shi, Wenjing Lou, Chonggang Wang, Xu Li, Ning Zhang, Y. Thomas Hou, Jeffrey H. Reed

Abstract: Spectrum access system (SAS) is widely considered the de facto solution to coordinating dynamic spectrum sharing (DSS) and protecting incumbent users. The current SAS paradigm prescribed by the FCC for the CBRS band and standardized by the WInnForum follows a centralized service model in that a spectrum user subscribes to a SAS server for spectrum allocation service. This model, however, neither tolerates SAS server failures (crash or Byzantine) nor resists dishonest SAS administrators, leading to serious concerns on SAS system reliability and trustworthiness. This is especially concerning for the evolving DSS landscape where an increasing number of SAS service providers and heterogeneous user requirements are coming up. To address these challenges, we propose a novel blockchain-based decentralized SAS architecture called BD-SAS that provides SAS services securely and efficiently, without relying on the trust of each individual SAS server for the overall system trustworthiness. In BD-SAS, a global blockchain (G-Chain) is used for spectrum regulatory compliance while smart contract-enabled local blockchains (L-Chains) are instantiated in individual spectrum zones for automating spectrum access assignment per user request. We hope our vision of a decentralized SAS, the BD-SAS architecture, and discussion on future challenges can open up a new direction towards reliable spectrum management in a decentralized manner.

Submitted 10 December, 2021; originally announced December 2021.

Comments: A version of this work has been accepted by IEEE Wireless Communications for publication

Journal ref: IEEE Wireless Communications (2022)

arXiv:2110.05443 [pdf]

doi 10.1002/mp.16805

Spatial-temporal V-Net for automatic segmentation and quantification of right ventricles in gated myocardial perfusion SPECT images

Authors: Chen Zhao, Shi Shi, Zhuo He, Cheng Wang, Zhongqiang Zhao, Xinli Li, Yanli Zhou, Weihua Zhou

Abstract: Background. Functional assessment of right ventricle (RV) using gated myocardial perfusion single-photon emission computed tomography (MPS) heavily relies on the precise extraction of right ventricular contours. In this paper, we present a new deep-learning-based model integrating both the spatial and temporal features in gated MPS images to perform the segmentation of the RV epicardium and endocardium. Methods. By integrating the spatial features from each cardiac frame of the gated MPS and the temporal features from the sequential cardiac frames of the gated MPS, we developed a Spatial-Temporal V-Net (ST-VNet) for automatic extraction of RV endocardial and epicardial contours. In the ST-VNet, a V-Net is employed to hierarchically extract spatial features, and convolutional long-term short-term memory (ConvLSTM) units are added to the skip-connection pathway to extract the temporal features. The input of the ST-VNet is ECG-gated sequential frames of the MPS images and the output is the probability map of the epicardial or endocardial masks. A Dice similarity coefficient (DSC) loss which penalizes the discrepancy between the model prediction and the ground truth was adopted to optimize the segmentation model. Results. Our segmentation model was trained and validated on a retrospective dataset with 45 subjects, and the cardiac cycle of each subject was divided into 8 gates. The proposed ST-VNet achieved a DSC of 0.8914 and 0.8157 for the RV epicardium and endocardium segmentation, respectively. The mean absolute error, the mean squared error, and the Pearson correlation coefficient of the RV ejection fraction (RVEF) between the ground truth and the model prediction were 0.0609, 0.0830, and 0.6985. Conclusion. Our proposed ST-VNet is an effective model for RV segmentation. It has great promise for clinical use in RV functional assessment.
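
For readers unfamiliar with the Dice similarity coefficient loss named in this abstract, a minimal sketch follows. It uses the common soft-Dice form, loss = 1 - 2|P∩G| / (|P| + |G|), on flattened probability maps; the smoothing constant `eps` and the function name `dice_loss` are assumptions, and the exact variant used by the authors is not specified in the abstract.

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between predicted probabilities and a binary mask.

    pred   -- flat sequence of probabilities in [0, 1]
    target -- flat sequence of ground-truth labels (0 or 1)
    eps    -- small smoothing term (an assumption) that avoids division
              by zero when both inputs are empty
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    dsc = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dsc


# Prediction covers two voxels, ground truth covers one of them.
partial_overlap = dice_loss([1.0, 1.0, 0.0, 0.0], [1, 0, 0, 0])
```

A perfect prediction gives a loss close to 0 and a fully disjoint prediction gives a loss close to 1, so minimizing this loss pushes the DSC reported in the results upward.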

Submitted 26 December, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: 15 pages, 8 figures

arXiv:2109.06574 [pdf, other]

Deep-Unfolding Neural-Network Aided Hybrid Beamforming Based on Symbol-Error Probability Minimization

Authors: S. Shi, Y. Cai, Q. Hu, B. Champagne, L. Hanzo

Abstract: In massive multiple-input multiple-output (MIMO) systems, hybrid analog-digital (AD) beamforming can be used to attain a high directional gain without requiring a dedicated radio frequency (RF) chain for each antenna element, which substantially reduces both the hardware costs and power consumption. While massive MIMO transceiver design typically relies on the conventional mean-square error (MSE) criterion, directly minimizing the symbol error rate (SER) can lead to a superior performance. In this paper, we first mathematically formulate the problem of hybrid transceiver design under the minimum SER (MSER) optimization criterion and then develop a MSER-based gradient descent (GD) iterative algorithm to find the related stationary points. We then propose a deep-unfolding neural network (NN), in which the iterative GD algorithm is unfolded into a multi-layer structure wherein a set of trainable parameters are introduced for accelerating the convergence and enhancing the overall system performance. To implement the training stage, the relationship between the gradients of adjacent layers is derived based on the generalized chain rule (GCR). The deep-unfolding NN is developed for both quadrature phase shift keying (QPSK) and for $M$-ary quadrature amplitude modulated (QAM) signals and its convergence is investigated theoretically. Furthermore, we analyze the transfer capability, computational complexity, and generalization capability of the proposed deep-unfolding NN. Our simulation results show that the latter significantly outperforms its conventional counterpart at a reduced complexity.

Submitted 14 September, 2021; originally announced September 2021.

arXiv:2105.03072 [pdf, other]

NTIRE 2021 Challenge on Perceptual Image Quality Assessment

Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, et al. (25 additional authors not shown)

Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement workshop (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These output images have completely different characteristics from traditional distortions, thus pose a new challenge for IQA methods to evaluate their visual quality. In comparison with previous IQA challenges, the training and testing datasets in this challenge include the outputs of perceptual image processing algorithms and the corresponding subjective scores. Thus they can be used to develop and evaluate IQA methods on GAN-based distortions. The challenge has 270 registered participants in total. In the final testing stage, 13 participating teams submitted their models and fact sheets. Almost all of them have achieved much better results than existing IQA methods, while the winning method can demonstrate state-of-the-art performance.

Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

arXiv:2104.11599 [pdf, other]

Region-Adaptive Deformable Network for Image Quality Assessment

Authors: Shuwei Shi, Qingyan Bai, Mingdeng Cao, Weihao Xia, Jiahao Wang, Yifan Chen, Yujiu Yang

Abstract: Image quality assessment (IQA) aims to assess the perceptual quality of images. The outputs of the IQA algorithms are expected to be consistent with human subjective perception. In image restoration and enhancement tasks, images generated by generative adversarial networks (GAN) can achieve better visual performance than traditional CNN-generated images, although they have spatial shift and texture noise. Unfortunately, the existing IQA methods have unsatisfactory performance on the GAN-based distortion partially because of their low tolerance to spatial misalignment. To this end, we propose the reference-oriented deformable convolution, which can improve the performance of an IQA network on GAN-based distortion by adaptively considering this misalignment. We further propose a patch-level attention module to enhance the interaction among different patch regions, which are processed independently in previous patch-based methods. The modified residual block is also proposed by applying modifications to the classic residual block to construct a patch-region-based baseline called WResNet. Equipping this baseline with the two proposed modules, we further propose Region-Adaptive Deformable Network (RADN). The experiment results on the NTIRE 2021 Perceptual Image Quality Assessment Challenge dataset show the superior performance of RADN, and the ensemble approach won fourth place in the final testing phase of the challenge. Code is available at https://github.com/IIGROUP/RADN.

Submitted 23 April, 2021; originally announced April 2021.

Comments: CVPR NTIRE Workshop 2021. The first two authors contribute equally to this work. Code is available at https://github.com/IIGROUP/RADN

arXiv:2104.01818 [pdf, other]

The Multi-speaker Multi-style Voice Cloning Challenge 2021

Authors: Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu

Abstract: The Multi-speaker Multi-style Voice Cloning Challenge (M2VoC) aims to provide a common sizable dataset as well as a fair testbed for the benchmarking of the popular voice cloning task. Specifically, we formulate the challenge to adapt an average TTS model to the stylistic target voice with limited data from target speaker, evaluated by speaker identity and style similarity. The challenge consists of two tracks, namely few-shot track and one-shot track, where the participants are required to clone multiple target voices with 100 and 5 samples respectively. There are also two sub-tracks in each track. For sub-track a, to fairly compare different strategies, the participants are allowed to use only the training data provided by the organizer strictly. For sub-track b, the participants are allowed to use any data publicly available. In this paper, we present a detailed explanation on the tasks and data used in the challenge, followed by a summary of submitted systems and evaluation results.

Submitted 5 April, 2021; originally announced April 2021.

Comments: has been accepted to ICASSP 2021

arXiv:2101.08918 [pdf, other]

Performance Analysis for Cache-enabled Cellular Networks with Cooperative Transmission

Authors: Tianming Feng, Shuo Shi, Shushi Gu, Ning Zhang, Wei Xiang, Xuemai Gu

Abstract: The large amount of deployed smart devices put tremendous traffic pressure on networks. Caching at the edge has been widely studied as a promising technique to solve this problem. To further improve the successful transmission probability (STP) of cache-enabled cellular networks (CEN), we combine the cooperative transmission technique with CEN and propose a novel transmission scheme. Local channel state information (CSI) is introduced at each cooperative base station (BS) to enhance the strength of the signal received by the user. A tight approximation for the STP of this scheme is derived using tools from stochastic geometry. The optimal content placement strategy of this scheme is obtained using a numerical method to maximize the STP. Simulation results demonstrate the optimal strategy achieves significant gains in STP over several comparative baselines with the proposed scheme.

Submitted 21 January, 2021; originally announced January 2021.

Comments: arXiv admin note: text overlap with arXiv:2101.08669

arXiv:2101.05442 [pdf, other]

Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans

Authors: Xin He, Shihao Wang, Xiaowen Chu, Shaohuai Shi, Jiangping Tang, Xin Liu, Chenggang Yan, Jiyong Zhang, Guiguang Ding

Abstract: The COVID-19 pandemic has spread globally for several months. Because its transmissibility and high pathogenicity seriously threaten people's lives, it is crucial to accurately and quickly detect COVID-19 infection. Many recent studies have shown that deep learning (DL) based solutions can help detect COVID-19 based on chest CT scans. However, most existing work focuses on 2D datasets, which may result in low quality models as the real CT scans are 3D images. Besides, the reported results span a broad spectrum on different datasets with a relatively unfair comparison. In this paper, we first use three state-of-the-art 3D models (ResNet3D101, DenseNet3D121, and MC3_18) to establish the baseline performance on the three publicly available chest CT scan datasets. Then we propose a differentiable neural architecture search (DNAS) framework to automatically search for the 3D DL models for 3D chest CT scans classification with the Gumbel Softmax technique to improve the searching efficiency. We further exploit the Class Activation Mapping (CAM) technique on our models to provide the interpretability of the results. The experimental results show that our automatically searched models (CovidNet3D) outperform the baseline human-designed models on the three datasets with tens of times smaller model size and higher accuracy. Furthermore, the results also verify that CAM can be well applied in CovidNet3D for COVID-19 datasets to provide interpretability for medical diagnosis.
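
The Gumbel Softmax technique referred to in this abstract can be sketched in a few lines. The version below is the generic relaxation (add Gumbel noise to logits, then apply a temperature-scaled softmax) and is only an illustration; how the authors wire it into their architecture search is not described in the abstract, and the function name and the temperature default are assumptions.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw a relaxed one-hot sample over candidate operations.

    logits -- unnormalized scores, one per candidate
    tau    -- temperature; smaller values give samples closer to one-hot
    rng    -- object with a random() method (the random module by default)
    """
    noisy = []
    for logit in logits:
        # Clamp away from 0 so that log(u) is always defined.
        u = max(rng.random(), 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / tau)
    # Numerically stable softmax over the perturbed logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


sample = gumbel_softmax([1.0, 2.0, 0.5], tau=0.5, rng=random.Random(0))
```

Because the sample is a smooth function of the logits, gradients can flow through the choice of operation, which is what makes the search differentiable.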

Submitted 12 February, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

Comments: Accepted by AAAI 2021, COVID-19, Neural Architecture Search, AutoML

arXiv:2012.11414 [pdf, other]

Single module identifiability in linear dynamic networks with partial excitation and measurement

Authors: Shengling Shi, Xiaodong Cheng, Paul M. J. Van den Hof

Abstract: Identifiability of a single module in a network of transfer functions is determined by whether a particular transfer function in the network can be uniquely distinguished within a network model set, on the basis of data. Whereas previous research has focused on the situations that all network signals are either excited or measured, we develop generalized analysis results for the situation of partial measurement and partial excitation. As identifiability conditions typically require a sufficient number of external excitation signals, this work introduces a novel network model structure such that excitation from unmeasured noise signals is included, which leads to less conservative identifiability conditions than relying on measured excitation signals only. More importantly, graphical conditions are developed to verify global and generic identifiability of a single module based on the topology of the dynamic network. Depending on whether the input or the output of the module can be measured, we present four identifiability conditions which cover all possible situations in single module identification. These conditions further lead to synthesis approaches for allocating excitation signals and selecting measured signals, to warrant single module identifiability. In addition, if the identifiability conditions are satisfied, indirect identification methods are developed to provide a consistent estimate of the module. All the obtained results are also extended to identifiability of multiple modules in the network.

Submitted 20 December, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

arXiv:2010.07801 [pdf, other]

doi 10.1016/j.compbiomed.2020.104055

A Bayesian method for inference of effective connectivity in brain networks for detecting the Mozart effect

Authors: Rik J. C. van Esch, Shengling Shi, Antoine Bernas, Svitlana Zinger, Albert P. Aldenkamp, Paul M. J. Van den Hof

Abstract: Several studies claim that listening to Mozart music affects cognition and can be used to treat neurological conditions like epilepsy. Research into this Mozart effect has not addressed how dynamic interactions between brain networks, i.e. effective connectivity, are affected. The Granger-causality analysis is often used to infer effective connectivity. First, we investigate if a new method, Bayesian topology identification, can be used as an alternative. Both methods are evaluated on simulation data, where the Bayesian method outperforms the Granger-causality analysis in the inference of connectivity graphs of dynamic networks, especially for short data lengths. In the second part, the Bayesian method is extended to enable the inference of changes in effective connectivity between groups of subjects. Next, we apply both methods to fMRI scans of 16 healthy subjects, who were scanned before and after exposure to Mozart's sonata K448 at least 2 hours a day for 7 days. Here, we investigate if the effective connectivity of the subjects significantly changed after listening to Mozart music. The Bayesian method detected changes in effective connectivity between networks related to cognitive processing and control: First, in the connection from the central executive to the superior sensori-motor network. Second, in the connection from the posterior default mode to the fronto-parietal right network. Finally, in the connection from the anterior default mode to the dorsal attention network, but only in a subgroup of subjects with a longer listening duration. Only in this last connection an effect was found by the Granger-causality analysis.

Submitted 15 October, 2020; originally announced October 2020.

arXiv:2008.01495 [pdf, other]

Generic identifiability of subnetworks in a linear dynamic network: the full measurement case

Authors: Shengling Shi, Xiaodong Cheng, Paul M. J. Van den Hof

Abstract: Identifiability conditions for single or multiple modules in a dynamic network specify under which conditions the considered modules can be uniquely recovered from the second-order statistical properties of the measured signals. Conditions for generic identifiability of multiple modules, i.e. a subnetwork, are developed for the situation that all node signals are measured and excitation of the network is provided by both measured excitation signals and unmeasured disturbance inputs. Additionally, the network model set is allowed to contain non-parametrized modules that are fixed, and e.g. reflect modules of which the dynamics are known to the user. The conditions take the form of path-based conditions on the graph of the network model set. Based on these conditions, synthesis results are formulated for allocating external excitation signals to achieve generic identifiability of particular subnetworks. If there are a sufficient number of measured external excitation signals, the formulated results give rise to a generalized indirect type of identification algorithm that requires only the measurement of a subset of the node signals in the network.

Submitted 26 October, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

arXiv:2003.06307 [pdf, other]

Communication-Efficient Distributed Deep Learning: A Comprehensive Survey

Authors: Zhenheng Tang, Shaohuai Shi, Wei Wang, Bo Li, Xiaowen Chu

Abstract: Distributed deep learning (DL) has become prevalent in recent years to reduce training time by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and datasets. However, system scalability is limited by communication becoming the performance bottleneck. Addressing this communication issue has become a prominent research topic. In this paper, we provide a comprehensive survey of the communication-efficient distributed training algorithms, focusing on both system-level and algorithmic-level optimizations. We first propose a taxonomy of data-parallel distributed training algorithms that incorporates four primary dimensions: communication synchronization, system architectures, compression techniques, and parallelism of communication and computing tasks. We then investigate state-of-the-art studies that address problems in these four dimensions. We also compare the convergence rates of different algorithms to understand their convergence speed. Additionally, we conduct extensive experiments to empirically compare the convergence performance of various mainstream distributed training algorithms. Based on our system-level communication cost analysis, theoretical and experimental convergence speed comparison, we provide readers with an understanding of which algorithms are more efficient under specific distributed environments. Our research also extrapolates potential directions for further optimizations.

Submitted 1 September, 2023; v1 submitted 10 March, 2020; originally announced March 2020.

arXiv:2001.09259 [pdf, ps, other]

A Blockchain-Based Approach for Saving and Tracking Differential-Privacy Cost

Authors: Yang Zhao, Jun Zhao, Jiawen Kang, Zehang Zhang, Dusit Niyato, Shuyu Shi

Abstract: An increasing amount of users' sensitive information is now being collected for analytics purposes. To protect users' privacy, differential privacy has been widely studied in the literature. Specifically, a differentially private algorithm adds noise to the true answer of a query to generate a noisy response. As a result, the information about the dataset leaked by the noisy output is bounded by the privacy parameter. Oftentimes, a dataset needs to be used for answering multiple queries (e.g., for multiple analytics tasks), so the level of privacy protection may degrade as more queries are answered. Thus, it is crucial to keep track of the privacy spending which should not exceed the given privacy budget. Moreover, if a query has been answered before and is asked again on the same dataset, we may reuse the previous noisy response for the current query to save the privacy cost. In view of the above, we design and implement a blockchain-based system for tracking and saving differential-privacy cost. Blockchain provides a distributed immutable ledger that records each query's type, the noisy response used to answer each query, the associated noise level added to the true query result, and the remaining privacy budget in our system. Furthermore, since the blockchain records the noisy response used to answer each query, we also design an algorithm to reuse previous noisy response if the same query is asked repeatedly. Specifically, considering that different requests of the same query may have different privacy requirements, our algorithm (via a rigorous proof) is able to set the optimal reuse fraction of the old noisy response and add new noise (if necessary) to minimize the accumulated privacy cost. Experimental results show that the proposed algorithm can reduce the privacy cost significantly without compromising data accuracy.

Submitted 22 December, 2020; v1 submitted 24 January, 2020; originally announced January 2020.

Comments: 14 pages, 4 figures

Showing 1–50 of 55 results for author: Shi, S