-
A semidefinite programming approach for robust elliptic localization
Authors:
Wenxin Xiong,
Jiajun He,
Zhang-Lei Shi,
Keyuan Hu,
Hing Cheung So,
Chi-Sing Leung
Abstract:
This short communication addresses the problem of elliptic localization with outlier measurements, whose occurrences are prevalent in various location-enabled applications and can significantly compromise the positioning performance if not adequately handled. In contrast to the reliance on $M$-estimation adopted in the majority of existing solutions, we take a different path, specifically exploring the worst-case robust approximation criterion, to bolster resistance of the elliptic location estimator against outliers. From a geometric standpoint, our method boils down to pinpointing the Chebyshev center of the feasible set determined by the available bistatic ranges with bounded measurement errors. For a practical approach to the associated min-max problem, we convert it into the well-established convex optimization framework of semidefinite programming (SDP). Numerical simulations confirm that our SDP-based technique can outperform a number of existing elliptic localization schemes in terms of positioning accuracy in Gaussian mixture noise, a common type of impulsive interference in the context of range-based localization.
Submitted 28 January, 2024;
originally announced January 2024.
-
Two-stage Progressive Residual Dense Attention Network for Image Denoising
Authors:
Wencong Wu,
An Ge,
Guannan Lv,
Yuelong Xia,
Yungang Zhang,
Wen Xiong
Abstract:
Deep convolutional neural networks (CNNs) for image denoising can effectively exploit rich hierarchical features and have achieved great success. However, many deep CNN-based denoising models equally utilize the hierarchical features of noisy images without paying attention to the more important and useful features, leading to relatively low performance. To address the issue, we design a new Two-stage Progressive Residual Dense Attention Network (TSP-RDANet) for image denoising, which divides the whole process of denoising into two sub-tasks to remove noise progressively. Two different attention mechanism-based denoising networks are designed for the two sequential sub-tasks: the residual dense attention module (RDAM) is designed for the first stage, and the hybrid dilated residual dense attention module (HDRDAM) is proposed for the second stage. The proposed attention modules are able to learn appropriate local features through dense connection between different convolutional layers, and the irrelevant features can also be suppressed. The two sub-networks are then connected by a long skip connection to retain the shallow feature to enhance the denoising performance. The experiments on seven benchmark datasets have verified that compared with many state-of-the-art methods, the proposed TSP-RDANet can obtain favorable results both on synthetic and real noisy image denoising. The code of our TSP-RDANet is available at https://github.com/WenCongWu/TSP-RDANet.
Submitted 5 January, 2024;
originally announced January 2024.
-
A Spatio-Temporal Graph Convolutional Network for Gesture Recognition from High-Density Electromyography
Authors:
Wenjuan Zhong,
Yuyang Zhang,
Peiwen Fu,
Wenxuan Xiong,
Mingming Zhang
Abstract:
Accurate hand gesture prediction is crucial for effective control of upper-limb prostheses. Owing to the high flexibility and multiple degrees of freedom exhibited by human hands, there has been a growing interest in integrating deep networks with high-density surface electromyography (HD-sEMG) grids to enhance gesture recognition capabilities. However, many existing methods fall short of fully exploiting the specific spatial topology and temporal dependencies present in HD-sEMG data. Additionally, these studies are often limited in the number of gestures and lack generality. Hence, this study introduces a novel gesture recognition method, named STGCN-GR, which leverages spatio-temporal graph convolution networks for HD-sEMG-based human-machine interfaces. Firstly, we construct muscle networks based on functional connectivity between channels, creating a graph representation of HD-sEMG recordings. Subsequently, a temporal convolution module is applied to capture the temporal dependences in the HD-sEMG series and a spatial graph convolution module is employed to effectively learn the intrinsic spatial topology information among distinct HD-sEMG channels. We evaluate our proposed model on a public HD-sEMG dataset comprising a substantial number of gestures (i.e., 65). Our results demonstrate the remarkable capability of the STGCN-GR method, achieving an impressive accuracy of 91.07% in predicting gestures, which surpasses state-of-the-art deep learning methods applied to the same dataset.
Submitted 1 December, 2023;
originally announced December 2023.
-
Windformer: Bi-Directional Long-Distance Spatio-Temporal Network For Wind Speed Prediction
Authors:
Xuewei Li,
Zewen Shang,
Zhiqiang Liu,
Jian Yu,
Wei Xiong,
Mei Yu
Abstract:
Wind speed prediction is critical to the management of wind power generation. Due to the large range of wind speed fluctuations and wake effect, there may also be strong correlations between long-distance wind turbines. This difficult-to-extract feature has become a bottleneck for improving accuracy. History and future time information includes the trend of airflow changes, and whether this dynamic information can be utilized will also affect the prediction effect. In response to the above problems, this paper proposes Windformer. First, Windformer divides the wind turbine cluster into multiple non-overlapping windows and calculates correlations inside the windows, then shifts the windows partially to provide connectivity between windows, and finally fuses multi-channel features based on detailed and global information. To dynamically model the change process of wind speed, this paper extracts time series in both history and future directions simultaneously. Compared with other current-advanced methods, the Mean Square Error (MSE) of Windformer is reduced by 0.5\% to 15\% on two datasets from NERL.
Submitted 24 November, 2023;
originally announced November 2023.
-
Prompting Large Language Models with Speech Recognition Abilities
Authors:
Yassir Fathullah,
Chunyang Wu,
Egor Lakomkin,
Junteng Jia,
Yuan Shangguan,
Ke Li,
**xi Guo,
Wenhan Xiong,
Jay Mahadeokar,
Ozlem Kalinli,
Christian Fuegen,
Mike Seltzer
Abstract:
Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching a small audio encoder allowing it to perform speech recognition. By directly prepending a sequence of audial embeddings to the text token embeddings, the LLM can be converted to an automatic speech recognition (ASR) system, and be used in the exact same manner as its textual counterpart. Experiments on Multilingual LibriSpeech (MLS) show that incorporating a conformer encoder into the open sourced LLaMA-7B allows it to outperform monolingual baselines by 18% and perform multilingual speech recognition despite LLaMA being trained overwhelmingly on English text. Furthermore, we perform ablation studies to investigate whether the LLM can be completely frozen during training to maintain its original capabilities, scaling up the audio encoder, and increasing the audio encoder striding to generate fewer embeddings. The results from these studies show that multilingual ASR is possible even when the LLM is frozen or when strides of almost 1 second are used in the audio encoder opening up the possibility for LLMs to operate on long-form audio.
Submitted 21 July, 2023;
originally announced July 2023.
-
Robust time-of-arrival localization via ADMM
Authors:
Wenxin Xiong,
Christian Schindelhauer,
Hing Cheung So
Abstract:
This article considers the problem of source localization (SL) using possibly unreliable time-of-arrival (TOA) based range measurements. Adopting the strategy of statistical robustification, we formulate TOA SL as minimization of a versatile loss that possesses resistance against the occurrence of outliers. We then present an alternating direction method of multipliers (ADMM) to tackle the nonconvex optimization problem in a computationally attractive iterative manner. Moreover, we prove that the solution obtained by the proposed ADMM will correspond to a Karush-Kuhn-Tucker point of the formulation when the algorithm converges, and discuss reasonable assumptions about the robust loss function under which the approach can be theoretically guaranteed to be convergent. Numerical investigations demonstrate the superiority of our method over many existing TOA SL schemes in terms of positioning accuracy and computational simplicity. In particular, the proposed ADMM achieves estimation results with mean square error performance closer to the Cramér-Rao lower bound than its competitors in our simulations of impulsive noise environments.
Submitted 17 January, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Multi-Head State Space Model for Speech Recognition
Authors:
Yassir Fathullah,
Chunyang Wu,
Yuan Shangguan,
Junteng Jia,
Wenhan Xiong,
Jay Mahadeokar,
Chunxi Liu,
Yangyang Shi,
Ozlem Kalinli,
Mike Seltzer,
Mark J. F. Gales
Abstract:
State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches. In this paper, we propose a multi-head state space (MH-SSM) architecture equipped with special gating mechanisms, where parallel heads are taught to learn local and global temporal dynamics on sequence data. As a drop-in replacement for multi-head attention in transformer encoders, this new model significantly outperforms the transformer transducer on the LibriSpeech speech recognition corpus. Furthermore, we augment the transformer block with MH-SSMs layers, referred to as the Stateformer, achieving state-of-the-art performance on the LibriSpeech task, with word error rates of 1.76\%/4.37\% on the development and 1.91\%/4.36\% on the test sets without using an external language model.
Submitted 25 May, 2023; v1 submitted 21 May, 2023;
originally announced May 2023.
-
Reward Teaching for Federated Multi-armed Bandits
Authors:
Chengshuai Shi,
Wei Xiong,
Cong Shen,
**g Yang
Abstract:
Most of the existing federated multi-armed bandits (FMAB) designs are based on the presumption that clients will implement the specified design to collaborate with the server. In reality, however, it may not be possible to modify the clients' existing protocols. To address this challenge, this work focuses on clients who always maximize their individual cumulative rewards, and introduces a novel idea of ``reward teaching'', where the server guides the clients towards global optimality through implicit local reward adjustments. Under this framework, the server faces two tightly coupled tasks of bandit learning and target teaching, whose combination is non-trivial and challenging. A phased approach, called Teaching-After-Learning (TAL), is first designed to encourage and discourage clients' explorations separately. General performance analyses of TAL are established when the clients' strategies satisfy certain mild requirements. With novel technical approaches developed to analyze the warm-start behaviors of bandit algorithms, particularized guarantees of TAL with clients running UCB or epsilon-greedy strategies are then obtained. These results demonstrate that TAL achieves logarithmic regrets while only incurring logarithmic adjustment costs, which is order-optimal w.r.t. a natural lower bound. As a further extension, the Teaching-While-Learning (TWL) algorithm is developed with the idea of successive arm elimination to break the non-adaptive phase separation in TAL. Rigorous analyses demonstrate that when facing clients with UCB1, TWL outperforms TAL in terms of the dependencies on sub-optimality gaps thanks to its adaptive design. Experimental results demonstrate the effectiveness and generality of the proposed algorithms.
Submitted 20 November, 2023; v1 submitted 3 May, 2023;
originally announced May 2023.
-
Globally Optimized TDOA High Frequency Source Localization Based on Quasi-Parabolic Ionosphere Modeling and Collaborative Gradient Projection
Authors:
Wenxin Xiong,
Christian Schindelhauer,
Hing Cheung So
Abstract:
We investigate the problem of high frequency (HF) source localization using the time-difference-of-arrival (TDOA) observations of ionosphere-refracted radio rays based on quasi-parabolic (QP) modeling. An unresolved but pertinent issue in such a field is that the existing gradient-type scheme can easily get trapped in local optima for practical use. This will lead to the difficulty in initializing the algorithm and finally degraded positioning performance if the starting point is inappropriately selected. In this paper, we develop a collaborative gradient projection (GP) algorithm in order to globally solve the highly nonconvex QP-based TDOA HF localization problem. The metaheuristic of particle swarm optimization (PSO) is exploited for information sharing among multiple GP models, each of which is guaranteed to work out a critical point solution to the simplified maximum likelihood formulation. Random mutations are incorporated to avoid the early convergence of PSO. Rather than treating the geolocation of HF transmitter as a pure optimization problem, we further provide workarounds for addressing the possible impairments and challenges when the proposed technique is applied in practice. Numerical results demonstrate the effectiveness of our PSO-assisted re-initialization strategy in achieving the global optimality, and the superiority of our method over its competitor in terms of positioning accuracy.
Submitted 10 February, 2023;
originally announced February 2023.
-
End-to-end Recording Device Identification Based on Deep Representation Learning
Authors:
Chunyan Zeng,
Dongliang Zhu,
Zhifeng Wang,
Minghu Wu,
Wei Xiong,
Nan Zhao
Abstract:
Deep learning techniques have achieved specific results in recording device source identification. The recording device source features include spatial information and certain temporal information. However, most recording device source identification methods based on deep learning only use spatial representation learning from recording device source features, which cannot make full use of recording device source information. Therefore, in this paper, to fully explore the spatial information and temporal information of recording device source, we propose a new method for recording device source identification based on the fusion of spatial feature information and temporal feature information by using an end-to-end framework. From a feature perspective, we designed two kinds of networks to extract recording device source spatial and temporal information. Afterward, we use the attention mechanism to adaptively assign the weight of spatial information and temporal information to obtain fusion features. From a model perspective, our model uses an end-to-end framework to learn the deep representation from spatial and temporal features and is trained using deep and shallow losses to jointly optimize our network. This method is compared with our previous work and baseline system. The results show that the proposed method is better than our previous work and baseline system under general conditions.
Submitted 5 December, 2022;
originally announced December 2022.
-
NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results
Authors:
Yawei Li,
Kai Zhang,
Radu Timofte,
Luc Van Gool,
Fangyuan Kong,
Mingxi Li,
Songwei Liu,
Zongcai Du,
Ding Liu,
Chenhui Zhou,
**gyi Chen,
Qingrui Han,
Zheyuan Li,
Yingqi Liu,
Xiangyu Chen,
Haoming Cai,
Yu Qiao,
Chao Dong,
Long Sun,
**shan Pan,
Yi Zhu,
Zhikai Zong,
Xiaoxiao Liu,
Zheng Hui,
Tao Yang
, et al. (86 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2022 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The task of the challenge was to super-resolve an input image with a magnification factor of $\times$4 based on pairs of low and corresponding high resolution images. The aim was to design a network for single image super-resolution that achieved improvement of efficiency measured according to several metrics including runtime, parameters, FLOPs, activations, and memory consumption while at least maintaining the PSNR of 29.00dB on DIV2K validation set. IMDN is set as the baseline for efficiency measurement. The challenge had 3 tracks including the main track (runtime), sub-track one (model complexity), and sub-track two (overall performance). In the main track, the practical runtime performance of the submissions was evaluated. The ranking of the teams was determined directly by the absolute value of the average runtime on the validation set and test set. In sub-track one, the number of parameters and FLOPs were considered, and the individual rankings of the two metrics were summed up to determine a final ranking in this track. In sub-track two, all of the five metrics mentioned in the description of the challenge including runtime, parameter count, FLOPs, activations, and memory consumption were considered. Similar to sub-track one, the rankings of five metrics were summed up to determine a final ranking. The challenge had 303 registered participants, and 43 teams made valid submissions. They gauge the state-of-the-art in efficient single image super-resolution.
Submitted 11 May, 2022;
originally announced May 2022.
-
Breast Cancer Induced Bone Osteolysis Prediction Using Temporal Variational Auto-Encoders
Authors:
Wei Xiong,
Neil Yeung,
Shubo Wang,
Haofu Liao,
Liyun Wang,
Jiebo Luo
Abstract:
Objective and Impact Statement. We adopt a deep learning model for bone osteolysis prediction on computed tomography (CT) images of murine breast cancer bone metastases. Given the bone CT scans at previous time steps, the model incorporates the bone-cancer interactions learned from the sequential images and generates future CT images. Its ability of predicting the development of bone lesions in cancer-invading bones can assist in assessing the risk of impending fractures and choosing proper treatments in breast cancer bone metastasis. Introduction. Breast cancer often metastasizes to bone, causes osteolytic lesions, and results in skeletal related events (SREs) including severe pain and even fatal fractures. Although current imaging techniques can detect macroscopic bone lesions, predicting the occurrence and progression of bone lesions remains a challenge. Methods. We adopt a temporal variational auto-encoder (T-VAE) model that utilizes a combination of variational auto-encoders and long short-term memory networks to predict bone lesion emergence on our micro-CT dataset containing sequential images of murine tibiae. Given the CT scans of murine tibiae at early weeks, our model can learn the distribution of their future states from data. Results. We test our model against other deep learning-based prediction models on the bone lesion progression prediction task. Our model produces much more accurate predictions than existing models under various evaluation metrics. Conclusion. We develop a deep learning framework that can accurately predict and visualize the progression of osteolytic bone lesions. It will assist in planning and evaluating treatment strategies to prevent SREs in breast cancer patients.
Submitted 28 March, 2022; v1 submitted 20 March, 2022;
originally announced March 2022.
-
Deep Instance Segmentation with Automotive Radar Detection Points
Authors:
Jianan Liu,
Weiyi Xiong,
Li** Bai,
Yuxuan Xia,
Tao Huang,
Wanli Ouyang,
Bing Zhu
Abstract:
Automotive radar provides reliable environmental perception in all-weather conditions with affordable cost, but it hardly supplies semantic and geometry information due to the sparsity of radar detection points. With the development of automotive radar technologies in recent years, instance segmentation becomes possible by using automotive radar. Its data contain contexts such as radar cross section and micro-Doppler effects, and sometimes can provide detection when the field of view is obscured. The outcome from instance segmentation could be potentially used as the input of trackers for tracking targets. The existing methods often utilize a clustering-based classification framework, which fits the need of real-time processing but has limited performance due to minimum information provided by sparse radar detection points. In this paper, we propose an efficient method based on clustering of estimated semantic information to achieve instance segmentation for the sparse radar detection points. In addition, we show that the performance of the proposed approach can be further enhanced by incorporating the visual multi-layer perceptron. The effectiveness of the proposed method is verified by experimental results on the popular RadarScenes dataset, achieving 89.53% mean coverage and 86.97% mean average precision with the IoU threshold of 0.5, which is superior to other approaches in the literature. More significantly, the consumed memory is around 1MB, and the inference time is less than 40ms, indicating that our proposed algorithm is storage and time efficient. These two criteria ensure the practicality of the proposed method in real-world systems.
Submitted 5 February, 2023; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Two Efficient and Easy-to-Use NLOS Mitigation Solutions to Indoor 3-D AOA-Based Localization
Authors:
Wenxin Xiong,
Joan Bordoy,
Andrea Gabbrielli,
Georg Fischer,
Dominik Jan Schott,
Fabian Hoeflinger,
Johannes Wendeberg,
Christian Schindelhauer,
Stefan Johann Rupitsch
Abstract:
This paper proposes two efficient and easy-to-use error mitigation solutions to the problem of three-dimensional (3-D) angle-of-arrival (AOA) source localization in the mixed line-of-sight (LOS) and non-line-of-sight (NLOS) indoor environments. A weighted linear least squares estimator is derived first for the LOS AOA components in terms of the direction vectors of arrival, albeit in a sub-optimal manner. Next, data selection exploiting the sum of squared residuals is carried out to discard the error-prone NLOS connections. In so doing, the first approach is constituted and more accurate closed-form location estimates can be obtained. The second method applies a simulated annealing stochastic framework to realize the robust $\ell_1$-minimization criterion, which therefore falls into the methodology of statistical robustification. Computer simulations and ultrasonic onsite experiments are conducted to evaluate the performance of the two proposed methods, demonstrating their outstanding positioning results in the respective scenarios.
Submitted 13 August, 2021; v1 submitted 21 July, 2021;
originally announced July 2021.
-
Neurodynamic TDOA localization with NLOS mitigation via maximum correntropy criterion
Authors:
Wenxin Xiong,
Christian Schindelhauer,
Hing Cheung So,
Junli Liang,
Zhi Wang
Abstract:
In this paper, we exploit the maximum correntropy criterion (MCC) to robustify the traditional time-difference-of-arrival (TDOA) location estimator in the presence of non-line-of-sight (NLOS) propagation conditions. For the sake of statistical efficiency, the correntropy-based robust loss is imposed on the underlying time-of-arrival composition via joint estimation of the source position and onset time, instead of the TDOA counterpart generated in the postprocessing of sensor-collected timestamps. We then employ a neurodynamic optimization approach to tackle the highly nonconvex MCC formulation. Furthermore, we examine the local stability of equilibrium for the corresponding projection-type neural network model. Simulation investigations in representative NLOS propagation scenarios demonstrate that our neurodynamic robust TDOA localization solution is capable of outperforming several existing schemes in terms of positioning accuracy.
Submitted 9 November, 2021; v1 submitted 14 September, 2020;
originally announced September 2020.
-
Maximum correntropy criterion for robust TOA-based localization in NLOS environments
Authors:
Wenxin Xiong,
Christian Schindelhauer,
Hing Cheung So,
Zhi Wang
Abstract:
We investigate the problem of time-of-arrival (TOA) based localization under possible non-line-of-sight (NLOS) propagation conditions. To robustify the squared-range-based location estimator, we follow the maximum correntropy criterion, essentially the Welsch $M$-estimator with a redescending influence function which behaves like $\ell_0$-minimization towards the grossly biased measurements, to derive the formulation. The half-quadratic technique is then applied to settle the resulting optimization problem in an alternating maximization (AM) manner. By construction, the major computational challenge at each AM iteration boils down to handling an easily solvable generalized trust region subproblem. It is worth noting that the implementation of our localization method requires nothing but merely the TOA-based range measurements and sensor positions as prior information. Simulation and experimental results demonstrate the competence of the presented scheme in outperforming several state-of-the-art approaches in terms of positioning accuracy, especially in scenarios where the percentage of NLOS paths is not large enough.
Submitted 10 September, 2021; v1 submitted 13 September, 2020;
originally announced September 2020.
-
LinksIQ: Robust and Efficient Modulation Recognition with Imperfect Spectrum Scans
Authors:
Wei Xiong,
Karyn Doke,
Petko Bogdanov,
Mariya Zheleva
Abstract:
While critical for the practical progress of spectrum sharing, modulation recognition has so far been investigated under unrealistic assumptions: (i) a transmitter's bandwidth must be scanned alone and in full, (ii) prior knowledge of the technology must be available and (iii) a transmitter must be trustworthy. In reality these assumptions cannot be readily met, as a transmitter's bandwidth may only be scanned intermittently, partially, or alongside other transmitters, and modulation obfuscation may be introduced by short-lived scans or malicious activity.
This paper presents LinksIQ, which bridges the gap between real-world spectrum sensing and the growing body of modrec methods designed under simplifying assumptions. Our key insight is that ordered IQ samples form distinctive patterns across modulations, which persist even with scan deficiencies. We mine these patterns through a Fisher Kernel framework and employ a lightweight linear support vector machine for modulation classification. LinksIQ is robust to noise, scan partiality and data biases without utilizing prior knowledge of transmitter technology. Its accuracy consistently outperforms baselines in both simulated and real traces. We evaluate LinksIQ performance in a testbed using two popular SDR platforms, RTL-SDR and USRP. We demonstrate high detection accuracy (i.e. 0.74) even with a $20 RTL-SDR scanning at 50% transmitter overlap.
This constitutes an average of 43% improvement over existing counterparts employed on RTL-SDR scans. We also explore the effects of platform-aware classifier training and discuss implications on real-world modrec system design. Our results demonstrate the feasibility of low-cost transmitter fingerprinting at scale.
Submitted 7 May, 2020;
originally announced May 2020.
-
Unsupervised Low-light Image Enhancement with Decoupled Networks
Authors:
Wei Xiong,
Ding Liu,
Xiaohui Shen,
Chen Fang,
Jiebo Luo
Abstract:
In this paper, we tackle the problem of enhancing real-world low-light images with significant noise in an unsupervised fashion. Conventional unsupervised learning-based approaches usually tackle the low-light image enhancement problem using an image-to-image translation model. They focus primarily on illumination or contrast enhancement but fail to suppress the noise that ubiquitously exists in images taken under real-world low-light conditions. To address this issue, we explicitly decouple this task into two sub-tasks: illumination enhancement and noise suppression. We propose to learn a two-stage GAN-based framework to enhance the real-world low-light images in a fully unsupervised fashion. To facilitate the unsupervised training of our model, we construct samples with pseudo labels. Furthermore, we propose an adaptive content loss to suppress real image noise in different regions based on illumination intensity. In addition to conventional benchmark datasets, a new unpaired low-light image enhancement dataset is built and used to thoroughly evaluate the performance of our model. Extensive experiments show that our proposed method outperforms the state-of-the-art unsupervised image enhancement methods in terms of both illumination enhancement and noise reduction.
Submitted 28 March, 2022; v1 submitted 6 May, 2020;
originally announced May 2020.
-
TDOA-based localization with NLOS mitigation via robust model transformation and neurodynamic optimization
Authors:
Wenxin Xiong,
Christian Schindelhauer,
Hing Cheung So,
Joan Bordoy,
Andrea Gabbrielli,
Junli Liang
Abstract:
This paper revisits the problem of locating a signal-emitting source from time-difference-of-arrival (TDOA) measurements under non-line-of-sight (NLOS) propagation. Many currently fashionable methods for NLOS mitigation in TDOA-based localization tend to solve their optimization problems by means of convex relaxation and, thus, are computationally inefficient. Besides, previous studies show that manipulating directly on the TDOA metric usually gives rise to intricate estimators. Aiming at bypassing these challenges, we turn to retrieve the underlying time-of-arrival framework by treating the unknown source onset time as an optimization variable and imposing certain inequality constraints on it, mitigate the NLOS errors through the $\ell_1$-norm robustification, and finally apply a hardware realizable neurodynamic model based on the redefined augmented Lagrangian and projection theorem to solve the resultant nonconvex optimization problem with inequality constraints. It is validated through extensive simulations that the proposed scheme can strike a nice balance between localization accuracy, computational complexity, and prior knowledge requirement.
Submitted 20 August, 2020; v1 submitted 22 April, 2020;
originally announced April 2020.
-
Advances in Online Audio-Visual Meeting Transcription
Authors:
Takuya Yoshioka,
Igor Abramovski,
Cem Aksoylar,
Zhuo Chen,
Moshe David,
Dimitrios Dimitriadis,
Yifan Gong,
Ilya Gurvich,
Xuedong Huang,
Yan Huang,
Aviv Hurvitz,
Li Jiang,
Sharon Koubi,
Eyal Krupka,
Ido Leichter,
Changliang Liu,
Partha Parthasarathy,
Alon Vinnikov,
Lingfeng Wu,
Xiong Xiao,
Wayne Xiong,
Huaming Wang,
Zhenghao Wang,
Jun Zhang,
Yong Zhao
, et al. (1 additional author not shown)
Abstract:
This paper describes a system that generates speaker-annotated transcripts of meetings by using a microphone array and a 360-degree camera. The hallmark of the system is its ability to handle overlapped speech, which has been an unsolved problem in realistic settings for over a decade. We show that this problem can be addressed by using a continuous speech separation approach. In addition, we describe an online audio-visual speaker diarization method that leverages face tracking and identification, sound source localization, speaker identification, and, if available, prior speaker information for robustness to various real world challenges. All components are integrated in a meeting transcription framework called SRD, which stands for "separate, recognize, and diarize". Experimental results using recordings of natural meetings involving up to 11 attendees are reported. The continuous speech separation improves a word error rate (WER) by 16.1% compared with a highly tuned beamformer. When a complete list of meeting attendees is available, the discrepancy between WER and speaker-attributed WER is only 1.0%, indicating accurate word-to-speaker association. This increases marginally to 1.6% when 50% of the attendees are unknown to the system.
Submitted 10 December, 2019;
originally announced December 2019.
-
Security and Privacy Issues for Connected Vehicles
Authors:
Wenjun Xiong,
Robert Lagerström
Abstract:
Modern vehicles contain more than a hundred Electronic Control Units (ECUs) that communicate over different in-vehicle networks, and they are often connected to the Internet, which makes them vulnerable to various cyber-attacks. Besides, data collected by the connected vehicles is directly connected to the vehicular network. Thus, big vehicular data are collected, which are valuable and generate insights into driver behavior. Previously, a probabilistic modeling and simulation language named vehicleLang was presented to analyze the security of connected vehicles. However, the privacy issues of vehicular data have not been addressed. To fill in the gap, this work presents a privacy specification for vehicles based on vehicleLang, which uses the Meta Attack Language (MAL) to assess the security of connected vehicles in a formal way, with a special focus on the privacy aspect. To evaluate this work, test cases are also presented.
Submitted 18 December, 2018; v1 submitted 12 December, 2018;
originally announced December 2018.
-
An adaptive software defined radio design based on a standard space telecommunication radio system API
Authors:
Wenhao Xiong,
Xin Tian,
Genshe Chen,
Khanh Pham,
Erik Blasch
Abstract:
Software defined radio (SDR) has become a popular tool for implementing and testing communications performance. The advantages of the SDR approach include: a re-configurable design, adaptive response to changing conditions, efficient development, and highly versatile implementation. In order to understand the benefits of SDR, the space telecommunication radio system (STRS) was proposed by NASA Glenn Research Center (GRC) along with the standard application program interface (API) structure. Each component of the system uses a well-defined API to communicate with other components. The benefit of a standard API is to relax the platform limitation of each component for additional options. For example, the waveform generating process can support a field programmable gate array (FPGA), personal computer (PC), or an embedded system. As long as the API defines the requirements, the generated waveform selection will work with the complete system. In this paper, we demonstrate the design and development of adaptive SDR following the STRS and standard API protocol. We introduce step by step the SDR testbed system including the controlling graphic user interface (GUI), database, GNU radio hardware control, and universal software radio peripheral (USRP) transceiving front end. In addition, a performance evaluation is shown on the effectiveness of the SDR approach for space telecommunication.
Submitted 25 November, 2017;
originally announced November 2017.
-
Achieving Human Parity in Conversational Speech Recognition
Authors:
W. Xiong,
J. Droppo,
X. Huang,
F. Seide,
M. Seltzer,
A. Stolcke,
D. Yu,
G. Zweig
Abstract:
Conversational speech recognition has served as a flagship speech recognition task since the release of the Switchboard corpus in the 1990s. In this paper, we measure the human error rate on the widely used NIST 2000 test set, and find that our latest automated system has reached human parity. The error rate of professional transcribers is 5.9% for the Switchboard portion of the data, in which newly acquainted pairs of people discuss an assigned topic, and 11.3% for the CallHome portion where friends and family members have open-ended conversations. In both cases, our automated system establishes a new state of the art, and edges past the human benchmark, achieving error rates of 5.8% and 11.0%, respectively. The key to our system's performance is the use of various convolutional and LSTM acoustic model architectures, combined with a novel spatial smoothing method and lattice-free MMI acoustic training, multiple recurrent neural network language modeling approaches, and a systematic use of system combination.
Submitted 17 February, 2017; v1 submitted 17 October, 2016;
originally announced October 2016.
-
The Microsoft 2016 Conversational Speech Recognition System
Authors:
W. Xiong,
J. Droppo,
X. Huang,
F. Seide,
M. Seltzer,
A. Stolcke,
D. Yu,
G. Zweig
Abstract:
We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward and backward running RNNLMs, and word posterior-based system combination provide a 20% boost. The best single system uses a ResNet architecture acoustic model with RNNLM rescoring, and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.2%, representing an improvement over previously reported results on this benchmark task.
Submitted 25 January, 2017; v1 submitted 12 September, 2016;
originally announced September 2016.