-
HoneyGPT: Breaking the Trilemma in Terminal Honeypots with Large Language Model
Authors:
Ziyang Wang,
Jianzhou You,
Haining Wang,
Tianwei Yuan,
Shichao Lv,
Yang Wang,
Limin Sun
Abstract:
Honeypots, as a strategic cyber-deception mechanism designed to emulate authentic interactions and bait unauthorized entities, continue to struggle with balancing flexibility, interaction depth, and deceptive capability despite decades of evolution. They also often lack the capability to adapt proactively to an attacker's evolving tactics, which restricts the depth of engagement and subsequent information gathering. In this context, the emergent capabilities of large language models, combined with pioneering prompt-engineering techniques, offer a transformative shift in the design and deployment of honeypot technologies. In this paper, we introduce HoneyGPT, a pioneering honeypot architecture based on ChatGPT, heralding a new era of intelligent honeypot solutions characterized by cost-effectiveness, high adaptability, and enhanced interactivity, coupled with a predisposition for proactive attacker engagement. Furthermore, we present a structured prompt engineering framework that augments long-term interaction memory and robust security analytics. This framework, which integrates chain-of-thought tactics attuned to honeypot contexts, enhances interactivity and deception, deepens security analytics, and ensures sustained engagement.
The evaluation of HoneyGPT includes two parts: a baseline comparison based on a collected dataset and a field evaluation in real scenarios for four weeks. The baseline comparison demonstrates HoneyGPT's remarkable ability to strike a balance among flexibility, interaction depth, and deceptive capability. The field evaluation further validates HoneyGPT's efficacy, showing its marked superiority in enticing attackers into more profound interactive engagements and capturing a wider array of novel attack vectors in comparison to existing honeypot technologies.
Submitted 3 June, 2024;
originally announced June 2024.
-
Research progress on intelligent optimization techniques for energy-efficient design of ship hull forms
Authors:
Shuwei Zhu,
Siying Lv,
Kaifeng Chen,
Wei Fang,
Leilei Cao
Abstract:
The design optimization of ship hull forms based on hydrodynamics theory and simulation-based design (SBD) technologies generally takes ship performance and energy efficiency as the design objectives, and it plays an important role in the smart design and manufacturing of green ships. The optimal design of a sustainable energy system requires multidisciplinary tools to build ships with the least resistance and energy consumption. Through a systematic approach, this paper presents the research progress of energy-efficient design of ship hull forms based on intelligent optimization techniques. We discuss the different methods involved in the optimization procedure, especially the latest developments in intelligent optimization algorithms and surrogate models. Moreover, current development trends and technical challenges of multidisciplinary design optimization and surrogate-assisted evolutionary algorithms for ship design are further analyzed. We explore the gaps and potential future directions, so as to pave the way towards the design of the next generation of more energy-efficient ship hull forms.
Submitted 9 March, 2024;
originally announced March 2024.
-
MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement
Authors:
Weiming Xu,
Zhouxuan Chen,
Zhili Tan,
Shubo Lv,
Runduo Han,
Wenjiang Zhou,
Weifeng Zhao,
Lei Xie
Abstract:
A typical neural speech enhancement (SE) approach mainly handles mixtures of speech and noise, which is not optimal for singing voice enhancement scenarios. Music source separation (MSS) models treat vocals and the various accompaniment components equally, which may reduce performance compared to a model that considers only vocal enhancement. In this paper, we propose a novel multi-band temporal-frequency neural network (MBTFNet) for singing voice enhancement, which specifically removes background music, noise, and even backing vocals from singing recordings. MBTFNet combines inter- and intra-band modeling for better processing of full-band signals. Dual-path modeling is introduced to expand the receptive field of the model. We propose an implicit personalized enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation, which further improves the performance of MBTFNet. Experiments show that our proposed model significantly outperforms several state-of-the-art SE and MSS models.
Submitted 6 October, 2023;
originally announced October 2023.
-
RIS-aided Near-Field MIMO Communications: Codebook and Beam Training Design
Authors:
Suyu Lv,
Yuanwei Liu,
Xiaodong Xu,
Arumugam Nallanathan,
A. Lee Swindlehurst
Abstract:
Downlink reconfigurable intelligent surface (RIS)-assisted multiple-input multiple-output (MIMO) systems are considered with far-field, near-field, and hybrid far-near-field channels. According to the angular or distance information contained in the received signals, 1) a distance-based codebook is designed for near-field MIMO channels, based on which a hierarchical beam training scheme is proposed to reduce the training overhead; 2) a combined angular-distance codebook is designed for hybrid far-near-field MIMO channels, based on which a two-stage beam training scheme is proposed to achieve alignment in the angular and distance domains separately. To maximize the achievable rate while reducing complexity, an alternating optimization algorithm is proposed to carry out the joint optimization iteratively. Specifically, the RIS coefficient matrix is optimized through the beam training process, the optimal combining matrix is obtained from the closed-form solution of the mean square error (MSE) minimization problem, and the active beamforming matrix is optimized by exploiting the relationship between the achievable rate and the MSE. Numerical results reveal that: 1) the proposed beam training schemes achieve near-optimal performance with significantly reduced training overhead; 2) compared to the angular-only far-field channel model, taking the additional distance information into account effectively improves the achievable rate when carrying out beam design for near-field communications.
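The abstract contrasts a distance-based codebook with the angle-only far-field model. A minimal NumPy sketch of that idea follows, assuming a uniform linear array, a spherical-wavefront steering vector, and a plain angle-distance grid. The function names, grid, and 30 GHz carrier are hypothetical, and the exhaustive sweep shown is the high-overhead baseline that the paper's hierarchical and two-stage training schemes are designed to avoid, not those schemes themselves.

```python
import numpy as np


def near_field_steering(n_elems, spacing, wavelength, theta, r):
    """Spherical-wavefront (near-field) response of a uniform linear array.

    Element offsets are measured from the array centre; the phase depends on
    both the angle theta (radians) and the distance r (metres).
    """
    offsets = (np.arange(n_elems) - (n_elems - 1) / 2) * spacing
    # Exact element-to-user distance via the law of cosines.
    r_n = np.sqrt(r**2 + offsets**2 - 2 * r * offsets * np.sin(theta))
    return np.exp(-1j * 2 * np.pi / wavelength * (r_n - r)) / np.sqrt(n_elems)


def build_codebook(n_elems, spacing, wavelength, thetas, distances):
    """Hypothetical angle-distance grid: one unit-norm codeword per (theta, r)."""
    entries = [(t, r) for t in thetas for r in distances]
    words = np.stack([near_field_steering(n_elems, spacing, wavelength, t, r)
                      for t, r in entries])
    return entries, words


def exhaustive_beam_search(words, channel):
    """Index of the codeword with the largest beamforming gain |w^H h|."""
    gains = np.abs(words.conj() @ channel)
    return int(np.argmax(gains))


wavelength = 0.01                     # assumed 30 GHz carrier
spacing = wavelength / 2              # half-wavelength element spacing
n_elems = 128
thetas = np.deg2rad(np.linspace(-60, 60, 25))
distances = np.array([2.0, 5.0, 10.0, 20.0])

entries, words = build_codebook(n_elems, spacing, wavelength, thetas, distances)
# A noiseless line-of-sight channel whose user sits exactly on a grid point.
h = near_field_steering(n_elems, spacing, wavelength, thetas[7], 5.0)
best = exhaustive_beam_search(words, h)
theta_hat, r_hat = entries[best]
print(np.rad2deg(theta_hat), r_hat)
```

Because every codeword has unit norm, the matched codeword attains gain 1 by the Cauchy-Schwarz inequality, so the sweep recovers the distance as well as the angle; an angle-only codebook could not separate users at 5 m and 10 m along the same direction.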
Submitted 30 September, 2023;
originally announced October 2023.
-
Personalized Federated Learning via Amortized Bayesian Meta-Learning
Authors:
Shiyu Liu,
Shaogao Lv,
Dun Zeng,
Zenglin Xu,
Hui Wang,
Yue Yu
Abstract:
Federated learning is a decentralized and privacy-preserving technique that enables multiple clients to collaborate with a server to learn a global model without exposing their private data. However, statistical heterogeneity among clients poses a challenge, as the global model may struggle to perform well on each client's specific task. To address this issue, we introduce a new perspective on personalized federated learning through Amortized Bayesian Meta-Learning. Specifically, we propose a novel algorithm called FedABML, which employs hierarchical variational inference across clients. The global prior aims to capture representations of common intrinsic structures from heterogeneous clients, which can then be transferred to their respective tasks and aid in generating accurate client-specific approximate posteriors through a few local updates. Our theoretical analysis provides an upper bound on the average generalization error and guarantees the generalization performance on unseen data. Finally, empirical results demonstrate that FedABML outperforms several competitive baselines.
Submitted 5 July, 2023;
originally announced July 2023.
-
Robust Graph Structure Learning with the Alignment of Features and Adjacency Matrix
Authors:
Shaogao Lv,
Gang Wen,
Shiyu Liu,
Linsen Wei,
Ming Li
Abstract:
To improve the robustness of graph neural networks (GNN), graph structure learning (GSL) has attracted great interest due to the pervasiveness of noise in graph data. Many approaches have been proposed for GSL to jointly learn a clean graph structure and corresponding representations. To extend the previous work, this paper proposes a novel regularized GSL approach, particularly with an alignment of feature information and graph information, which is motivated mainly by our derived lower bound of node-level Rademacher complexity for GNNs. Additionally, our proposed approach incorporates sparse dimensional reduction to leverage low-dimensional node features that are relevant to the graph structure. To evaluate the effectiveness of our approach, we conduct experiments on real-world graphs. The results demonstrate that our proposed GSL method outperforms several competitive baselines, especially in scenarios where the graph structures are heavily affected by noise. Overall, our research highlights the importance of integrating feature and graph information alignment in GSL, as inspired by our derived theoretical result, and showcases the superiority of our approach in handling noisy graph structures through comprehensive experiments on real-world datasets.
Submitted 5 July, 2023;
originally announced July 2023.
-
Autonomous Drone Racing: Time-Optimal Spatial Iterative Learning Control within a Virtual Tube
Authors:
Shuli Lv,
Yan Gao,
Jiaxing Che,
Quan Quan
Abstract:
It is often necessary for drones to complete delivery, photography, and rescue tasks in the shortest possible time to increase efficiency. Many autonomous drone races provide platforms for developing algorithms that finish races as quickly as possible for this purpose. Unfortunately, existing methods often fail to keep both training and racing time short in drone racing competitions. This motivates us to develop a highly efficient learning method by imitating the training experience of top racing drivers. Unlike traditional iterative learning control methods for accurate tracking, the proposed approach iteratively learns a trajectory online to finish the race as quickly as possible. Simulations and experiments using different models show that the proposed approach is model-free and achieves the optimal result with low computational requirements. Furthermore, this approach surpasses some state-of-the-art methods in racing time on a benchmark drone racing platform. An experiment on a real quadcopter is also performed to demonstrate its effectiveness.
Submitted 28 June, 2023;
originally announced June 2023.
-
DCCRN-KWS: an audio bias based model for noise robust small-footprint keyword spotting
Authors:
Shubo Lv,
Xiong Wang,
Sining Sun,
Long Ma,
Lei Xie
Abstract:
Real-world complex acoustic environments, especially those with a low signal-to-noise ratio (SNR), pose tremendous challenges to a keyword spotting (KWS) system. Inspired by recent advances in neural speech enhancement and context bias in speech recognition, we propose a robust audio-context-bias-based DCCRN-KWS model to address this challenge. We formulate the whole architecture as a multi-task learning framework for both denoising and keyword spotting, in which the DCCRN encoder is connected to the KWS model. Aided by the denoising task, we further introduce an audio context bias module that leverages real keyword samples and biases the network to better discriminate keywords in noisy conditions. Feature merge and complex context linear modules are also introduced to strengthen such discrimination and to leverage contextual information effectively, respectively. Experiments on an internal challenging dataset and the public HIMIYA dataset show that our DCCRN-KWS system achieves superior performance, while an ablation study demonstrates the soundness of the overall model design.
Submitted 12 June, 2023; v1 submitted 20 May, 2023;
originally announced May 2023.
-
Stability and Generalization of lp-Regularized Stochastic Learning for GCN
Authors:
Shiyu Liu,
Linsen Wei,
Shaogao Lv,
Ming Li
Abstract:
Graph convolutional networks (GCN) are among the most popular variants of graph neural networks for graph data and have shown powerful performance in empirical experiments. $\ell_2$-based graph smoothing enforces the global smoothness of GCN, while (soft) $\ell_1$-based sparse graph learning tends to promote signal sparsity at the cost of discontinuity. This paper aims to quantify the trade-off between smoothness and sparsity in GCN, with the help of a general $\ell_p$-regularized $(1<p\leq 2)$ stochastic learning scheme proposed herein. While stability-based generalization analyses have been given in prior work for twice-differentiable objective functions, our $\ell_p$-regularized learning scheme does not satisfy such a smoothness condition. To tackle this issue, we propose a novel SGD proximal algorithm for GCNs with an inexact operator. For a single-layer GCN, we establish an explicit theoretical understanding of GCN with $\ell_p$-regularized stochastic learning by analyzing the stability of our SGD proximal algorithm. We conduct multiple empirical experiments to validate our theoretical findings.
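The proximal step is what lets SGD handle the $\ell_p$ penalty without relying on a smooth objective. The sketch below assumes an element-wise penalty $\lambda\sum_i |w_i|^p$ and approximates its proximal operator by bisection, which is one way to obtain an inexact operator; it is not necessarily the operator used in the paper, and the names `prox_lp` and `prox_sgd_step` are hypothetical.

```python
import numpy as np


def prox_lp(v, lam, p, tol=1e-12, max_iter=200):
    """Numerically approximated proximal operator of lam * sum_i |x_i|**p, 1 < p <= 2.

    Solves argmin_x 0.5 * ||x - v||^2 + lam * sum_i |x_i|**p coordinate-wise.
    For p > 1 the minimiser keeps the sign of v_i, and its magnitude t is the
    root of g(t) = t + lam * p * t**(p - 1) - |v_i| on [0, |v_i|]; g is
    increasing, so bisection converges to that root.
    """
    a = np.abs(np.asarray(v, dtype=float))
    lo, hi = np.zeros_like(a), a.copy()
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        too_big = mid + lam * p * mid ** (p - 1) > a   # g(mid) > 0: root lies below mid
        hi = np.where(too_big, mid, hi)
        lo = np.where(too_big, lo, mid)
        if np.max(hi - lo, initial=0.0) < tol:
            break
    return np.sign(v) * 0.5 * (lo + hi)


def prox_sgd_step(w, grad, lr, lam, p):
    """One proximal SGD step: gradient step on the data loss, then the prox of lr*lam*R."""
    return prox_lp(w - lr * grad, lr * lam, p)
```

For $p = 2$ the operator has the closed form $v/(1+2\lambda)$, which makes a convenient sanity check; as $p \to 1$ it approaches, but never exactly reaches, soft-thresholding, which mirrors the smoothness-sparsity trade-off the abstract describes.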
Submitted 19 June, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
Two-stage Neural Network for ICASSP 2023 Speech Signal Improvement Challenge
Authors:
Mingshuai Liu,
Shubo Lv,
Zihan Zhang,
Runduo Han,
Xiang Hao,
Xianjun Xia,
Li Chen,
Yijian Xiao,
Lei Xie
Abstract:
In the ICASSP 2023 Speech Signal Improvement Challenge, we developed a dual-stage neural model that improves the quality of speech signals degraded by different distortions in a stage-wise divide-and-conquer fashion. Specifically, in the first stage, the speech improvement network focuses on recovering the missing components of the spectrum, while in the second stage, our model aims to further suppress noise, reverberation, and artifacts introduced by the first-stage model. Achieving a final score of 0.446 and a P.835 score of 0.517, our system ranks 4th in the non-real-time track.
Submitted 14 March, 2023;
originally announced March 2023.
-
Spatial-DCCRN: DCCRN Equipped with Frame-Level Angle Feature and Hybrid Filtering for Multi-Channel Speech Enhancement
Authors:
Shubo Lv,
Yihui Fu,
Yukai Jv,
Lei Xie,
Weixin Zhu,
Wei Rao,
Yannan Wang
Abstract:
Recently, multi-channel speech enhancement has drawn much interest because spatial information can be used to distinguish target speech from interfering signals. To make full use of spatial information and neural-network-based mask estimation, we propose a multi-channel denoising neural network, Spatial-DCCRN. First, we extend S-DCCRN to the multi-channel scenario, adopting a cascaded sub-channel and full-channel processing strategy that can model different channels separately. Moreover, instead of only adopting the multi-channel spectrum or concatenating the first channel's magnitude and IPD as the model's inputs, we apply an angle feature extraction module (AFE) to extract frame-level angle feature embeddings, which help the model explicitly perceive spatial information. Finally, since residual noise becomes more serious when noise and speech occupy the same time-frequency (TF) bin, we design a masking and mapping filtering method to replace the traditional filter-and-sum operation, with the aim of cascading coarse denoising, dereverberation, and residual noise suppression. The proposed model, Spatial-DCCRN, surpasses EaBNet, FasNet, and several other competitive models on the L3DAS22 Challenge dataset. Beyond the 3D scenario, Spatial-DCCRN outperforms the state-of-the-art (SOTA) model MIMO-UNet by a large margin in multiple evaluation metrics on the multi-channel ConferencingSpeech2021 Challenge dataset. Ablation studies also demonstrate the effectiveness of the different contributions.
Submitted 17 October, 2022;
originally announced October 2022.
-
Efficient Long Sequential User Data Modeling for Click-Through Rate Prediction
Authors:
Qiwei Chen,
Yue Xu,
Changhua Pei,
Shanshan Lv,
Tao Zhuang,
Junfeng Ge
Abstract:
Recent studies on Click-Through Rate (CTR) prediction have reached new levels by modeling longer user behavior sequences. Among them, two-stage methods stand out as the state-of-the-art (SOTA) solution for industrial applications. Two-stage methods first train a retrieval model to truncate the long behavior sequence beforehand and then use the truncated sequences to train a CTR model. However, because the retrieval model and the CTR model are trained separately, the subsequences retrieved for the CTR model are inaccurate, which degrades the final performance. In this paper, we propose an end-to-end paradigm to model long behavior sequences, which achieves superior performance along with remarkable cost-efficiency compared to existing models. Our contribution is three-fold. First, we propose a hashing-based efficient target attention (TA) network named ETA-Net to enable end-to-end user behavior retrieval based on low-cost bit-wise operations. The proposed ETA-Net can reduce the complexity of standard TA by orders of magnitude for sequential data modeling. Second, we propose a general system architecture as one viable solution to deploy ETA-Net on industrial systems. In particular, ETA-Net has been deployed on the recommender system of Taobao and brought a 1.8% lift in CTR and a 3.1% lift in Gross Merchandise Value (GMV) compared to the SOTA two-stage methods. Third, we conduct extensive experiments on both offline datasets and online A/B tests. The results verify that the proposed model considerably outperforms existing CTR models in terms of both CTR prediction performance and online cost-efficiency. ETA-Net now serves the main traffic of Taobao, delivering services to hundreds of millions of users over billions of items every day.
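Standard target attention scores every behavior in a long sequence against the candidate item, which is what makes long sequences expensive. Assuming a SimHash-style scheme (random-hyperplane signatures compared by Hamming distance), the bit-wise retrieval step followed by attention over the retrieved subsequence can be sketched as below. The hyperplane count, top-k size, and helper names are illustrative assumptions; ETA-Net's actual hashing and training details are not reproduced here.

```python
import numpy as np


def simhash_codes(emb, planes):
    """Pack sign(emb @ planes) into one integer code per row (random-hyperplane LSH)."""
    bits = (emb @ planes) > 0                                   # (n, n_bits) booleans
    weights = np.left_shift(1, np.arange(planes.shape[1], dtype=np.int64))
    return (bits.astype(np.int64) * weights).sum(axis=1)


def hamming_to(codes, target_code):
    """Hamming distance from each code to the target: popcount of the XOR."""
    return np.array([bin(int(c) ^ int(target_code)).count("1") for c in codes])


def retrieve_top_k(target_emb, behavior_embs, planes, k):
    """Indices of the k behaviours whose hash codes are closest to the target's."""
    codes = simhash_codes(behavior_embs, planes)
    target_code = simhash_codes(target_emb[None, :], planes)[0]
    return np.argsort(hamming_to(codes, target_code), kind="stable")[:k]


def target_attention(target_emb, behavior_embs, idx):
    """Scaled dot-product attention of the target item over the retrieved behaviours."""
    sub = behavior_embs[idx]
    scores = sub @ target_emb / np.sqrt(target_emb.size)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ sub


rng = np.random.default_rng(0)
dim, n_bits, seq_len = 16, 32, 1000
planes = rng.standard_normal((dim, n_bits))          # fixed random hyperplanes
behaviors = rng.standard_normal((seq_len, dim))      # a long user behaviour sequence
target = behaviors[42].copy()                        # candidate identical to one behaviour

top = retrieve_top_k(target, behaviors, planes, k=5)
user_interest = target_attention(target, behaviors, top)
```

The design point is that comparing 32-bit codes with XOR and popcount is far cheaper than a full-precision dot product per behavior, so exact attention is spent only on the few retrieved items.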
Submitted 25 September, 2022;
originally announced September 2022.
-
WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic Segmentation for Lung Adenocarcinoma
Authors:
Chu Han,
Xipeng Pan,
Lixu Yan,
Huan Lin,
Bingbing Li,
Su Yao,
Shanshan Lv,
Zhenwei Shi,
**hai Mai,
Jiatai Lin,
Bingchao Zhao,
Zeyan Xu,
Zhizhen Wang,
Yumeng Wang,
Yuan Zhang,
Huihui Wang,
Chao Zhu,
Chunhui Lin,
Lijian Mao,
Min Wu,
Luwen Duan,
**gsong Zhu,
Dong Hu,
Zijie Fang,
Yang Chen
, et al. (18 additional authors not shown)
Abstract:
Lung cancer is the leading cause of cancer death worldwide, and adenocarcinoma (LUAD) is its most common subtype. Exploiting the potential value of histopathology images can promote precision medicine in oncology. Tissue segmentation is the basic upstream task of histopathology image analysis. Existing deep learning models have achieved superior segmentation performance but require sufficient pixel-level annotations, which are time-consuming and expensive to produce. To enrich the label resources of LUAD and to alleviate the annotation effort, we organize the WSSS4LUAD challenge to call for outstanding weakly-supervised semantic segmentation (WSSS) techniques for histopathology images of LUAD. Participants have to design algorithms that segment tumor epithelial tissue, tumor-associated stroma, and normal tissue using only patch-level labels. The challenge includes 10,091 patch-level annotations (the training set) and over 130 million labeled pixels (the validation and test sets) from 87 whole-slide images (WSIs) (67 from GDPH, 20 from TCGA). All labels were generated by a pathologist-in-the-loop pipeline with the help of AI models and checked by the label review board. Among 532 registrations, 28 teams submitted results in the test phase, with over 1,000 submissions in total. The first-place team achieved an mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919). According to the technical reports of the top-tier teams, CAM is still the most popular approach in WSSS, and CutMix data augmentation has been widely adopted to generate more reliable samples. With the success of this challenge, we believe that WSSS approaches with patch-level annotations can complement traditional pixel-level annotations while reducing the annotation effort. The entire dataset has been released to encourage more research on computational pathology in LUAD and more novel WSSS techniques.
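The reported mIoU is consistent with an unweighted mean of the three per-class IoUs, since (0.8389 + 0.7931 + 0.8919) / 3 = 0.8413. A small sketch of per-class IoU and mIoU on label masks follows; the toy masks and the class encoding are hypothetical, and the official challenge evaluation code may differ in details such as how background or absent classes are handled.

```python
import numpy as np


def per_class_iou(pred, gt, n_classes):
    """IoU = intersection / union of the predicted and true masks, per class label."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        ious.append(inter / union if union else np.nan)   # class absent in both
    return np.array(ious, dtype=float)


def mean_iou(pred, gt, n_classes):
    """Unweighted mean over classes, ignoring classes absent from both masks."""
    return float(np.nanmean(per_class_iou(pred, gt, n_classes)))


# Toy 2x3 masks with labels 0 = tumor, 1 = stroma, 2 = normal (hypothetical).
gt = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
print(per_class_iou(pred, gt, 3), mean_iou(pred, gt, 3))

# Consistency check on the reported first-place numbers.
reported = {"tumor": 0.8389, "stroma": 0.7931, "normal": 0.8919}
print(round(sum(reported.values()) / len(reported), 4))   # 0.8413
```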
Submitted 13 April, 2022; v1 submitted 13 April, 2022;
originally announced April 2022.
-
Predict the Rover Mobility over Soft Terrain using Articulated Wheeled Bevameter
Authors:
Wenyao Zhang,
Shipeng Lv,
Feng Xue,
Chen Yao,
Zheng Zhu,
Zhenzhong Jia
Abstract:
Robot mobility is critical for mission success, especially on soft or deformable terrain, where complex wheel-soil interaction mechanics often lead to excessive wheel slip and sinkage, eventually causing mission failure. To improve the success rate, online mobility prediction using vision, infrared imaging, or model-based stochastic methods has been used in the literature. This paper proposes an on-board mobility prediction approach using an articulated wheeled bevameter that consists of a force-controlled arm and an instrumented bevameter (with force and vision sensors) as its end-effector. The proposed bevameter, which emulates traditional terramechanics tests such as pressure-sinkage and shear experiments, can measure contact parameters ahead of the rover's body in real time and predict the slip and sinkage of the supporting wheels over the probed region. Based on the predicted mobility, the rover can select a safer path to avoid dangerous regions such as those covered with quicksand. Compared to the literature, our proposed method avoids complicated terramechanics modeling and time-consuming stochastic prediction; it also mitigates the inaccuracy issues arising in non-contact vision-based methods. We also conduct multiple experiments to validate the proposed approach.
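The pressure-sinkage test that the bevameter emulates is classically summarized by Bekker's relation $p = (k_c/b + k_\phi) z^n$, where $b$ is the plate width and $z$ the sinkage. Under the simplifying assumption that the probe behaves like a flat plate, the sketch below fits the lumped modulus $k = k_c/b + k_\phi$ and the exponent $n$ from probe data, then inverts the law for a given load. The soil values and helper names are hypothetical, and the paper's own contact model (a wheeled end-effector, plus shear tests) is not reproduced.

```python
import numpy as np


def fit_bekker(z, p):
    """Fit p = k * z**n, with k = k_c / b + k_phi, by least squares in log space."""
    n, log_k = np.polyfit(np.log(z), np.log(p), 1)     # slope, intercept
    return float(np.exp(log_k)), float(n)


def predict_sinkage(load, contact_area, k, n):
    """Invert the fitted law for the sinkage under a mean contact pressure load/area."""
    return (load / contact_area / k) ** (1.0 / n)


# Synthetic probe data from a hypothetical soil with k = 8e5 and n = 1.1 (SI units).
z = np.linspace(0.005, 0.05, 20)          # sinkage in metres
p = 8e5 * z**1.1                          # pressure in Pa
k_hat, n_hat = fit_bekker(z, p)
sinkage = predict_sinkage(load=300.0, contact_area=0.02, k=k_hat, n=n_hat)
```

Fitting in log space turns the power law into a straight line, so a single `np.polyfit` call recovers both parameters; with noisy field data, a weighted or nonlinear fit may be preferable.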
Submitted 17 February, 2022;
originally announced February 2022.
-
S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement
Authors:
Shubo Lv,
Yihui Fu,
Mengtao Xing,
Jiayao Sun,
Lei Xie,
Jun Huang,
Yannan Wang,
Tao Yu
Abstract:
In speech enhancement, complex neural networks have shown promising performance due to their effectiveness in processing complex-valued spectra. Most recent speech enhancement approaches focus on wide-band signals with a sampling rate of 16 kHz. However, research on super wide band (e.g., 32 kHz) or even full-band (48 kHz) denoising is still lacking due to the difficulty of modeling more frequency bands, particularly high-frequency components. In this paper, we substantially extend our previous deep complex convolution recurrent neural network (DCCRN) to a super wide band version, S-DCCRN, to perform denoising on speech with a 32 kHz sampling rate. We first employ a cascaded sub-band and full-band processing module, which consists of two small-footprint DCCRNs, one operating on the sub-band signal and one on the full-band signal, aiming to benefit from both local and global frequency information. Moreover, instead of simply adopting the STFT feature as input, we use a complex feature encoder trained in an end-to-end manner to refine the information of different frequency bands. We also use a complex feature decoder to revert the feature to the time-frequency domain. Finally, a learnable spectrum compression method is adopted to adjust the energy of different frequency bands, which is beneficial for neural network learning. The proposed model, S-DCCRN, surpasses PercepNet as well as several competitive models and achieves state-of-the-art performance in terms of speech quality and intelligibility. Ablation studies also demonstrate the effectiveness of the different contributions.
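A learnable spectrum compression adjusts the energy of different frequency bands before the network sees them. A common form, assumed here, is a power-law compression of the magnitude with one exponent per frequency bin while the phase is kept. In a trained model the exponents would be learnable parameters; this NumPy sketch fixes them to hypothetical values, and the function names are illustrative.

```python
import numpy as np


def compress_spectrum(spec, alpha):
    """Power-law compress each frequency bin's magnitude, keeping the phase.

    spec: complex STFT of shape (n_freq, n_frames); alpha: (n_freq,) exponents,
    which a network would treat as learnable parameters.
    """
    return np.abs(spec) ** alpha[:, None] * np.exp(1j * np.angle(spec))


def decompress_spectrum(spec_c, alpha):
    """Inverse mapping applied to the network's output before the inverse STFT."""
    return np.abs(spec_c) ** (1.0 / alpha[:, None]) * np.exp(1j * np.angle(spec_c))


rng = np.random.default_rng(0)
spec = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
alpha = np.linspace(0.5, 0.3, 257)     # hypothetical per-bin exponents
compressed = compress_spectrum(spec, alpha)
restored = decompress_spectrum(compressed, alpha)
```

Exponents below 1 shrink the dynamic range, so weak high-frequency components are no longer dwarfed by strong low-frequency ones; keeping the phase untouched means the mapping is exactly invertible.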
Submitted 16 November, 2021;
originally announced November 2021.
-
SStaGCN: Simplified stacking based graph convolutional networks
Authors:
Jia Cai,
Zhilong Xiong,
Shaogao Lv
Abstract:
Graph convolutional network (GCN) is a powerful model studied broadly in various graph-structured data learning tasks. However, mitigating the over-smoothing phenomenon and handling heterogeneous graph-structured data remain crucial issues in GCN model design. In this paper, we propose a novel GCN called SStaGCN (Simplified stacking based GCN) by utilizing the ideas of stacking and aggregation, which provides an adaptive general framework for tackling heterogeneous graph data. Specifically, we first use the base models of stacking to extract the node features of a graph. Subsequently, aggregation methods such as mean, attention, and voting techniques are employed to further enhance node feature extraction. Thereafter, the node features are fed as inputs into a vanilla GCN model. Furthermore, a theoretical generalization bound for the proposed model is explicitly given. Extensive experiments on three public citation networks and three heterogeneous tabular datasets demonstrate the effectiveness and efficiency of the proposed approach over state-of-the-art GCNs. Notably, the proposed SStaGCN can efficiently mitigate the over-smoothing problem of GCN.
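The stacking pipeline described above (base models, then aggregation, then a vanilla GCN) can be sketched in NumPy as follows. The random probability matrices stand in for hypothetical base-model outputs, only the mean and voting aggregators are shown (attention is omitted), and the propagation matrix is the standard renormalized $D^{-1/2}(A+I)D^{-1/2}$ of vanilla GCN; this is an illustration, not the authors' implementation.

```python
import numpy as np


def normalized_adjacency(adj):
    """Renormalised propagation matrix D^{-1/2} (A + I) D^{-1/2} used by vanilla GCN."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def aggregate(outputs, how="mean"):
    """Combine per-node class probabilities from several base models."""
    stacked = np.stack(outputs)                     # (n_models, n_nodes, n_classes)
    if how == "mean":
        return stacked.mean(axis=0)
    if how == "vote":
        n_models, _, n_classes = stacked.shape
        votes = stacked.argmax(axis=2).T            # (n_nodes, n_models)
        return np.stack([np.bincount(v, minlength=n_classes) for v in votes]) / n_models
    raise ValueError(f"unknown aggregation: {how}")


def gcn_layer(a_norm, h, w):
    """One GCN layer: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ w, 0.0)


rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # 3-node path graph
# Stand-ins for the class-probability outputs of two base models on 3 nodes.
base_outputs = [rng.dirichlet(np.ones(4), size=3) for _ in range(2)]
features = aggregate(base_outputs, how="mean")
a_norm = normalized_adjacency(adj)
hidden = gcn_layer(a_norm, features, rng.standard_normal((4, 8)))
```

Feeding the aggregated base-model outputs rather than raw features is what lets the GCN stay shallow; fewer propagation layers is one plausible reason the approach mitigates over-smoothing.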
Submitted 16 November, 2021;
originally announced November 2021.
-
Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation
Authors:
Yihui Fu,
Yun Liu,
**gdong Li,
Dawei Luo,
Shubo Lv,
Yukai Jv,
Lei Xie
Abstract:
Complex spectrum and magnitude are considered two major features of speech enhancement and dereverberation. Traditional approaches always treat these two features separately, ignoring their underlying relationship. In this paper, we propose Uformer, a Unet based dilated complex & real dual-path conformer network in both complex and magnitude domain for simultaneous speech enhancement and dereverberation. We exploit time attention (TA) and dilated convolution (DC) to leverage local and global contextual information, and frequency attention (FA) to model dimensional information. These three sub-modules contained in the proposed dilated complex & real dual-path conformer module effectively improve the speech enhancement and dereverberation performance. Furthermore, a hybrid encoder and decoder are adopted to simultaneously model the complex spectrum and magnitude and promote the information interaction between the two domains. Encoder-decoder attention is also applied to enhance the interaction between encoder and decoder. In our experiments, Uformer outperforms all SOTA time-domain and complex-domain models both objectively and subjectively. Specifically, Uformer reaches 3.6032 DNSMOS on the blind test set of the Interspeech 2021 DNS Challenge, which outperforms all top-performing models. We also carry out ablation experiments to determine which of the proposed sub-modules are most important.
Submitted 4 May, 2022; v1 submitted 10 November, 2021;
originally announced November 2021.
-
A New Rational Approach to the Square Root of 5
Authors:
Shenghui Su,
Jianhua Zheng,
Shuwang Lv
Abstract:
In this paper, the authors construct a new type of sequence, named an extra-super increasing sequence, and give the definitions of the minimal super increasing sequence {a[0], a[1], ..., a[n]} and the minimal extra-super increasing sequence {z[0], z[1], ..., z[n]}. They find that there always exists a suitable n for which (z[n] / z[n-1] - a[n] / a[n-1]) = PHI, where PHI is the golden ratio conjugate, to a finite precision within the range of computer representation. Further, they derive the formula √5 = 2(z[n] / z[n-1] - a[n] / a[n-1]) + 1, where n corresponds to the demanded precision. Experiments demonstrate that approaching √5 through a term ratio difference is smoother and faster than through a Taylor power series, and convince the authors that lim(n to infinity) (z[n] / z[n-1] - a[n] / a[n-1]) = PHI holds.
Submitted 7 September, 2021; v1 submitted 30 August, 2021;
originally announced August 2021.
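The formula rests on the identity √5 = 2·PHI + 1, because the golden ratio conjugate is PHI = (√5 - 1)/2 ≈ 0.618. The sketch below checks this numerically under two stated assumptions.

- The minimal super increasing sequence is taken as a[0] = 1, a[i] = 1 + (a[0] + ... + a[i-1]), i.e. 1, 2, 4, 8, .... Its term ratio is exactly 2.
- The abstract does not define the extra-super increasing sequence. `stand_in_z` therefore uses a hypothetical recurrence z[i] = 3z[i-1] - z[i-2] (1, 2, 5, 13, 34, ...), whose term ratio tends to PHI + 2 ≈ 2.618.

The code illustrates the term-ratio-difference idea only. It is not the authors' construction.

```python
import math
from fractions import Fraction

def minimal_super_increasing(n):
    """a[0] = 1 and a[i] = 1 + sum(a[0..i-1]): 1, 2, 4, 8, ... (n + 1 terms)."""
    seq = [1]
    for _ in range(n):
        seq.append(sum(seq) + 1)
    return seq

def stand_in_z(n):
    """Hypothetical stand-in for z (NOT the paper's definition):
    z[0] = 1, z[1] = 2, z[i] = 3*z[i-1] - z[i-2], giving 1, 2, 5, 13, 34, ... (n + 1 terms)."""
    seq = [1, 2]
    while len(seq) < n + 1:
        seq.append(3 * seq[-1] - seq[-2])
    return seq[: n + 1]

def sqrt5_estimate(n):
    """2 * (z[n]/z[n-1] - a[n]/a[n-1]) + 1, computed exactly with fractions (n >= 1)."""
    a, z = minimal_super_increasing(n), stand_in_z(n)
    diff = Fraction(z[n], z[n - 1]) - Fraction(a[n], a[n - 1])  # approximates PHI
    return 2 * diff + 1

for n in (2, 5, 10, 20):
    est = float(sqrt5_estimate(n))
    print(n, est, abs(est - math.sqrt(5)))
```

The ratios are kept as exact fractions until the final float conversion, so the printed error reflects the sequences themselves rather than rounding.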
-
A New Lever Function with Adequate Indeterminacy
Authors:
Shenghui Su,
** Luo,
Shuwang Lv,
Maozhi Xu
Abstract:
The key transform of the REESSE1+ asymmetrical cryptosystem is Ci = (Ai * W ^ l(i)) ^ d (% M) with l(i) in Omega = {5, 7, ..., 2n + 3} for i = 1, ..., n, where l(i) is called a lever function. In this paper, the authors give a simplified key transform Ci = Ai * W ^ l(i) (% M) with a new lever function l(i) from {1, ..., n} to Omega = {+/-5, +/-6, ..., +/-(n + 4)}, where "+/-" means the selection of the "+" or "-" sign. They discuss the necessity of the new l(i), namely that a simplified private key is insecure if the new l(i) is a constant function rather than a one-to-one function. Further, they expound the sufficiency of the new l(i) from four aspects: (1) the indeterminacy of the new l(i), (2) insufficient conditions for neutralizing the powers of W and W ^-1 even if Omega = {5, 6, ..., n + 4}, (3) verification by examples, and (4) the running times of the continued fraction attack and the W-parameter intersection attack, which are the two most efficient probabilistic polytime attacks known so far. Last, the authors detail the relation between a lever function and a random oracle.
Submitted 25 April, 2023; v1 submitted 30 August, 2021;
originally announced August 2021.
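The simplified transform Ci = Ai * W ^ l(i) (% M) can be exercised with Python's built-in three-argument `pow`. From Python 3.8, `pow` also accepts negative exponents when W is invertible modulo M, and the sketch uses this to handle a negative lever value l(i) in Omega = {±5, ..., ±(n + 4)}. The parameters are hypothetical toy values, far too small for any security. The sketch shows only the arithmetic of the transform and its inversion by someone who knows W and l(i). It does not cover REESSE1+ key generation.

```python
def key_transform(a, w, m, levers):
    """Simplified transform C_i = A_i * W^l(i) mod M (a negative l(i) uses W's inverse)."""
    return [(a_i * pow(w, l_i, m)) % m for a_i, l_i in zip(a, levers)]

def undo_transform(c, w, m, levers):
    """Recover A_i = C_i * W^(-l(i)) mod M; requires gcd(W, M) == 1 and Python >= 3.8."""
    return [(c_i * pow(w, -l_i, m)) % m for c_i, l_i in zip(c, levers)]

# Hypothetical toy parameters, far too small to offer any security.
M = 1_000_003   # modulus, chosen coprime to W so that W^-1 mod M exists
W = 12_345
A = [3, 7, 11, 19]
n = len(A)
# l(i) takes values in Omega = {±5, ±6, ..., ±(n + 4)}: distinct magnitudes, arbitrary signs.
levers = [5, -6, 7, -8]
C = key_transform(A, W, M, levers)
print(C)
print(undo_transform(C, W, M, levers) == A)  # True
```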
-
PAIR: Leveraging Passage-Centric Similarity Relation for Improving Dense Passage Retrieval
Authors:
Ruiyang Ren,
Shangwen Lv,
Yingqi Qu,
**g Liu,
Wayne Xin Zhao,
QiaoQiao She,
Hua Wu,
Haifeng Wang,
Ji-Rong Wen
Abstract:
Recently, dense passage retrieval has become a mainstream approach to finding relevant information in various natural language processing tasks. A number of studies have been devoted to improving the widely adopted dual-encoder architecture. However, most previous studies consider only the query-centric similarity relation when learning the dual-encoder retriever. In order to capture more comprehensive similarity relations, we propose a novel approach that leverages both query-centric and PAssage-centric sImilarity Relations (called PAIR) for dense passage retrieval. To implement our approach, we make three major technical contributions: introducing formal formulations of the two kinds of similarity relations, generating high-quality pseudo-labeled data via knowledge distillation, and designing an effective two-stage training procedure that incorporates a passage-centric similarity relation constraint. Extensive experiments show that our approach significantly outperforms previous state-of-the-art models on both the MSMARCO and Natural Questions datasets.
Submitted 23 April, 2023; v1 submitted 12 August, 2021;
originally announced August 2021.
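Below is one plausible way to write the two relations as training losses. It assumes inner-product similarity s(·,·) and softmax-style contrastive terms.

- The query-centric term asks that s(q, p+) > s(q, p-).
- The passage-centric term asks that s(q, p+) > s(p+, p-), i.e. the positive passage should be closer to the query than to a negative passage.

The mixing weight `alpha` and all function names are illustrative assumptions, not the paper's exact formulation. The paper's method also involves knowledge-distilled pseudo labels and two-stage training, which the sketch omits.

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def softmax_nll(pos_score, neg_scores):
    """-log( exp(pos) / (exp(pos) + sum(exp(neg))) )."""
    denom = math.exp(pos_score) + sum(math.exp(s) for s in neg_scores)
    return -(pos_score - math.log(denom))

def query_centric_loss(q, p_pos, p_negs):
    """Encourage s(q, p+) > s(q, p-) for every negative passage p-."""
    return softmax_nll(dot(q, p_pos), [dot(q, p) for p in p_negs])

def passage_centric_loss(q, p_pos, p_negs):
    """Encourage s(q, p+) > s(p+, p-): p+ should be closer to q than to any p-."""
    return softmax_nll(dot(q, p_pos), [dot(p_pos, p) for p in p_negs])

def combined_loss(q, p_pos, p_negs, alpha=0.1):
    """Hypothetical weighted mix of the two relations; alpha is illustrative."""
    return ((1 - alpha) * query_centric_loss(q, p_pos, p_negs)
            + alpha * passage_centric_loss(q, p_pos, p_negs))

q = [1.0, 0.0]
negatives = [[0.2, 0.9]]
print(combined_loss(q, [0.9, 0.1], negatives))  # well-separated positive: lower loss
print(combined_loss(q, [0.3, 0.1], negatives))  # weaker positive: higher loss
```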
-
End-to-End User Behavior Retrieval in Click-Through Rate Prediction Model
Authors:
Qiwei Chen,
Changhua Pei,
Shanshan Lv,
Chao Li,
Junfeng Ge,
Wenwu Ou
Abstract:
Click-Through Rate (CTR) prediction is one of the core tasks in recommender systems (RS). It predicts a personalized click probability for each user-item pair. Recently, researchers have found that the performance of CTR models can be improved greatly by taking user behavior sequences into consideration, especially long-term user behavior sequences. A report on an e-commerce website shows that 23\% of users have more than 1000 clicks during the past 5 months. Although numerous works focus on modeling sequential user behaviors, few can handle long-term user behavior sequences due to the strict inference-time constraints of real-world systems. Two-stage methods have been proposed to push the limit for better performance. In the first stage, an auxiliary task is designed to retrieve the top-$k$ similar items from the long-term user behavior sequence. In the second stage, the classical attention mechanism is applied between the candidate item and the $k$ items selected in the first stage. However, an information gap arises between the retrieval stage and the main CTR task. This goal divergence can greatly diminish the performance gain of long-term user sequences. In this paper, inspired by Reformer, we propose a locality-sensitive hashing (LSH) method called ETA (End-to-end Target Attention), which can greatly reduce the training and inference cost and make end-to-end training with long-term user behavior sequences possible. Both offline and online experiments confirm the effectiveness of our model. We deploy ETA into a large-scale real-world e-commerce system and achieve an extra 3.1\% improvement in GMV (Gross Merchandise Value) compared to a two-stage long user sequence CTR model.
Submitted 10 August, 2021;
originally announced August 2021.
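ETA replaces a separately trained retrieval stage with locality-sensitive hashing. A common LSH family for cosine similarity is SimHash (random-hyperplane hashing). The sketch below assumes that choice. It ranks a user's behavior items by the Hamming distance between each item's signature and the candidate item's signature, then keeps the top k. The embedding vectors, the 16-bit signature length and the helper names are hypothetical. The sketch omits how ETA integrates hashing into end-to-end training and the subsequent target attention.

```python
import random

def simhash(vec, planes):
    """Bit j is 1 when vec lies on the non-negative side of random hyperplane j."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0 for plane in planes]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def top_k_by_hash(target, behaviors, k, n_bits=16, seed=0):
    """Return indices of the k behavior items whose signatures are closest to the target's."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in target] for _ in range(n_bits)]
    t_sig = simhash(target, planes)
    scored = sorted((hamming(t_sig, simhash(b, planes)), idx) for idx, b in enumerate(behaviors))
    return [idx for _, idx in scored[:k]]

# Hypothetical 3-d item embeddings: item 0 is the target itself (a repeat click),
# item 1 points the opposite way, item 2 is an unrelated item.
target = [1.0, 0.5, -0.2]
behaviors = [[1.0, 0.5, -0.2], [-1.0, -0.5, 0.2], [0.1, -0.8, 0.9]]
print(top_k_by_hash(target, behaviors, k=2))
```

Because only bit signatures are compared, scoring a long behavior sequence costs one Hamming distance per item rather than a full embedding dot product. This is the kind of saving that makes attention over long sequences affordable.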
-
S2Looking: A Satellite Side-Looking Dataset for Building Change Detection
Authors:
Li Shen,
Yao Lu,
Hao Chen,
Hao Wei,
Donghai Xie,
Jiabao Yue,
Rui Chen,
Shouye Lv,
Bitao Jiang
Abstract:
Building-change detection underpins many important applications, especially in the military and crisis-management domains. Recent methods used for change detection have shifted towards deep learning, which depends on the quality of its training data. The assembly of large-scale annotated satellite imagery datasets is therefore essential for global building-change surveillance. Existing datasets almost exclusively offer near-nadir viewing angles. This limits the range of changes that can be detected. By offering larger observation ranges, the scroll imaging mode of optical satellites presents an opportunity to overcome this restriction. This paper therefore introduces S2Looking, a building-change-detection dataset that contains large-scale side-looking satellite images captured at various off-nadir angles. The dataset consists of 5000 bitemporal image pairs of rural areas and more than 65,920 annotated instances of changes throughout the world. The dataset can be used to train deep-learning-based change-detection algorithms. It expands upon existing datasets by providing (1) larger viewing angles; (2) large illumination variances; and (3) the added complexity of rural images. To facilitate the use of the dataset, a benchmark task has been established, and preliminary tests suggest that deep-learning algorithms find the dataset significantly more challenging than the closest-competing near-nadir dataset, LEVIR-CD+. S2Looking may therefore promote important advances in existing building-change-detection algorithms. The dataset is available at https://github.com/S2Looking/.
Submitted 11 January, 2022; v1 submitted 19 July, 2021;
originally announced July 2021.
-
DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement
Authors:
Shubo Lv,
Yanxin Hu,
Shimin Zhang,
Lei Xie
Abstract:
Deep complex convolution recurrent network (DCCRN), which extends CRN with a complex structure, achieved superior performance in MOS evaluation in the Interspeech 2020 deep noise suppression challenge (DNS2020). This paper further extends DCCRN with the following significant revisions. We first extend the model to sub-band processing, where the bands are split and merged by learnable neural network filters instead of engineered FIR filters, leading to a faster noise suppressor trained in an end-to-end manner. Then the LSTM is further substituted with a complex TF-LSTM to better model temporal dependencies along both time and frequency axes. Moreover, instead of simply concatenating the output of each encoder layer to the input of the corresponding decoder layer, we use convolution blocks to first aggregate essential information from the encoder output before feeding it to the decoder layers. We specifically formulate the decoder with an extra a priori SNR estimation module to maintain good speech quality while removing noise. Finally, a post-processing module is adopted to further suppress unnatural residual noise. The new model, named DCCRN+, has surpassed the original DCCRN as well as several competitive models in terms of PESQ and DNSMOS, and has achieved superior performance in the new Interspeech 2021 DNS challenge.
Submitted 16 June, 2021;
originally announced June 2021.
-
F-T-LSTM based Complex Network for Joint Acoustic Echo Cancellation and Speech Enhancement
Authors:
Shimin Zhang,
Yuxiang Kong,
Shubo Lv,
Yanxin Hu,
Lei Xie
Abstract:
With the increasing demand for audio communication and online conferencing, ensuring the robustness of Acoustic Echo Cancellation (AEC) under complicated acoustic scenarios including noise, reverberation and nonlinear distortion has become a top issue. Although some traditional methods consider nonlinear distortion, they are still inefficient for echo suppression, and their performance degrades when noise is present. In this paper, we present a real-time AEC approach that uses a complex neural network to better model the important phase information, and frequency-time-LSTMs (F-T-LSTM), which scan both the frequency and time axes, for better temporal modeling. Moreover, we use a modified SI-SNR as the cost function to improve the model's echo cancellation and noise suppression (NS) performance. With only 1.4M parameters, the proposed approach outperforms the AEC-challenge baseline by 0.27 in terms of Mean Opinion Score (MOS).
Submitted 16 June, 2021; v1 submitted 14 June, 2021;
originally announced June 2021.
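The abstract mentions a modified SI-SNR cost but does not describe the modification. The sketch below therefore implements only the standard scale-invariant SNR. It projects the zero-mean estimate onto the zero-mean reference, then compares the energy of that projection with the energy of the residual. A training loss would typically minimize the negative of this value. The `eps` guard and the toy signals are assumptions for illustration.

```python
import math

def si_snr(estimate, reference, eps=1e-8):
    """Standard scale-invariant SNR in dB (higher is better)."""
    # Zero-mean both signals so constant offsets do not affect the score.
    mu_e = sum(estimate) / len(estimate)
    mu_r = sum(reference) / len(reference)
    est = [x - mu_e for x in estimate]
    ref = [x - mu_r for x in reference]
    # s_target = (<est, ref> / ||ref||^2) * ref: the part of est explained by ref.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    s_target = [scale * r for r in ref]
    e_noise = [e - s for e, s in zip(est, s_target)]
    energy_target = sum(s * s for s in s_target)
    energy_noise = sum(v * v for v in e_noise)
    return 10 * math.log10((energy_target + eps) / (energy_noise + eps))

reference = [1.0, -1.0, 2.0, 0.0, -2.0]
estimate = [1.1, -1.0, 1.9, 0.05, -2.0]
print(si_snr(estimate, reference))
print(si_snr([3 * x for x in estimate], reference))  # rescaling leaves SI-SNR unchanged (up to eps)
loss = -si_snr(estimate, reference)  # a training cost would minimize the negative
```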
-
Combining Supervised and Un-supervised Learning for Automatic Citrus Segmentation
Authors:
Heqing Huang,
Tongbin Huang,
Zhen Li,
Zhiwei Wei,
Shilei Lv
Abstract:
Citrus segmentation is a key step of automatic citrus picking. While most current image segmentation approaches achieve good results by pixel-wise segmentation, these supervised learning-based methods require a large amount of annotated data and do not consider the continuous temporal changes of citrus position in real-world applications. In this paper, we first train a simple CNN with a small number of labelled citrus images in a supervised manner, which can roughly predict the citrus location in each frame. Then, we extend a state-of-the-art unsupervised learning approach to pre-learn the citrus's potential movements between frames from unlabelled citrus videos. To take advantage of both networks, we employ a multimodal transformer to combine the supervised-learned static information and the unsupervised-learned movement information. The experimental results show that combining both networks raises the prediction accuracy to 88.3$\%$ IOU and 93.6$\%$ precision, outperforming the original supervised baseline by 1.2$\%$ and 2.4$\%$, respectively. Compared with most existing citrus segmentation methods, our method uses a small amount of supervised data and a large amount of unsupervised data, while learning both the pixel-level location information and the temporal information of citrus changes to enhance segmentation.
Submitted 4 May, 2021;
originally announced May 2021.
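The two reported metrics can be computed from binary masks as IoU = TP / (TP + FP + FN) and precision = TP / (TP + FP). The sketch below assumes masks flattened to 0/1 pixel lists. As an arbitrary convention, it scores two empty masks as 1.0. The toy masks are hypothetical.

```python
def iou_and_precision(pred, truth):
    """IoU and precision for binary masks given as flat sequences of 0/1 pixels."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    # Empty-mask convention (an assumption): nothing predicted and nothing present scores 1.0.
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    return iou, precision

pred = [1, 1, 1, 0, 0, 1]
truth = [1, 1, 0, 0, 1, 1]
print(iou_and_precision(pred, truth))  # (0.6, 0.75)
```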
-
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
Authors:
Yihui Fu,
Luyao Cheng,
Shubo Lv,
Yukai Jv,
Yuxiang Kong,
Zhuo Chen,
Yanxin Hu,
Lei Xie,
Jian Wu,
Hui Bu,
Xin Xu,
Jun Du,
**gdong Chen
Abstract:
In this paper, we present AISHELL-4, a sizable real-recorded Mandarin speech dataset collected by an 8-channel circular microphone array for speech processing in conference scenarios. The dataset consists of 211 recorded meeting sessions, each containing 4 to 8 speakers, with a total length of 120 hours. This dataset aims to bridge advanced research on multi-speaker processing and practical application scenarios in three aspects. With real recorded meetings, AISHELL-4 provides realistic acoustics and rich natural speech characteristics of conversation, such as short pauses, speech overlap, quick speaker turns, noise, etc. Meanwhile, accurate transcriptions and speaker voice activity are provided for each meeting in AISHELL-4. This allows researchers to explore different aspects of meeting processing, ranging from individual tasks such as speech front-end processing, speech recognition and speaker diarization, to multi-modality modeling and joint optimization of relevant tasks. Given that most open-source datasets for multi-speaker tasks are in English, AISHELL-4 is the only Mandarin dataset for conversational speech, providing additional value for data diversity in the speech community. We also release a PyTorch-based training and evaluation framework as a baseline system to promote reproducible research in this field.
Submitted 10 August, 2021; v1 submitted 8 April, 2021;
originally announced April 2021.
-
Communication-efficient Byzantine-robust distributed learning with statistical guarantee
Authors:
Xingcai Zhou,
Le Chang,
Pengfei Xu,
Shaogao Lv
Abstract:
Communication efficiency and robustness are two major issues in modern distributed learning frameworks. This is due to practical situations where some computing nodes may have limited communication power or may behave adversarially. To address the two issues simultaneously, this paper develops two communication-efficient and robust distributed learning algorithms for convex problems. Our motivation is based on the surrogate likelihood framework and the median and trimmed mean operations. In particular, the proposed algorithms are provably robust against Byzantine failures, and also achieve optimal statistical rates for strongly convex losses and convex (non-smooth) penalties. For typical statistical models such as generalized linear models, our results show that statistical errors dominate optimization errors in finite iterations. Simulated and real data experiments are conducted to demonstrate the numerical performance of our algorithms.
Submitted 27 February, 2021;
originally announced March 2021.
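The median and trimmed mean operations named above are standard coordinate-wise robust aggregators. The sketch below applies these two aggregation rules to worker gradients, one of which comes from a hypothetical Byzantine worker. It does not reproduce the surrogate likelihood framework or the paper's communication scheme. The trimming fraction `beta` is an illustrative assumption.

```python
import statistics

def coordinate_median(vectors):
    """Coordinate-wise median of equally sized vectors."""
    return [statistics.median(coord) for coord in zip(*vectors)]

def trimmed_mean(vectors, beta):
    """Coordinate-wise mean after dropping the beta fraction of largest and smallest values."""
    m = len(vectors)
    b = int(beta * m)
    if 2 * b >= m:
        raise ValueError("beta too large: nothing left after trimming")
    out = []
    for coord in zip(*vectors):
        kept = sorted(coord)[b : m - b]
        out.append(sum(kept) / len(kept))
    return out

# Five worker gradients; the last worker is Byzantine and sends extreme values.
grads = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [1.1, 2.0], [100.0, -100.0]]
print(coordinate_median(grads))  # [1.1, 2.0]
print(trimmed_mean(grads, beta=0.2))
```

A plain mean of the same gradients would be pulled to roughly 20.8 in the first coordinate by the single faulty worker. Both robust rules stay close to the honest values.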
-
Generalization bounds for graph convolutional neural networks via Rademacher complexity
Authors:
Shaogao Lv
Abstract:
This paper studies the sample complexity of graph convolutional networks (GCNs) by providing tight upper bounds on the Rademacher complexity of GCN models with a single hidden layer. Under regularity conditions, these complexity bounds depend explicitly on the largest eigenvalue of the graph convolution filter and the degree distribution of the graph. We also provide a lower bound on the Rademacher complexity of GCNs to show the optimality of our upper bounds. Taking two commonly used examples as representatives, we discuss the implications of our results for designing graph convolution filters and graph distributions.
Submitted 19 February, 2021;
originally announced February 2021.
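The bounds above depend on the largest eigenvalue of the graph convolution filter. For the widely used renormalized filter Â = D^{-1/2}(A + I)D^{-1/2}, that eigenvalue is exactly 1. This holds because Â is similar to the row-stochastic matrix D^{-1}(A + I). The power-iteration sketch below checks this on a hypothetical 4-node star graph. The helper names are illustrative, and the paper may analyze other filters.

```python
import math

def gcn_filter(adj):
    """Renormalized GCN filter D^{-1/2} (A + I) D^{-1/2} for a dense 0/1 adjacency matrix."""
    n = len(adj)
    a = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    d = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]

def largest_eigenvalue(mat, iters=500):
    """Power iteration with a Rayleigh-quotient estimate (symmetric matrix assumed)."""
    n = len(mat)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam

# Hypothetical star graph: node 0 is connected to nodes 1, 2 and 3.
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(largest_eigenvalue(gcn_filter(star)))  # close to 1.0
```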
-
DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
Authors:
Yanxin Hu,
Yun Liu,
Shubo Lv,
Mengtao Xing,
Shimin Zhang,
Yihui Fu,
Jian Wu,
Bihong Zhang,
Lei Xie
Abstract:
Speech enhancement has benefited from the success of deep learning in terms of intelligibility and perceptual quality. Conventional time-frequency (TF) domain methods focus on predicting TF-masks or the speech spectrum via a naive convolution neural network (CNN) or recurrent neural network (RNN). Some recent studies use the complex-valued spectrogram as a training target but train a real-valued network, predicting the magnitude and phase components or the real and imaginary parts, respectively. In particular, the convolution recurrent network (CRN) integrates a convolutional encoder-decoder (CED) structure and long short-term memory (LSTM), which has been proven to be helpful for complex targets. In order to train the complex target more effectively, in this paper, we design a new network structure simulating complex-valued operations, called Deep Complex Convolution Recurrent Network (DCCRN), where both CNN and RNN structures can handle complex-valued operations. The proposed DCCRN models are very competitive with previous networks on both objective and subjective metrics. With only 3.7M parameters, our DCCRN models submitted to the Interspeech 2020 Deep Noise Suppression (DNS) challenge ranked first for the real-time track and second for the non-real-time track in terms of Mean Opinion Score (MOS).
Submitted 22 September, 2020; v1 submitted 1 August, 2020;
originally announced August 2020.
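DCCRN simulates complex-valued operations with real-valued ones. Apply a complex filter W = W_r + jW_i to a complex input X = X_r + jX_i. The output is (X_r * W_r - X_i * W_i) + j(X_r * W_i + X_i * W_r), which requires four real convolutions. The pure-Python sketch below builds a 1-D complex convolution this way and cross-checks it against Python's native complex arithmetic. The signal and kernel values are arbitrary. Real DCCRN layers are 2-D convolutions over the time-frequency plane.

```python
def conv1d(x, w):
    """'Valid' 1-D cross-correlation, as computed by deep-learning conv layers."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def complex_conv1d(x_r, x_i, w_r, w_i):
    """Complex convolution from four real convolutions:
    (X_r + jX_i) * (W_r + jW_i) = (X_r*W_r - X_i*W_i) + j(X_r*W_i + X_i*W_r)."""
    real = [a - b for a, b in zip(conv1d(x_r, w_r), conv1d(x_i, w_i))]
    imag = [a + b for a, b in zip(conv1d(x_r, w_i), conv1d(x_i, w_r))]
    return real, imag

x = [1 + 2j, 0.5 - 1j, -1 + 0.5j, 2 + 0j]
w = [0.5 + 0.5j, -1 + 1j]
real, imag = complex_conv1d([v.real for v in x], [v.imag for v in x],
                            [v.real for v in w], [v.imag for v in w])
# Cross-check against native complex arithmetic.
reference = [sum(x[i + j] * w[j] for j in range(len(w))) for i in range(len(x) - len(w) + 1)]
print(all(abs(complex(r, m) - c) < 1e-12 for r, m, c in zip(real, imag, reference)))  # True
```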
-
Establishing Secrecy Region for Directional Modulation Scheme with Random Frequency Diverse Array
Authors:
Sheng** Lv,
**song Hu,
Youjia Chen,
Zhimeng Xu,
Zhizhang Chen
Abstract:
Random frequency diverse array (RFDA) based directional modulation (DM) was proposed as a promising technology in secure communications to achieve a precise transmission of confidential messages, and artificial noise (AN) was considered as an important helper in RFDA-DM. Compared with previous works that only focus on the spot of the desired receiver, in this work, we investigate a secrecy region around the desired receiver, that is, a specific range and angle resolution around the desired receiver. Firstly, the minimum number of antennas and the bandwidth needed to achieve a secrecy region are derived. Moreover, based on the lower bound of the secrecy capacity in RFDA-DM-AN scheme, we investigate the performance impact of AN on the secrecy capacity. From this work, we conclude that: 1) AN is not always beneficial to the secure transmission. Specifically, when the number of antennas is sufficiently large and the transmit power is smaller than a specified value, AN will reduce secrecy capacity due to the consumption of limited transmit power. 2) Increasing bandwidth will enlarge the set for randomly allocating frequencies and thus lead to a higher secrecy capacity. 3) The minimum number of antennas increases as the predefined secrecy transmission rate increases.
Submitted 9 July, 2020;
originally announced July 2020.
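Conclusion 1) can be illustrated with the textbook secrecy capacity of a Gaussian wiretap channel, Cs = max(0, log2(1 + SNR_B) - log2(1 + SNR_E)). The toy model below is an assumption and does not reproduce the RFDA-DM-AN analysis. A fraction `phi` of the transmit power carries the message, and the rest is artificial noise. The AN is assumed to be nulled at the desired receiver and to reach the eavesdropper with the same channel gain as the message. In this model, spending power on AN lowers Cs when the eavesdropper's leakage is already small, and raises it when the eavesdropper's channel is strong.

```python
import math

def secrecy_capacity(snr_bob, snr_eve):
    """Gaussian wiretap secrecy capacity in bit/s/Hz, floored at zero."""
    return max(0.0, math.log2(1 + snr_bob) - math.log2(1 + snr_eve))

def toy_capacity_with_an(power, phi, gain_bob, gain_eve, noise=1.0):
    """Hypothetical power split: phi*power for the message, (1 - phi)*power for AN.
    AN is assumed to be nulled at Bob and to reach Eve with gain_eve."""
    snr_bob = phi * power * gain_bob / noise
    sinr_eve = phi * power * gain_eve / ((1 - phi) * power * gain_eve + noise)
    return secrecy_capacity(snr_bob, sinr_eve)

for gain_eve in (0.01, 10.0):
    no_an = toy_capacity_with_an(1.0, 1.0, 10.0, gain_eve)
    half_an = toy_capacity_with_an(1.0, 0.5, 10.0, gain_eve)
    print(f"gain_eve={gain_eve}: no AN {no_an:.3f}, half power to AN {half_an:.3f}")
```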
-
Deep learning to estimate the physical proportion of infected region of lung for COVID-19 pneumonia with CT image set
Authors:
Wei Wu,
Yu Shi,
Xukun Li,
Yukun Zhou,
Peng Du,
Shuangzhi Lv,
Tingbo Liang,
Jifang Sheng
Abstract:
Utilizing computed tomography (CT) images to quickly estimate the severity of COVID-19 cases is one of the most straightforward and efficacious methods. Two tasks were studied in this paper. One was to segment the mask of the intact lung in the presence of pneumonia. The other was to generate the masks of regions infected by COVID-19. The masks of these two parts of the images were then converted to corresponding volumes to calculate the physical proportion of the infected region of the lung. A total of 129 CT image sets were collected and studied. The intrinsic Hounsfield values of the CT images were first used to generate initial rough versions of the labeled masks for both the intact lung and the infected regions. Then, the samples were carefully adjusted and improved by two professional radiologists to generate the final training set and test benchmark. Two deep learning models were evaluated: UNet and 2.5D UNet. For the segmentation of infected regions, a deep learning based classifier was then applied to remove unrelated blur-edged regions that were wrongly segmented out, such as air tubes and blood vessel tissue. For the segmented masks of the intact lung and the infected regions, the best method achieved mean Dice similarity coefficients of 0.972 and 0.757, respectively, on our test benchmark. For the overall proportion of the infected region of the lung, the final result showed 0.961 (Pearson's correlation coefficient) and 11.7% (mean absolute percent error). The instant proportion of infected regions of the lung could be used as visual evidence to assist clinical physicians in determining the severity of a case. Furthermore, a quantified report of infected regions can help predict the prognosis of COVID-19 cases that were scanned periodically within the treatment cycle.
Submitted 8 June, 2020;
originally announced June 2020.
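The reported quantities follow standard definitions. For a predicted mask P and a reference mask T, Dice = 2|P ∩ T| / (|P| + |T|). The infected proportion is the infected lung volume divided by the total lung volume. The sketch below assumes flattened 0/1 voxel masks with uniform voxel spacing, in which case the voxel volume cancels in the ratio. The toy masks and values are hypothetical.

```python
def dice(pred, truth):
    """Dice similarity coefficient of two flattened 0/1 masks (1.0 if both are empty)."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    return 2 * inter / total if total else 1.0

def infected_proportion(lung_mask, infection_mask, voxel_volume=1.0):
    """Infected volume inside the lung divided by total lung volume.
    With uniform voxel spacing, voxel_volume cancels out of the ratio."""
    lung_volume = sum(lung_mask) * voxel_volume
    infected_volume = sum(1 for in_lung, infected in zip(lung_mask, infection_mask)
                          if in_lung and infected) * voxel_volume
    return infected_volume / lung_volume if lung_volume else 0.0

def mean_absolute_percent_error(predicted, actual):
    """MAPE in percent; assumes no actual value is zero."""
    return 100 * sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)

lung = [1, 1, 1, 1, 0, 0]
infection = [0, 1, 1, 0, 0, 1]  # the last voxel lies outside the lung and is ignored
print(dice([1, 1, 0, 0], [1, 0, 0, 1]))           # 0.5
print(infected_proportion(lung, infection, 0.7))  # 0.5
print(mean_absolute_percent_error([0.5, 0.2], [0.4, 0.25]))
```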
-
NTIRE 2020 Challenge on Real Image Denoising: Dataset, Methods and Results
Authors:
Abdelrahman Abdelhamed,
Mahmoud Afifi,
Radu Timofte,
Michael S. Brown,
Yue Cao,
Zhilu Zhang,
Wangmeng Zuo,
Xiaoling Zhang,
Jiye Liu,
Wendong Chen,
Changyuan Wen,
Meng Liu,
Shuailin Lv,
Yunchao Zhang,
Zhihong Pan,
Baopu Li,
Teng Xi,
Yanwen Fan,
Xiyu Yu,
Gang Zhang,
**gtuo Liu,
Junyu Han,
Errui Ding,
Songhyun Yu,
Bumjun Park
, et al. (65 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2020 challenge on real image denoising, with a focus on the newly introduced dataset, the proposed methods and their results. The challenge is a new version of the previous NTIRE 2019 challenge on real image denoising, which was based on the SIDD benchmark. This challenge is based on newly collected validation and testing image datasets, hence named SIDD+. This challenge has two tracks for quantitatively evaluating image denoising performance in (1) the Bayer-pattern rawRGB and (2) the standard RGB (sRGB) color spaces. Each track had ~250 registered participants. A total of 22 teams, proposing 24 methods, competed in the final phase of the challenge. The methods proposed by the participating teams represent the current state-of-the-art performance in image denoising targeting real noisy images. The newly collected SIDD+ datasets are publicly available at: https://bit.ly/siddplus_data.
Submitted 8 May, 2020;
originally announced May 2020.
-
Pre-training Text Representations as Meta Learning
Authors:
Shangwen Lv,
Yuechen Wang,
Daya Guo,
Duyu Tang,
Nan Duan,
Fuqing Zhu,
Ming Gong,
Linjun Shou,
Ryan Ma,
Daxin Jiang,
Guihong Cao,
Ming Zhou,
Songlin Hu
Abstract:
Pre-training text representations has recently been shown to significantly improve the state-of-the-art in many natural language processing tasks. The central goal of pre-training is to learn text representations that are useful for subsequent tasks. However, existing approaches are optimized by minimizing a proxy objective, such as the negative log likelihood of language modeling. In this work, we introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks. We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps. The standard multi-task learning objective adopted in BERT is a special case of our learning algorithm where the depth of meta-train is zero. We study the problem in two settings, unsupervised pre-training and supervised pre-training with different pre-training objectives, to verify the generality of our approach. Experimental results show that our algorithm brings improvements and learns better initializations for a variety of downstream tasks.
Submitted 12 April, 2020;
originally announced April 2020.
-
Deep Learning System to Screen Coronavirus Disease 2019 Pneumonia
Authors:
Xiaowei Xu,
Xiangao Jiang,
Chunlian Ma,
Peng Du,
Xukun Li,
Shuangzhi Lv,
Liang Yu,
Yanfei Chen,
Junwei Su,
Guanjing Lang,
Yongtao Li,
Hong Zhao,
Kaijin Xu,
Lingxiang Ruan,
Wei Wu
Abstract:
We found that real-time reverse transcription-polymerase chain reaction (RT-PCR) detection of viral RNA from sputum or nasopharyngeal swabs has a relatively low positive rate in the early stage of COVID-19 (named by the World Health Organization). The computed tomography (CT) imaging manifestations of COVID-19 have their own characteristics, which differ from those of other types of viral pneumonia, such as Influenza-A viral pneumonia. Therefore, clinicians need additional early diagnostic criteria for this new type of pneumonia as soon as possible. This study aimed to establish an early screening model to distinguish COVID-19 pneumonia from Influenza-A viral pneumonia and healthy cases using pulmonary CT images and deep learning techniques. Candidate infection regions were first segmented from the pulmonary CT image set using a 3-dimensional deep learning model. These segmented images were then categorized into COVID-19, Influenza-A viral pneumonia, and irrelevant-to-infection groups, together with corresponding confidence scores, using a location-attention classification model. Finally, the infection type and total confidence score of each CT case were calculated with a Noisy-OR Bayesian function. Experiments on the benchmark dataset showed an overall accuracy of 86.7% when CT cases are evaluated as a whole. The deep learning models established in this study were effective for the early screening of COVID-19 patients and were shown to be a promising supplementary diagnostic method for frontline clinicians.
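The final step combines per-region confidences into one score per CT case with a Noisy-OR function. A minimal sketch, assuming each region's confidence is treated as an independent probability that the case belongs to a given class; the dictionary input format and the function names are hypothetical, and the paper's exact formulation may differ:

```python
from math import prod


def noisy_or(probs):
    """P(at least one of several independent causes fires) = 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in probs)


def case_score(region_scores):
    """Combine per-region confidences into one (class, score) for a whole CT case.

    region_scores: hypothetical mapping {class name: [per-region confidences]}.
    """
    combined = {cls: noisy_or(ps) for cls, ps in region_scores.items()}
    best = max(combined, key=combined.get)  # class with the highest combined score
    return best, combined[best]
```

For example, two COVID-19 regions with confidences 0.6 and 0.5 combine to 1 - 0.4 * 0.5 = 0.8, which outranks a single Influenza-A region at 0.7. Noisy-OR lets several moderately confident regions add up to a confident case-level decision.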
Submitted 21 February, 2020;
originally announced February 2020.
-
Financial Market Directional Forecasting With Stacked Denoising Autoencoder
Authors:
Shaogao Lv,
Yongchao Hou,
Hongwei Zhou
Abstract:
Forecasting stock market direction is an appealing but challenging problem in finance. Although many popular shallow computational methods (such as backpropagation networks and support vector machines) have been extensively proposed, most algorithms have not yet attained a desirable level of applicability. In this paper, we present a deep learning model with a strong ability to generate high-level feature representations for accurate financial prediction. Specifically, a stacked denoising autoencoder (SDAE) is applied to predict the daily CSI 300 index of the Shanghai and Shenzhen Stock Exchanges in China. We use six evaluation criteria to compare its performance with that of a backpropagation network and a support vector machine. The experiments show that the SDAE-based financial model has a significant advantage in predicting the CSI 300 index.
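The abstract does not name its six evaluation criteria. One criterion commonly used for directional forecasts is the hit ratio, the share of days on which the predicted up/down direction matches the realized one. A minimal sketch, assuming directions are encoded as +1/-1 and a flat day counts as down (both assumptions); the function names are hypothetical:

```python
def directions(prices):
    """+1 if the next close is higher, else -1 (a flat day counts as down here)."""
    return [1 if nxt > cur else -1 for cur, nxt in zip(prices, prices[1:])]


def hit_ratio(actual, predicted):
    """Share of days on which the predicted direction matches the realized one."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)
```

A model that always predicts "up" scores 2/3 on the closes 10, 11, 10.5, 12, which is why directional results are usually compared against such naive baselines.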
Submitted 2 December, 2019;
originally announced December 2019.
-
Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering
Authors:
Shangwen Lv,
Daya Guo,
Jingjing Xu,
Duyu Tang,
Nan Duan,
Ming Gong,
Linjun Shou,
Daxin Jiang,
Guihong Cao,
Songlin Hu
Abstract:
Commonsense question answering aims to answer questions that require background knowledge not explicitly expressed in the question. The key challenge is how to obtain evidence from external knowledge and make predictions based on that evidence. Recent works either learn to generate evidence from human-annotated evidence, which is expensive to collect, or extract evidence from either structured or unstructured knowledge bases, which fails to take advantage of both sources. In this work, we propose to automatically extract evidence from heterogeneous knowledge sources and answer questions based on the extracted evidence. Specifically, we extract evidence from both a structured knowledge base (i.e., ConceptNet) and Wikipedia plain text. We construct graphs for both sources to obtain the relational structures of evidence. Based on these graphs, we propose a graph-based approach consisting of a graph-based contextual word representation learning module and a graph-based inference module. The first module utilizes graph structural information to redefine the distance between words for learning better contextual word representations. The second module adopts a graph convolutional network to encode neighbor information into node representations and aggregates evidence with a graph attention mechanism to predict the final answer. Experimental results on the CommonsenseQA dataset show that our graph-based approach over both knowledge sources brings improvements over strong baselines. Our approach achieves state-of-the-art accuracy (75.3%) on the CommonsenseQA leaderboard.
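The second module combines a graph convolutional network with graph attention. The sketch below shows one common GCN propagation rule (the symmetric normalization of Kipf and Welling) followed by a softmax attention pooling over node representations; the abstract does not specify which GCN variant or attention form the authors use, so both choices and all names here are assumptions:

```python
import numpy as np


def gcn_layer(adj: np.ndarray, feats: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalization
    return np.maximum(norm @ feats @ weight, 0.0)


def attention_pool(node_reprs: np.ndarray, query: np.ndarray):
    """Softmax attention over nodes: weights from dot products with a query vector."""
    scores = node_reprs @ query
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ node_reprs, weights
```

The pooled vector is a weighted sum of evidence-node representations, so nodes that match the query (e.g. an encoding of a question and candidate answer) contribute most to the final prediction.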
Submitted 8 June, 2020; v1 submitted 9 September, 2019;
originally announced September 2019.
-
Learning review representations from user and product level information for spam detection
Authors:
Chunyuan Yuan,
Wei Zhou,
Qianwen Ma,
Shangwen Lv,
Jizhong Han,
Songlin Hu
Abstract:
Opinion spam has become a widespread problem in social media, where hired spammers write deceptive reviews to promote or demote products, misleading consumers for profit or fame. Existing works mainly focus on manually designing discrete textual or behavioral features, which cannot capture the complex semantics of reviews. Although recent works apply deep learning methods to learn review-level semantic features, their models ignore the impact of user-level and product-level information on learning review semantics, as well as the inherent user-review-product relationship. In this paper, we propose a Hierarchical Fusion Attention Network (HFAN) to automatically learn the semantics of reviews from the user and product levels. Specifically, we design a multi-attention unit to extract user-related (product-related) review information. Then, we use orthogonal decomposition and fusion attention to learn user, review, and product representations from the review information. Finally, we treat the review as a relation between the user and product entities and apply TransH to jointly encode this relationship into the review representation. Experimental results show an absolute precision improvement of more than 10% over state-of-the-art methods on four real-world datasets, demonstrating the effectiveness and versatility of the model.
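TransH models a relation as a translation on a relation-specific hyperplane: head and tail embeddings are projected onto the hyperplane with unit normal w_r, and the score ||h_perp + d_r - t_perp||^2 is small for plausible triples. A minimal sketch of that scoring function, assuming (as the abstract suggests) that the user is the head, the product is the tail, and the review supplies the relation; the function name is hypothetical and training is omitted:

```python
import numpy as np


def transh_score(h, t, w_r, d_r):
    """TransH dissimilarity ||h_perp + d_r - t_perp||^2 (lower = more plausible)."""
    w = w_r / np.linalg.norm(w_r)  # TransH keeps the hyperplane normal at unit length
    h_perp = h - np.dot(w, h) * w  # project the head onto the relation hyperplane
    t_perp = t - np.dot(w, t) * w  # project the tail onto the same hyperplane
    return float(np.sum((h_perp + d_r - t_perp) ** 2))
```

Because only the in-plane components are compared, a tail that differs from the head only along the normal direction still scores as a perfect match; this is what lets one entity participate in many relations without collapsing its embedding.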
Submitted 10 September, 2019;
originally announced September 2019.
-
The Prototype of Decentralized Multilateral Co-Governing Post-IP Internet Architecture and Its Testing on Operator Networks
Authors:
Hui Li,
Jiangxing Wu,
Kaixuan Xing,
Peng Yi,
Julong Lan,
Xinsheng Ji,
Qinrang Liu,
Shisheng Chen,
Wei Liang,
Jinwu Wei,
Wei Li,
Fusheng Zhu,
Kaiyan Tian,
Jiang Zhu,
Yiqin Lu,
Ke Xu,
Jiaxing Song,
Yijun Liu,
Junfeng Ma,
Rui Xu,
Jianming Que,
Weihao Yang,
Weihao Miu,
Zefeng Zheng,
Guohua Wei,
et al. (10 additional authors not shown)
Abstract:
The Internet has become the most important infrastructure of modern society, yet the existing IP network is unable to provide high-quality service. The unilateral IP network is also unable to satisfy the co-management and co-governance demands on cyberspace of most nations in the world. Facing this challenge, we propose a novel Decentralized Multilateral Co-Governing Post-IP Internet architecture. To verify its effectiveness, we developed a prototype on operator networks in Mainland China, Hong Kong, and Macao. The experiments and testing results show that this architecture is feasible for the coexistence of Content-Centric Networking and IP networks, and it might become a Chinese solution for the world.
Submitted 17 June, 2019;
originally announced June 2019.
-
Forest Representation Learning Guided by Margin Distribution
Authors:
Shen-Huan Lv,
Liang Yang,
Zhi-Hua Zhou
Abstract:
In this paper, we reformulate the forest representation learning approach as an additive model that boosts the augmented feature instead of the prediction. We substantially improve the upper bound of the generalization gap from $\mathcal{O}(\sqrt{\frac{\ln m}{m}})$ to $\mathcal{O}(\frac{\ln m}{m})$ when $\lambda$, the ratio between the margin standard deviation and the margin mean, is small enough. This tighter upper bound inspires us to optimize the margin distribution ratio $\lambda$. Therefore, we design the margin distribution reweighting approach (mdDF) to achieve a small ratio $\lambda$ by boosting the augmented feature. Experiments and visualizations confirm the effectiveness of the approach in terms of performance and representation learning ability. This study offers a novel understanding of the cascaded deep forest from the margin-theory perspective and further uses the mdDF approach to guide layer-by-layer forest representation learning.
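The ratio λ is simple to compute once margins are defined. A minimal sketch for the binary case, assuming labels in {-1, +1}, real-valued scores, and margin y * f(x); mdDF works with multi-class forests, where the margin is defined differently, and the function name is hypothetical:

```python
import numpy as np


def margin_ratio(y, scores):
    """lambda = std(margins) / mean(margins) with margins y_i * f(x_i)."""
    margins = np.asarray(y, dtype=float) * np.asarray(scores, dtype=float)
    mean = margins.mean()
    if mean <= 0:
        raise ValueError("the ratio is only meaningful for a positive mean margin")
    return float(margins.std() / mean)
```

Margins of 1 and 3 give a mean of 2 and a (population) standard deviation of 1, so λ = 0.5; identical margins give λ = 0, the regime in which the tighter bound applies.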
Submitted 7 May, 2019;
originally announced May 2019.
-
Conditional BERT Contextual Augmentation
Authors:
Xing Wu,
Shangwen Lv,
Liangjun Zang,
Jizhong Han,
Songlin Hu
Abstract:
We propose a novel data augmentation method for labeled sentences called conditional BERT contextual augmentation. Data augmentation methods are often applied to prevent overfitting and improve the generalization of deep neural network models. The recently proposed contextual augmentation augments labeled sentences by randomly replacing words with more varied substitutions predicted by a language model. BERT demonstrates that a deep bidirectional language model is more powerful than either a unidirectional language model or the shallow concatenation of a forward and a backward model. We retrofit BERT to conditional BERT by introducing a new conditional masked language model task. (The term "conditional masked language model" appears once in the original BERT paper, where it means context-conditional and is equivalent to "masked language model". In our paper, "conditional masked language model" means that we apply an extra label-conditional constraint to the masked language model.) The well-trained conditional BERT can be applied to enhance contextual augmentation. Experiments on six different text classification tasks show that our method can be easily applied to both convolutional and recurrent neural network classifiers to obtain clear improvements.
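The abstract does not say how the label condition enters the model. This sketch assumes the label id is fed in place of BERT's segment ids and builds one conditional masked-LM training example; it simplifies BERT's masking (no 80/10/10 replacement rule), and all names are hypothetical:

```python
import random

MASK = "[MASK]"


def make_conditional_mlm_example(tokens, label_id, mask_prob=0.15, rng=None):
    """Return (input tokens, segment ids, prediction targets) for one sentence.

    Masked positions keep the original token as the target; unmasked positions
    have target None. The sentence label replaces the segment id (assumption).
    """
    rng = rng or random.Random(0)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)
    segment_ids = [label_id] * len(tokens)  # label-conditional signal
    return inputs, segment_ids, targets
```

At augmentation time the same label id would be supplied while predicting a masked word, so substitutions are drawn from words compatible with both the context and the label (e.g. not replacing "great" with "awful" in a positive review).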
Submitted 17 December, 2018;
originally announced December 2018.
-
Performance Analysis of Low-Interference N-Continuous OFDM
Authors:
Peng Wei,
Yue Xiao,
Lilin Dan,
Shichao Lv,
Wei Xiang
Abstract:
The low-interference N-continuous orthogonal frequency division multiplexing (NC-OFDM) system [25], [26] is investigated in terms of power spectral density (PSD) and bit error rate (BER), to prove and quantify its advantages over traditional NC-OFDM. The PSD and BER performances of the low-interference scheme are analyzed and compared as functions of the highest derivative order (HDO) and the length of the smooth signal. For the PSD, unlike the single discontinuous point per NC-OFDM symbol considered in [25], sidelobe suppression is evaluated by considering two discontinuous points, which arise from the finite continuity of the smooth signal and its higher-order derivatives. It is shown that increasing the HDO and the length of the smooth signal yields more rapid sidelobe decay, owing to the significantly improved continuity of the OFDM signal and its higher-order derivatives. However, our PSD analysis also shows that if the length of the smooth signal is set inappropriately, performance may degrade even when the HDO is large. Furthermore, the error performance analysis shows that, under both perfect and imperfect synchronization, the low-interference scheme incurs only a small BER degradation relative to conventional NC-OFDM when the smooth signal is short or the HDO is small. Based on analysis and simulation results, the trade-offs between sidelobe suppression and BER are studied with respect to these two parameters.
Submitted 3 November, 2020; v1 submitted 27 November, 2018;
originally announced November 2018.
-
Intrusion Prediction with System-call Sequence-to-Sequence Model
Authors:
ShaoHua Lv,
Jian Wang,
YinQi Yang,
JiQiang Liu
Abstract:
The advanced development of the Internet facilitates efficient information exchange, but it is also exploited by adversaries. Intrusion detection systems (IDS), as an important defense component of network security, have been widely studied in security research. However, research on intrusion prediction, which is more critical for network security, has received less attention. We argue that anticipating and impeding intrusions in a timely manner is more vital than simple alarms in security defense. General research methods for prediction analyze short-term system-call sequences to predict forthcoming abnormal behaviors. In this paper, we take advantage of the remarkable performance of recurrent neural networks (RNNs) on long sequential problems and introduce the sequence-to-sequence model into intrusion prediction. By semantically modeling system calls, we build a robust system-call sequence-to-sequence prediction model. Taking the system-call traces invoked while a program runs as a known prerequisite, our model predicts the sequence of system calls most likely to be executed in the near future, which enables monitoring of system status and anticipation of intrusion behaviors. Our experiments show that the proposed prediction method achieves good prediction performance on the ADFA-LD intrusion detection test dataset. Moreover, the predicted sequence, combined with the known invoked traces of the system, significantly improves intrusion detection performance, as verified on various classifiers.
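Training a sequence-to-sequence predictor needs (observed, future) pairs cut from system-call traces. A minimal sketch of that preprocessing, assuming each trace is a list of integer system-call IDs and fixed window lengths; the window sizes, stride, and function name are hypothetical, and the RNN encoder-decoder itself is not shown:

```python
def make_seq2seq_pairs(trace, src_len, tgt_len, stride=1):
    """Slide over one system-call trace and emit (observed, future) pairs:
    the first `src_len` calls are the encoder input, the next `tgt_len` calls
    are the sequence the decoder should predict."""
    pairs = []
    last_start = len(trace) - src_len - tgt_len
    for start in range(0, last_start + 1, stride):
        src = trace[start:start + src_len]
        tgt = trace[start + src_len:start + src_len + tgt_len]
        pairs.append((src, tgt))
    return pairs
```

Traces shorter than `src_len + tgt_len` yield no pairs; at inference time the most recent `src_len` calls of a running program would be the encoder input.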
Submitted 5 August, 2018;
originally announced August 2018.
-
Efficient kernel-based variable selection with sparsistency
Authors:
Xin He,
Junhui Wang,
Shaogao Lv
Abstract:
Variable selection is central to high-dimensional data analysis, and various algorithms have been developed. Ideally, a variable selection algorithm should be flexible, scalable, and backed by theoretical guarantees, yet most existing algorithms cannot attain these properties at the same time. In this article, a three-step variable selection algorithm is developed, involving kernel-based estimation of the regression function and its gradient functions, as well as hard thresholding. Its key advantage is that it makes no explicit model assumption, admits general predictor effects, allows for scalable computation, and attains desirable asymptotic sparsistency. The proposed algorithm can be adapted to any reproducing kernel Hilbert space (RKHS) with different kernel functions, and can be extended to interaction selection with slight modification. Its computational cost is only linear in the data dimension and can be further reduced through parallel computing. The sparsistency of the proposed algorithm is established for general RKHS under mild conditions, including linear and Gaussian kernels as special cases. Its effectiveness is also supported by a variety of simulated and real examples.
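The three steps can be sketched with kernel ridge regression and a Gaussian kernel: fit f, differentiate the fitted f with respect to each input variable at every sample, and keep the variables whose empirical gradient norm exceeds a threshold. This is a minimal illustration under assumed choices (Gaussian kernel, the `sigma`, `lam`, and `threshold` values, hypothetical function names); the authors' gradient estimator and threshold rule may differ:

```python
import numpy as np


def gaussian_kernel(X: np.ndarray, Z: np.ndarray, sigma: float) -> np.ndarray:
    """K[i, j] = exp(-||X[i] - Z[j]||^2 / (2 sigma^2))."""
    sq_dist = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_gradient_selection(X, y, sigma=1.0, lam=1e-2, threshold=0.1):
    """Hypothetical three-step selector: KRR fit, gradient norms, hard threshold."""
    n, _ = X.shape
    K = gaussian_kernel(X, X, sigma)
    # Step 1: kernel ridge regression, f(x) = sum_j alpha_j K(x, x_j).
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    # Step 2: df/dx_l at x_i = -sum_j alpha_j K(x_i, x_j) (x_il - x_jl) / sigma^2.
    diff = X[:, None, :] - X[None, :, :]  # shape (n, n, p)
    grads = -(K[:, :, None] * alpha[None, :, None] * diff).sum(axis=1) / sigma ** 2
    # Empirical L2 norm of each partial-derivative function over the sample.
    importance = np.sqrt((grads ** 2).mean(axis=0))
    # Step 3: hard thresholding.
    selected = np.flatnonzero(importance > threshold)
    return importance, selected
```

A variable that the regression function does not depend on has a (near) zero partial derivative everywhere, which is why thresholding gradient norms selects variables without assuming a linear or additive model. The per-variable gradient computation is independent across variables, matching the abstract's point about cost linear in the dimension.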
Submitted 3 February, 2021; v1 submitted 26 February, 2018;
originally announced February 2018.
-
Exploring Temporal Preservation Networks for Precise Temporal Action Localization
Authors:
Ke Yang,
Peng Qiao,
Dongsheng Li,
Shaohe Lv,
Yong Dou
Abstract:
Temporal action localization is an important task in computer vision. Though a variety of methods have been proposed, how to precisely predict the temporal boundaries of action segments remains an open question. Most works use segment-level classifiers to select video segments predetermined by action proposals or dense sliding windows. However, to achieve more precise action boundaries, a temporal localization system should make dense predictions at a fine granularity. A recently proposed work exploits Convolutional-Deconvolutional-Convolutional (CDC) filters to upsample the predictions of 3D ConvNets, making it possible to perform per-frame action predictions and achieving promising temporal action localization performance. However, the CDC network partially loses temporal information due to its temporal downsampling operation. In this paper, we propose an elegant and powerful Temporal Preservation Convolutional (TPC) Network that equips 3D ConvNets with TPC filters. The TPC network fully preserves temporal resolution while downsampling spatial resolution, enabling frame-level action localization. The TPC network can be trained end to end. Experimental results on public datasets show that the TPC network achieves significant improvement on per-frame action prediction and competitive results on segment-level temporal action localization.
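The key property of the TPC filters is that spatial resolution shrinks while temporal resolution does not. The sketch below shows only that bookkeeping, with a non-learned max pooling over space (temporal stride 1, spatial stride `k`) on a (T, H, W) clip; real TPC filters are learned 3D convolutions, and the function name is hypothetical:

```python
import numpy as np


def spatial_max_pool(x: np.ndarray, k: int = 2) -> np.ndarray:
    """Max-pool a (T, H, W) clip over space only: temporal stride 1, spatial stride k."""
    t, h, w = x.shape
    h2, w2 = h // k, w // k
    x = x[:, :h2 * k, :w2 * k]  # drop edge rows/cols that do not fill a window
    # Split H into (h2, k) and W into (w2, k), then take the max inside each k x k window.
    return x.reshape(t, h2, k, w2, k).max(axis=(2, 4))
```

A (16, 112, 112) clip becomes (16, 56, 56): every input frame still has its own output slice, so a per-frame prediction is available without the temporal upsampling that the CDC filters need.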
Submitted 11 September, 2017; v1 submitted 10 August, 2017;
originally announced August 2017.
-
Qualitative Action Recognition by Wireless Radio Signals in Human-Machine Systems
Authors:
Shaohe Lv,
Yong Lu,
Mianxiong Dong,
Xiaodong Wang,
Yong Dou,
Weihua Zhuang
Abstract:
Human-machine systems require a deep understanding of human behaviors. Most existing research on action recognition has focused on discriminating between different actions; however, the quality of executing an action has received little attention thus far. In this paper, we study the quality assessment of driving behaviors and present WiQ, a system that assesses the quality of actions based on radio signals. The system includes three key components: a deep-neural-network-based learning engine that extracts quality information from changes in signal strength, a gradient-based method that detects the signal boundary of an individual action, and an activity-based fusion policy that improves recognition performance in noisy environments. Using the quality information, WiQ can differentiate three body statuses with an accuracy of 97%, and identifies drivers among 15 candidates with an average accuracy of 88%. Our results show that, through dedicated analysis of radio signals, fine-grained action characterization can be achieved, which can facilitate a wide variety of applications, such as smart driving assistants.
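WiQ detects the boundary of an individual action with a gradient-based method whose details are not given in the abstract. A minimal sketch under assumed choices: smooth the received signal strength with a moving average, take its numerical gradient, and report the first and last samples whose absolute gradient exceeds a threshold. The window size, threshold, and function name are all hypothetical:

```python
import numpy as np


def detect_action_boundary(rss, threshold, window=5):
    """Return (start, end) sample indices of the span where the smoothed
    signal strength changes faster than `threshold`, or None if it never does."""
    kernel = np.ones(window) / window
    smooth = np.convolve(rss, kernel, mode="same")  # moving-average denoising
    grad = np.abs(np.gradient(smooth))               # rate of change per sample
    active = np.flatnonzero(grad > threshold)
    if active.size == 0:
        return None
    return int(active[0]), int(active[-1])
```

Smoothing before differentiating matters: a raw first difference of a noisy radio signal crosses almost any threshold, whereas the smoothed gradient only rises when the body actually moves.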
Submitted 3 March, 2017; v1 submitted 1 March, 2017;
originally announced March 2017.
-
Evaluation of Trace Alignment Quality and its Application in Medical Process Mining
Authors:
Moliang Zhou,
Sen Yang,
Shuyu Lv,
Xinyu Li,
Shuhong Chen,
Ivan Marsic,
Richard Farneth,
Randall Burd
Abstract:
Trace alignment algorithms have been used in process mining to discover consensus treatment procedures and process deviations. Different alignment algorithms, however, may produce very different results, and no widely adopted method exists for evaluating the results of trace alignment. Existing reference-free evaluation methods cannot adequately and comprehensively assess alignment quality. We analyzed and compared the existing evaluation methods, identified their limitations, and introduced improvements to two reference-free evaluation methods. Our approach assesses the alignment result globally instead of locally, and therefore helps the algorithm optimize overall alignment quality. We also introduced a novel metric to measure alignment complexity, which can be used as a constraint in alignment algorithm optimization. We tested our evaluation methods on a trauma resuscitation dataset and provided medical explanations of the activities and patterns identified as deviations by our proposed evaluation methods.
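The abstract does not name the two reference-free methods it improves. As an illustration of what a reference-free alignment score looks like, the sketch below computes the classic sum-of-pairs score from multiple sequence alignment, assuming each activity is encoded as one character and `-` marks a gap; the scoring weights and function name are assumptions, and this is not presented as the authors' metric:

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score of equally long aligned traces (one character per activity)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned traces must have the same length")
    score = 0
    for column in zip(*alignment):  # score every column independently
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs are conventionally scored 0
            elif a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score
```

Because each column is scored on its own, sum-of-pairs is a local, column-wise measure; that is the kind of evaluation the abstract contrasts with its global assessment.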
Submitted 13 August, 2017; v1 submitted 15 February, 2017;
originally announced February 2017.
-
Weakly supervised object detection using pseudo-strong labels
Authors:
Ke Yang,
Dongsheng Li,
Yong Dou,
Shaohe Lv,
Qiang Wang
Abstract:
Object detection is an important task in computer vision. A variety of methods have been proposed, but methods using weak labels still do not achieve satisfactory results. In this paper, we propose a new framework that uses a weakly supervised method's output as pseudo-strong labels to train a strongly supervised model. A weakly supervised method is treated as a black box to generate class-specific bounding boxes on the training dataset. A de-noising method is then applied to the noisy bounding boxes, and the de-noised pseudo-strong labels are used to train a strongly supervised object detection network. The whole framework is still weakly supervised because the entire process uses only image-level labels. Experimental results on PASCAL VOC 2007 demonstrate the validity of our framework: we obtain 43.4% mean average precision, compared with 39.5% for the previous best result and 34.5% for the initial method. The framework is simple and distinct, and it promises to be easily applicable to other methods.
Submitted 16 July, 2016;
originally announced July 2016.
-
Relative distance features for gait recognition with Kinect
Authors:
Ke Yang,
Yong Dou,
Shaohe Lv,
Fei Zhang,
Qi Lv
Abstract:
Gait and static body measurements are important biometric technologies for passive human recognition. Many previous works argue that recognition performance based solely on gait features is limited, but the reason for this limited performance remains unclear. This study focuses on human recognition with gait features obtained by Kinect and shows that gait features can effectively distinguish different human beings through a novel representation: relative distance-based gait features. Experimental results show that the recognition accuracy with relative distance features reaches up to 85%, which is comparable with that of anthropometric features. Combining relative distance features with anthropometric features yields an accuracy of more than 95%. The results indicate that the relative distance feature is quite effective and worthy of further study in more general scenarios (e.g., without Kinect).
Submitted 17 May, 2016;
originally announced May 2016.
-
A Novel Face Recognition Method using Nearest Line Projection
Authors:
Huanguo Zhang,
Sha Lv,
Wei Li,
Xun Qu
Abstract:
Face recognition is a popular application of pattern recognition methods, and it faces challenging problems including illumination, expression, and pose. The most popular approach is to learn subspaces of face images so that they can be projected into another discriminant space where images of different persons can be separated. In this paper, a nearest line projection algorithm is developed to represent face images for face recognition. Instead of projecting an image to its nearest image, we project it to its nearest line spanned by two different face images. The subspaces are learned so that the distance from each face image to its nearest line is minimized. We evaluated the proposed algorithm on several benchmark face image databases and compared it with other image projection algorithms. The experimental results showed that the proposed algorithm outperforms the others.
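The core geometric step is projecting a feature vector onto the line through two face images, as in the classic nearest feature line classifier. A minimal sketch in a fixed feature space, assuming lines are formed only from pairs of distinct images of the same person; it omits the subspace learning that the paper adds, and the function names are hypothetical:

```python
from itertools import combinations

import numpy as np


def distance_to_line(x, x1, x2):
    """Distance from x to the line through x1 and x2 (x1 != x2), plus the projection point."""
    d = x2 - x1
    t = np.dot(x - x1, d) / np.dot(d, d)  # position of the foot of the perpendicular
    proj = x1 + t * d
    return float(np.linalg.norm(x - proj)), proj


def nearest_line_classify(x, samples, labels):
    """Label of the nearest line spanned by two samples of the same class."""
    best_dist, best_label = np.inf, None
    for i, j in combinations(range(len(samples)), 2):
        if labels[i] != labels[j]:
            continue  # only lines through two images of the same person
        dist, _ = distance_to_line(x, samples[i], samples[j])
        if dist < best_dist:
            best_dist, best_label = dist, labels[i]
    return best_label
```

The line extends beyond its two endpoints (t may fall outside [0, 1]), so it can represent variations, such as a gradual change in pose or lighting, that neither training image shows exactly.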
Submitted 24 February, 2014;
originally announced February 2014.
-
Asymptotic Granularity Reduction and Its Application
Authors:
Shenghui Su,
Shuwang Lv,
Xiubin Fan
Abstract:
It is well known that the inverse function of y = x, whose derivative is y' = 1, is x = y, that the inverse function of y = c, whose derivative is y' = 0, does not exist, and so on. Hence, on the assumption that the noninvertibility of a univariate increasing function y = f(x) with x > 0 is directly proportional to the growth rate reflected by its derivative, the authors put forward a method, called asymptotic granularity reduction (AGR), for comparing the difficulty of inverting two functions on a continuous or discrete interval; AGR integrates asymptotic analysis with logarithmic granularities and is an extension of and complement to polynomial time (Turing) reduction (PTR). They prove by AGR that inverting y = x ^ x (mod p) is computationally harder than inverting y = g ^ x (mod p), and that inverting y = g ^ (x ^ n) (mod p) is computationally equivalent to inverting y = g ^ x (mod p), which is compatible with the results from PTR. Besides, they apply AGR to compare the difficulty of inverting y = x ^ n (mod p) with y = g ^ x (mod p), y = g ^ (g1 ^ x) (mod p) with y = g ^ x (mod p), and y = x ^ n + x + 1 (mod p) with y = x ^ n (mod p), and observe that the results are consistent with existing facts, which further illustrates that AGR is suitable for comparing the difficulty of inversion problems. Last, they prove by AGR that inverting y = (x ^ n)(g ^ x) (mod p) is computationally equivalent to inverting y = g ^ x (mod p) in a case where PTR cannot be conveniently applied. AGR with this assumption partitions the complexities of problems in finer detail, and finds some new evidence for the security of cryptosystems.
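To make the inversion problems concrete: computing y = g ^ x (mod p) is cheap with modular exponentiation, while inverting it (the discrete logarithm) by exhaustive search takes up to p - 1 multiplications. The sketch below is only a brute-force illustration on a toy prime and does not implement AGR itself; the function name is hypothetical:

```python
def brute_force_dlog(g: int, y: int, p: int):
    """Smallest x in [0, p - 2] with pow(g, x, p) == y % p, or None if none exists."""
    target = y % p
    acc = 1  # invariant: acc == pow(g, x, p) at the top of each iteration
    for x in range(p - 1):
        if acc == target:
            return x
        acc = (acc * g) % p
    return None


# Forward direction: pow(2, 7, 11) == 7 costs a handful of squarings;
# the inverse below scans exponents one by one.
x = brute_force_dlog(2, 7, 11)
```

For p = 11 and g = 2 (a primitive root modulo 11) every nonzero residue has a logarithm; for cryptographic sizes of p the linear scan is hopeless, which is the asymmetry the difficulty comparisons above are about.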
Submitted 1 November, 2014; v1 submitted 1 June, 2011;
originally announced June 2011.