
doi 10.1016/j.knosys.2024.111563

REB: Reducing Biases in Representation for Industrial Anomaly Detection

Authors: Shuai Lyu, Dongmei Mo, Waikeung Wong

Abstract: Existing representation-based methods usually conduct industrial anomaly detection in two stages: obtain feature representations with a pre-trained model and perform distance measures for anomaly detection. Among them, K-nearest neighbor (KNN) retrieval-based anomaly detection methods show promising results. However, the features are not fully exploited as these methods ignore domain bias of pre-trained models and the difference of local density in feature space, which limits the detection performance. In this paper, we propose Reducing Biases (REB) in representation by considering the domain bias and building a self-supervised learning task for better domain adaption with a defect generation strategy (DefectMaker) that ensures a strong diversity in the synthetic defects. Additionally, we propose a local-density KNN (LDKNN) to reduce the local density bias in the feature space and obtain effective anomaly detection. The proposed REB method achieves a promising result of 99.5% Im.AUROC on the widely used MVTec AD, with smaller backbone networks such as Vgg11 and Resnet18. The method also achieves an impressive 88.8% Im.AUROC on the MVTec LOCO AD dataset and a remarkable 96.0% on the BTAD dataset, outperforming other representation-based approaches. These results indicate the effectiveness and efficiency of REB for practical industrial applications. Code: https://github.com/ShuaiLYU/REB.

Submitted 17 May, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

Comments: 14 pages, 7 figures, 7 tables

arXiv:2308.07013

Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads

Authors: Dingheng Mo, Fanchao Chen, Siqiang Luo, Caihua Shan

Abstract: LSM-trees are widely adopted as the storage backend of key-value stores. However, optimizing the system performance under dynamic workloads has not been sufficiently studied or evaluated in previous work. To fill the gap, we present RusKey, a key-value store with the following new features: (1) RusKey is a first attempt to orchestrate LSM-tree structures online to enable robust performance under the context of dynamic workloads; (2) RusKey is the first study to use Reinforcement Learning (RL) to guide LSM-tree transformations; (3) RusKey includes a new LSM-tree design, named FLSM-tree, for an efficient transition between different compaction policies -- the bottleneck of dynamic key-value stores. We justify the superiority of the new design with theoretical analysis; (4) RusKey requires no prior workload knowledge for system adjustment, in contrast to state-of-the-art techniques. Experiments show that RusKey exhibits strong performance robustness in diverse workloads, achieving up to 4x better end-to-end performance than the RocksDB system under various settings.

Submitted 17 September, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: 25 pages, 13 figures

arXiv:2207.09179

doi 10.14778/3551793.3551866

SCARA: Scalable Graph Neural Networks with Feature-Oriented Optimization

Authors: Ningyi Liao, Dingheng Mo, Siqiang Luo, Xiang Li, Pengcheng Yin

Abstract: Recent advances in data processing have stimulated the demand for learning graphs of very large scales. Graph Neural Networks (GNNs), being an emerging and powerful approach in solving graph learning tasks, are known to be difficult to scale up. Most scalable models apply node-based techniques in simplifying the expensive graph message-passing propagation procedure of GNN. However, we find such acceleration insufficient when applied to million- or even billion-scale graphs. In this work, we propose SCARA, a scalable GNN with feature-oriented optimization for graph computation. SCARA efficiently computes graph embedding from node features, and further selects and reuses feature computation results to reduce overhead. Theoretical analysis indicates that our model achieves sub-linear time complexity with a guaranteed precision in propagation process as well as GNN training and inference. We conduct extensive experiments on various datasets to evaluate the efficacy and efficiency of SCARA. Performance comparison with baselines shows that SCARA can reach up to 100x graph propagation acceleration than current state-of-the-art methods with fast convergence and comparable accuracy. Most notably, it is efficient to process precomputation on the largest available billion-scale GNN dataset Papers100M (111M nodes, 1.6B edges) in 100 seconds.

Submitted 19 July, 2022; originally announced July 2022.

Journal ref: Proceedings of the VLDB Endowment 15 (2022) 3240-3248

arXiv:1811.12322

Binary Sequence Set Design for Interferer Rejection in Multi-Branch Modulation

Authors: Dian Mo, Marco F. Duarte

Abstract: Wideband communication is often expected to deal with a very wide spectrum, which in many environments of interest includes strong interferers. Thus receivers for the wideband communication systems often need to mitigate interferers to reduce the distortion caused by the amplifier nonlinearity and noise. Recently, a new architecture for communication receivers known as random modulation mixes a signal with different pseudorandom sequences using multiple branches of channels before sampling. While random modulation is used in these receivers to acquire the signal at low sampling rates, the modulation sequences used lack the ability to suppress interferers due to their flat spectra. In previous work, we introduced the design of a single spectrally shaped binary sequence that mitigates interferers to replace the pseudorandom sequence in a channel. However, the designed sequences cannot provide the stable recovery achieved by pseudorandom sequence approaches. In this paper, we extend our previous sequence design to guarantee stable recovery by designing a set of sequences to be orthogonal to each other. We show that it is difficult to find the necessary number of sequences featuring mutual orthogonality and introduce oversampling to the sequence set design to improve the recovery performance. We propose an algorithm for multi-branch sequence design as a binary optimization problem, which is solved using a semidefinite program relaxation and randomized projection. While it is common to model narrowband interferers as a subspace spanned by a subset of elements from the Fourier basis, we show that the Slepian basis provides an alternative and more suitable compact representation for signals with components contained in narrow spectrum bands. Numerical experiments using the proposed sequence sets show their advantages against pseudorandom sequences and our previous work.

Submitted 3 June, 2020; v1 submitted 29 November, 2018; originally announced November 2018.

Comments: 11 pages, 6 figures. To appear in IEEE Transactions on Signal Processing

arXiv:1811.05873

Design of Spectrally Shaped Binary Sequences via Randomized Convex Relaxation

Authors: Dian Mo, Marco F. Duarte

Abstract: Wideband communication receivers often deal with the problems of detecting weak signals from distant sources received together with strong nearby interferers. When the techniques of random modulation are used in communication system receivers, one can design a spectrally shaped sequence that mitigates interferer bands while preserving message bands. Common implementation constraints require sequence quantization, which turns the design problem formulation to an integer optimization problem solved using a semidefinite program on a matrix that is restricted to have rank one. Common approximation schemes for this problem are not amenable due to the distortion to the spectrum caused by the required quantization. We propose a method that leverages a randomized projection and quantization of the solution of a semidefinite program, an approach that has been previously used for related integer programs. We provide a theoretical and numerical analysis on the feasibility and quality of the approximation provided by the proposed approach. Furthermore, numerical simulations show that our proposed approach returns the same sequence as an exhaustive search (when feasible), showcasing its accuracy and efficiency. Furthermore, our proposed method succeeds in finding suitable spectrally shaped sequences for cases where exhaustive search is not feasible, achieving better performance than existing alternatives.

Submitted 14 November, 2018; originally announced November 2018.

Comments: 27 Pages, 7 figures

arXiv:1412.6724

doi 10.1016/j.sigpro.2017.07.003

Performance of Compressive Parameter Estimation via K-Median Clustering

Authors: Dian Mo, Marco F. Duarte

Abstract: Compressive sensing (CS) has attracted significant attention in parameter estimation tasks, where parametric dictionaries (PDs) collect signal observations for a sampling of the parameter space and yield sparse representations for signals of interest when the sampling is dense. While this sampling also leads to high dictionary coherence, one can leverage structured sparsity models to prevent highly coherent dictionary elements from appearing simultaneously in the recovered signal. However, the resulting approaches depend heavily on the careful setting of the maximum allowable coherence; furthermore, their guarantees are not concerned with general parameter estimation performance. We propose the use of earth mover's distance (EMD), as applied to a pair of true and estimated PD coefficient vectors, to measure the parameter estimation error. We formally analyze the connection between the EMD and the parameter estimation error and show that the EMD provides a better-suited metric for parameter estimation performance than the Euclidean distance. Additionally, we analyze the previously described relationship between K-median clustering and EMD-optimal sparse approximation and leverage it to develop improved PD-based parameter estimation algorithms. Finally, our numerical experiments verify our theoretical results and show that the proposed compressive parameter estimation algorithms have improved performance over existing approaches.

Submitted 11 May, 2017; v1 submitted 21 December, 2014; originally announced December 2014.

Comments: 41 pages, 8 figures; revision includes additional discussions and experiment

Showing 1–6 of 6 results for author: Mo, D