-
Automatic Target Detection for Sparse Hyperspectral Images
Authors:
Ahmad W. Bitar,
Jean-Philippe Ovarlez,
Loong-Fah Cheong,
Ali Chehab
Abstract:
In this work, a novel target detector for hyperspectral imagery is developed. The detector is independent of the unknown covariance matrix, behaves well in large dimensions, is distribution-free, is invariant to atmospheric effects, and does not require a background dictionary to be constructed. Based on a modification of robust principal component analysis (RPCA), a given hyperspectral image (HSI) is regarded as the sum of a low-rank background HSI and a sparse target HSI that contains the targets based on a pre-learned target dictionary specified by the user. The sparse component is used directly for detection; that is, the targets are simply detected at the non-zero entries of the sparse target HSI. Hence, the novel target detector is simply a sparse HSI generated automatically from the original HSI, containing only the targets with the background suppressed. The detector is evaluated in real experiments, and the results demonstrate its effectiveness for hyperspectral target detection, especially when the targets are well matched to the surroundings.
Submitted 5 March, 2020; v1 submitted 14 April, 2019;
originally announced April 2019.
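The detection step described in the abstract above (targets are read off at the non-zero entries of the sparse target HSI) can be illustrated with a minimal sketch. The code below is not the authors' algorithm: it assumes the HSI is flattened to a pixels-by-bands matrix, and it uses a row-wise shrinkage operator as one common, assumed way to obtain a row-sparse component. The names `shrink_rows`, `detect_targets`, `lam`, and `tol` are hypothetical.

```python
import math


def shrink_rows(matrix, lam):
    """Row-wise shrinkage: scale each row by max(0, 1 - lam / ||row||_2).

    Rows whose Euclidean norm is at most `lam` become exactly zero.
    Assumption: this operator is only an illustration of how a row-sparse
    component can arise; the abstract does not specify the solver.
    """
    out = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * x for x in row])
    return out


def detect_targets(sparse_hsi, tol=0.0):
    """Return indices of pixels (rows) that contain any entry above `tol` in magnitude."""
    return [i for i, row in enumerate(sparse_hsi)
            if any(abs(x) > tol for x in row)]


# Toy residual with 4 pixels and 2 bands; only pixel 1 has a strong response.
residual = [[0.1, 0.0], [3.0, 4.0], [0.0, 0.0], [0.3, 0.4]]
sparse = shrink_rows(residual, lam=1.0)
detected = detect_targets(sparse)
```

In this toy case only the pixel whose row norm exceeds `lam` survives the shrinkage, so it is the only one flagged as a target.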
-
Parallel Machine Scheduling with a Single Resource per Job
Authors:
T. Janssen,
C. Swennenhuis,
A. Bitar,
T. Bosman,
D. Gijswijt,
L. van Iersel,
S. Dauzére-Pérès,
C. Yugma
Abstract:
We study the problem of scheduling jobs on parallel machines minimizing the total completion time, with each job using exactly one resource. First, we derive fundamental properties of the problem and show that the problem is polynomially solvable if $p_j = 1$. Then we look at a variant of the shortest processing time rule as an approximation algorithm for the problem and show that it gives at least a $(2-\frac{1}{m})$-approximation. Subsequently, we show that, although the complexity of the problem remains open, three related problems are $\mathcal{NP}$-hard. In the first problem, every resource also has a subset of machines on which it can be used. In the second problem, once a resource has been used on a machine it cannot be used on any other machine, hence all jobs using the same resource need to be scheduled on the same machine. In the third problem, every job needs exactly two resources instead of just one.
Submitted 16 November, 2018; v1 submitted 13 September, 2018;
originally announced September 2018.
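The abstract above mentions a variant of the shortest processing time (SPT) rule without specifying it. The following is a minimal sketch of one plausible greedy variant, not necessarily the one analysed in the paper: jobs are taken in SPT order and each is placed on the machine where it can start earliest, given that its resource must also be free. The function name `spt_with_resources` and the tie-breaking by index are assumptions.

```python
def spt_with_resources(jobs, m):
    """Greedy SPT-style list scheduling with one resource per job.

    jobs: list of (processing_time, resource) pairs.
    m:    number of identical parallel machines.
    Returns (schedule, total_completion_time), where schedule maps
    job index -> (machine, start, end).
    Assumption: this is an illustrative variant, not the paper's exact rule.
    """
    # Shortest processing time first; ties broken by job index.
    order = sorted(range(len(jobs)), key=lambda j: (jobs[j][0], j))
    machine_free = [0] * m   # time at which each machine becomes idle
    resource_free = {}       # time at which each resource becomes idle
    schedule = {}
    total = 0
    for j in order:
        p, r = jobs[j]
        rf = resource_free.get(r, 0)
        # Pick the machine giving the earliest feasible start time.
        i = min(range(m), key=lambda k: (max(machine_free[k], rf), k))
        start = max(machine_free[i], rf)
        end = start + p
        machine_free[i] = end
        resource_free[r] = end
        schedule[j] = (i, start, end)
        total += end
    return schedule, total


# Three jobs, two of which share resource "a", on two machines.
schedule, total = spt_with_resources([(1, "a"), (2, "a"), (3, "b")], m=2)
```

Here the two jobs sharing resource "a" run back to back, while the job using resource "b" runs in parallel on the other machine.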
-
DLA: Compiler and FPGA Overlay for Neural Network Inference Acceleration
Authors:
Mohamed S. Abdelfattah,
David Han,
Andrew Bitar,
Roberto DiCecco,
Shane OConnell,
Nitika Shanker,
Joseph Chu,
Ian Prins,
Joshua Fender,
Andrew C. Ling,
Gordon R. Chiu
Abstract:
Overlays have shown significant promise for field-programmable gate arrays (FPGAs) as they allow for fast development cycles and remove many of the challenges of the traditional FPGA hardware design flow. However, this often comes with a significant performance burden, resulting in very little adoption of overlays for practical applications. In this paper, we tailor an overlay to a specific application domain, and we show how we maintain its full programmability without paying for the performance overhead traditionally associated with overlays. Specifically, we introduce an overlay targeted for deep neural network inference with only ~1% overhead to support the control and reprogramming logic using a lightweight very long instruction word (VLIW) network. Additionally, we implement a sophisticated domain-specific graph compiler that compiles deep learning languages such as Caffe or TensorFlow to easily target our overlay. We show how our graph compiler performs architecture-driven software optimizations to significantly boost performance of both convolutional and recurrent neural networks (CNNs/RNNs): we demonstrate a 3x improvement on ResNet-101 and a 12x improvement for long short-term memory (LSTM) cells, compared to naive implementations. Finally, we describe how we can tailor our hardware overlay and use our graph compiler to achieve ~900 fps on GoogLeNet on an Intel Arria 10 1150, the fastest ever reported on comparable FPGAs.
Submitted 13 July, 2018;
originally announced July 2018.
-
Efficient Implementation of a Recognition System Using the Cortex Ventral Stream Model
Authors:
Ahmad W. Bitar,
Mohammad M. Mansour,
Ali Chehab
Abstract:
In this paper, an efficient implementation of a recognition system based on the original HMAX model of the visual cortex is proposed. Various optimizations targeted to increase accuracy at the so-called layers S1, C1, and S2 of the HMAX model are proposed. At layer S1, all unimportant information such as illumination and expression variations is eliminated from the images. Each image is then convolved with 64 separable Gabor filters in the spatial domain. At layer C1, the minimum scale values are exploited to be embedded into the maximum ones using the additive embedding space. At layer S2, the prototypes are generated in a more efficient way using the Partitioning Around Medoids (PAM) clustering algorithm. The impact of these optimizations in terms of accuracy and computational complexity was evaluated on the Caltech101 database, and compared with the baseline performance using support vector machine (SVM) and nearest neighbor (NN) classifiers. The results show that our model provides a significant improvement in accuracy at the S1 layer, by more than 10%, while the computational complexity is also reduced. The accuracy is slightly increased for both approximations at the C1 and S2 layers.
Submitted 21 November, 2017;
originally announced November 2017.
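The abstract above uses Partitioning Around Medoids (PAM) to generate S2 prototypes. As a rough illustration of the clustering idea only, the sketch below implements a simplified swap phase of PAM on generic points. It assumes a naive initialisation (the first `k` points) in place of PAM's usual BUILD step, and the names `pam`, `total_cost`, and `dist` are hypothetical; how the paper applies PAM to image patches is not shown here.

```python
def total_cost(points, medoids, dist):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k, dist):
    """Simplified PAM: repeatedly apply the best improving medoid swap.

    Assumption: medoids are initialised to the first k points rather than
    with the BUILD phase of the full algorithm.
    Returns (sorted medoid indices, final cost).
    """
    medoids = list(range(k))
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        best_swap = None
        # Try replacing each medoid with each non-medoid point.
        for pos in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best:
                    best, best_swap = cost, trial
        if best_swap is not None:
            medoids = best_swap
            improved = True
    return sorted(medoids), best


# Two well-separated 1-D groups; the medoids should be the group centres.
points = [0, 1, 2, 10, 11, 12]
medoids, cost = pam(points, 2, lambda a, b: abs(a - b))
```

Unlike k-means centroids, the returned medoids are actual data points, which is why the method suits prototype selection: each prototype is a real sample rather than an average.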