-
DAFE-FD: Density Aware Feature Enrichment for Face Detection
Authors:
Vishwanath A. Sindagi,
Vishal M. Patel
Abstract:
Recent research on face detection, which is focused primarily on improving the accuracy of detecting smaller faces, attempts to develop new anchor design strategies to facilitate increased overlap between anchor boxes and ground truth faces of smaller sizes. In this work, we approach the problem of small face detection with the motivation of enriching the feature maps using a density map estimation module. This module, inspired by recent crowd counting/density estimation techniques, performs the task of estimating the per-pixel density of people/faces present in the image. The output of this module is employed to accentuate the feature maps from the backbone network using a feature enrichment module before being used for detecting smaller faces. The proposed approach can be used to complement recent anchor-design based novel methods to further improve their results. Experiments conducted on different datasets such as WIDER, FDDB and Pascal-Faces demonstrate the effectiveness of the proposed approach.
Submitted 16 January, 2019;
originally announced January 2019.
-
Polarimetric Thermal to Visible Face Verification via Attribute Preserved Synthesis
Authors:
Xing Di,
He Zhang,
Vishal M. Patel
Abstract:
Thermal to visible face verification is a challenging problem due to the large domain discrepancy between the modalities. Existing approaches either attempt to synthesize visible faces from thermal faces or extract robust features from these modalities for cross-modal matching. In this paper, we take a different approach in which we make use of the attributes extracted from the visible image to synthesize the attribute-preserved visible image from the input thermal image for cross-modal matching. A pre-trained VGG-Face network is used to extract the attributes from the visible image. Then, a novel Attribute Preserved Generative Adversarial Network (AP-GAN) is proposed to synthesize the visible image from the thermal image guided by the extracted attributes. Finally, a deep network is used to extract features from the synthesized image and the input visible image for verification. Extensive experiments on the ARL Polarimetric face dataset show that the proposed method achieves significant improvements over the state-of-the-art methods.
Submitted 3 January, 2019;
originally announced January 2019.
-
On zero-free regions for the anti-ferromagnetic Potts model on bounded-degree graphs
Authors:
Ferenc Bencs,
Ewan Davies,
Viresh Patel,
Guus Regts
Abstract:
For a graph $G=(V,E)$, $k\in \mathbb{N}$, and a complex number $w$ the partition function of the univariate Potts model is defined as \[ {\bf Z}(G;k,w):=\sum_{φ:V\to [k]}\prod_{\substack{uv\in E \\ φ(u)=φ(v)}}w, \] where $[k]:=\{1,\ldots,k\}$. In this paper we give zero-free regions for the partition function of the anti-ferromagnetic Potts model on bounded degree graphs. In particular we show that for any $Δ\in \mathbb{N}$ and any $k\geq eΔ+1$, there exists an open set $U$ in the complex plane that contains the interval $[0,1)$ such that ${\bf Z}(G;k,w)\neq 0$ for any $w\in U$ and any graph $G$ of maximum degree at most $Δ$. (Here $e$ denotes the base of the natural logarithm.) For small values of $Δ$ we are able to give better results.
As an application of our results we obtain improved bounds on $k$ for the existence of deterministic approximation algorithms for counting the number of proper $k$-colourings of graphs of small maximum degree.
Submitted 5 April, 2021; v1 submitted 18 December, 2018;
originally announced December 2018.
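The partition function defined in the abstract above can be evaluated by brute force on very small graphs. The following is a minimal Python sketch, not the authors' method: the function name `potts_partition` and the vertex-list/edge-list input format are assumptions made for illustration, and the enumeration visits all $k^{|V|}$ colourings, so it is only practical for tiny examples.

```python
from itertools import product


def potts_partition(vertices, edges, k, w):
    """Brute-force Z(G; k, w): sum over all maps phi: V -> [k] of
    w raised to the number of monochromatic edges under phi.

    `vertices` is a list of hashable labels and `edges` a list of pairs
    (both hypothetical input conventions chosen for this sketch).
    """
    index = {v: i for i, v in enumerate(vertices)}
    total = 0
    for colouring in product(range(k), repeat=len(vertices)):
        # Count edges whose two endpoints receive the same colour.
        mono = sum(
            1 for u, v in edges if colouring[index[u]] == colouring[index[v]]
        )
        # Python evaluates 0 ** 0 as 1, matching the empty-product convention.
        total += w ** mono
    return total
```

At $w=0$ only colourings with no monochromatic edge contribute, so the value equals the number of proper $k$-colourings, which is the counting problem mentioned at the end of the abstract.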
-
Improving the Performance of Unimodal Dynamic Hand-Gesture Recognition with Multimodal Training
Authors:
Mahdi Abavisani,
Hamid Reza Vaezi Joze,
Vishal M. Patel
Abstract:
We present an efficient approach for leveraging the knowledge from multiple modalities in training unimodal 3D convolutional neural networks (3D-CNNs) for the task of dynamic hand gesture recognition. Instead of explicitly combining multimodal information, which is commonplace in many state-of-the-art methods, we propose a different framework in which we embed the knowledge of multiple modalities in individual networks so that each unimodal network can achieve an improved performance. In particular, we dedicate separate networks per available modality and enforce them to collaborate and learn to develop networks with common semantics and better representations. We introduce a "spatiotemporal semantic alignment" loss (SSA) to align the content of the features from different networks. In addition, we regularize this loss with our proposed "focal regularization parameter" to avoid negative knowledge transfer. Experimental results show that our framework improves the test time recognition accuracy of unimodal networks, and provides the state-of-the-art performance on various dynamic hand gesture recognition datasets.
Submitted 12 August, 2019; v1 submitted 14 December, 2018;
originally announced December 2018.
-
Synthesis of High-Quality Visible Faces from Polarimetric Thermal Faces using Generative Adversarial Networks
Authors:
He Zhang,
Benjamin S. Riggan,
Shuowen Hu,
Nathaniel J. Short,
Vishal M. Patel
Abstract:
The large domain discrepancy between faces captured in polarimetric (or conventional) thermal and visible domains makes cross-domain face verification a highly challenging problem for human examiners as well as computer vision algorithms. Previous approaches utilize either a two-step procedure (visible feature estimation and visible image reconstruction) or an input-level fusion technique, where different Stokes images are concatenated and used as a multi-channel input to synthesize the visible image given the corresponding polarimetric signatures. Although these methods have yielded improvements, we argue that input-level fusion alone may not be sufficient to realize the full potential of the available Stokes images. We propose a Generative Adversarial Network (GAN) based multi-stream feature-level fusion technique to synthesize high-quality visible images from polarimetric thermal images. The proposed network consists of a generator sub-network, constructed using an encoder-decoder network based on dense residual blocks, and a multi-scale discriminator sub-network. The generator network is trained by optimizing an adversarial loss in addition to a perceptual loss and an identity preserving loss to enable photo-realistic generation of visible images while preserving discriminative characteristics. An extended dataset consisting of polarimetric thermal facial signatures of 111 subjects is also introduced. Multiple experiments evaluated on different experimental protocols demonstrate that the proposed method achieves state-of-the-art performance. Code will be made available at https://github.com/hezhangsprinter.
Submitted 12 December, 2018;
originally announced December 2018.
-
The dynamic spectral signatures from Lunar Occultation: A simulation study
Authors:
Jigisha V. Patel,
Avinash A. Deshpande
Abstract:
Lunar occultation, which occurs when the Moon crosses sight-lines to distant sources, has been studied extensively through the apparent intensity pattern resulting from Fresnel diffraction, and has been successfully used to measure angular sizes of extragalactic sources. However, such observations to date have been mainly over narrow bandwidths, or averaged over the observing band, and the associated intensity pattern in time has rarely been examined in detail as a function of frequency over a wide band. Here, we revisit the phenomenon of lunar occultation with a view to studying the associated intensity pattern as a function of both time and frequency. Through analytical and simulation approaches, we examine the variation of intensity across the dynamic spectra, and look for chromatic signatures which could appear as discrete dispersed signal tracks, when the diffraction pattern is adequately smoothed by a finite source size. We particularly explore circumstances in which such a diffraction pattern might closely follow the interstellar dispersion law followed by pulsars and transients, such as the Fast Radio Bursts (FRBs), which remain a mystery even a decade after their discovery. In this paper, we describe details of this investigation, relevant to radio frequencies at which FRBs have been detected, and discuss our findings, along with their implications. We also show how a band-averaged light curve suffers from temporal smearing, and consequent reduction in contrast of intensity variation, with increasing bandwidth. We suggest a way to recover the underlying diffraction signature, as well as the sensitivity improvement commensurate with usage of large bandwidths.
Submitted 30 November, 2018;
originally announced November 2018.
-
Shifted powers in Lucas-Lehmer sequences
Authors:
Michael Bennett,
Vandita Patel,
Samir Siksek
Abstract:
We develop a general framework for finding all perfect powers in sequences derived by shifting non-degenerate quadratic Lucas-Lehmer binary recurrence sequences by a fixed integer. By combining this setup with bounds for linear forms in logarithms and results based upon the modularity of elliptic curves defined over totally real fields, we are able to answer a question of Bugeaud, Luca, Mignotte and the third author by explicitly finding all perfect powers of the shape $F_k \pm 2 $ where $F_k$ is the $k$-th term in the Fibonacci sequence.
Submitted 27 November, 2018;
originally announced November 2018.
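To make the objects in the abstract above concrete, the sketch below searches a finite range of $k$ for perfect powers of the shape $F_k \pm 2$. It is only an illustration under stated assumptions: the helper names `is_perfect_power` and `shifted_fibonacci_powers` are hypothetical, the convention that a perfect power is $y^n$ with integers $y \ge 2$ and $n \ge 2$ is assumed here (so 0, 1 and negative values are skipped), and a finite search proves nothing about completeness, which is what the paper's framework addresses.

```python
def is_perfect_power(m):
    """Return True if m == y**n for some integers y >= 2 and n >= 2.

    The y >= 2 convention is an assumption of this sketch.
    """
    if m < 4:
        return False
    n = 2
    while 2 ** n <= m:
        # Integer n-th root by binary search avoids floating-point error.
        lo, hi = 2, m
        while lo <= hi:
            mid = (lo + hi) // 2
            p = mid ** n
            if p == m:
                return True
            if p < m:
                lo = mid + 1
            else:
                hi = mid - 1
        n += 1
    return False


def shifted_fibonacci_powers(max_k, shift=2):
    """List (k, s, F_k + s) for 0 <= k <= max_k and s in {+shift, -shift}
    whenever F_k + s is a perfect power in the sense above."""
    hits = []
    a, b = 0, 1  # F_0, F_1
    for k in range(max_k + 1):
        for s in (shift, -shift):
            if is_perfect_power(a + s):
                hits.append((k, s, a + s))
        a, b = b, a + b
    return hits
```

For example, `shifted_fibonacci_powers(20)` includes the entries `(9, 2, 36)` and `(9, -2, 32)`, since $F_9 = 34$.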
-
Perfect Powers that are Sums of Squares of an Arithmetic Progression
Authors:
Debanjana Kundu,
Vandita Patel
Abstract:
In this paper, we determine all primitive solutions to the equation $(x+r)^2 +(x+2r)^2 +\cdots +(x+dr)^2 = y^n$ for $2\leq d\leq 10$ and for $1\leq r\leq 10^4$. We make use of a factorization argument and the Primitive Divisors Theorem due to Bilu, Hanrot and Voutier.
Submitted 20 December, 2019; v1 submitted 24 September, 2018;
originally announced September 2018.
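As a reading aid for the equation in the abstract above, the left-hand side expands to a quadratic in $x$. This is a routine identity obtained from the standard formulas for $\sum_{i=1}^{d} i$ and $\sum_{i=1}^{d} i^2$; it is not quoted from the paper: \[ \sum_{i=1}^{d}(x+ir)^2 = d\,x^2 + d(d+1)\,r\,x + \frac{d(d+1)(2d+1)}{6}\,r^2. \]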
-
Disentangled Variational Representation for Heterogeneous Face Recognition
Authors:
Xiang Wu,
Huaibo Huang,
Vishal M. Patel,
Ran He,
Zhenan Sun
Abstract:
Visible (VIS) to near infrared (NIR) face matching is a challenging problem due to the significant domain discrepancy between the domains and a lack of sufficient data for training cross-modal matching algorithms. Existing approaches attempt to tackle this problem by either synthesizing visible faces from NIR faces, extracting domain-invariant features from these modalities, or projecting heterogeneous data onto a common latent space for cross-modal matching. In this paper, we take a different approach in which we make use of the Disentangled Variational Representation (DVR) for cross-modal matching. First, we model a face representation with an intrinsic identity information and its within-person variations. By exploring the disentangled latent variable space, a variational lower bound is employed to optimize the approximate posterior for NIR and VIS representations. Second, aiming at obtaining more compact and discriminative disentangled latent space, we impose a minimization of the identity information for the same subject and a relaxed correlation alignment constraint between the NIR and VIS modality variations. An alternative optimization scheme is proposed for the disentangled variational representation part and the heterogeneous face recognition network part. The mutual promotion between these two parts effectively reduces the NIR and VIS domain discrepancy and alleviates over-fitting. Extensive experiments on three challenging NIR-VIS heterogeneous face recognition databases demonstrate that the proposed method achieves significant improvements over the state-of-the-art methods.
Submitted 23 January, 2019; v1 submitted 6 September, 2018;
originally announced September 2018.
-
Simultaneous Segmentation and Classification of Bone Surfaces from Ultrasound Using a Multi-feature Guided CNN
Authors:
Puyang Wang,
Vishal M. Patel,
Ilker Hacihaliloglu
Abstract:
Various imaging artifacts, low signal-to-noise ratio, and bone surfaces appearing several millimeters in thickness have hindered the success of ultrasound (US) guided computer assisted orthopedic surgery procedures. In this work, a multi-feature guided convolutional neural network (CNN) architecture is proposed for simultaneous enhancement, segmentation, and classification of bone surfaces from US data. The proposed CNN consists of two main parts: a pre-enhancing net, that takes the concatenation of B-mode US scan and three filtered image features for the enhancement of bone surfaces, and a modified U-net with a classification layer. The proposed method was validated on 650 in vivo US scans collected using two US machines, by scanning knee, femur, distal radius and tibia bones. Validation, against expert annotation, achieved statistically significant improvements in segmentation of bone surfaces compared to state-of-the-art.
Submitted 25 June, 2018;
originally announced June 2018.
-
Pushing the Limits of Unconstrained Face Detection: a Challenge Dataset and Baseline Results
Authors:
Hajime Nada,
Vishwanath A. Sindagi,
He Zhang,
Vishal M. Patel
Abstract:
Face detection has witnessed immense progress in the last few years, with new milestones being surpassed every year. While many challenges such as large variations in scale, pose, appearance are successfully addressed, there still exist several issues which are not specifically captured by existing methods or datasets. In this work, we identify the next set of challenges that requires attention from the research community and collect a new dataset of face images that involve these issues such as weather-based degradations, motion blur, focus blur and several others. We demonstrate that there is a considerable gap in the performance of state-of-the-art detectors and real-world requirements. Hence, in an attempt to fuel further research in unconstrained face detection, we present a new annotated Unconstrained Face Detection Dataset (UFDD) with several challenges and benchmark recent methods. Additionally, we provide an in-depth analysis of the results and failure cases of these methods. The dataset as well as baseline results will be made publicly available in due time. The UFDD dataset as well as baseline results are available at: www.ufdd.info/
Submitted 8 August, 2018; v1 submitted 26 April, 2018;
originally announced April 2018.
-
Deep Multimodal Subspace Clustering Networks
Authors:
Mahdi Abavisani,
Vishal M. Patel
Abstract:
We present convolutional neural network (CNN) based approaches for unsupervised multimodal subspace clustering. The proposed framework consists of three main stages - multimodal encoder, self-expressive layer, and multimodal decoder. The encoder takes multimodal data as input and fuses them to a latent space representation. The self-expressive layer is responsible for enforcing the self-expressiveness property and acquiring an affinity matrix corresponding to the data points. The decoder reconstructs the original input data. The network uses the distance between the decoder's reconstruction and the original input in its training. We investigate early, late and intermediate fusion techniques and propose three different encoders corresponding to them for spatial fusion. The self-expressive layers and multimodal decoders are essentially the same for different spatial fusion-based approaches. In addition to various spatial fusion-based methods, an affinity fusion-based network is also proposed in which the self-expressive layer corresponding to different modalities is enforced to be the same. Extensive experiments on three datasets show that the proposed methods significantly outperform the state-of-the-art multimodal subspace clustering methods.
Submitted 4 January, 2019; v1 submitted 17 April, 2018;
originally announced April 2018.
-
Densely Connected Pyramid Dehazing Network
Authors:
He Zhang,
Vishal M. Patel
Abstract:
We propose a new end-to-end single image dehazing method, called Densely Connected Pyramid Dehazing Network (DCPDN), which can jointly learn the transmission map, atmospheric light and dehazing all together. The end-to-end learning is achieved by directly embedding the atmospheric scattering model into the network, thereby ensuring that the proposed method strictly follows the physics-driven scattering model for dehazing. Inspired by the dense network that can maximize the information flow along features from different levels, we propose a new edge-preserving densely connected encoder-decoder structure with multi-level pyramid pooling module for estimating the transmission map. This network is optimized using a newly introduced edge-preserving loss function. To further incorporate the mutual structural information between the estimated transmission map and the dehazed result, we propose a joint-discriminator based on generative adversarial network framework to decide whether the corresponding dehazed image and the estimated transmission map are real or fake. An ablation study is conducted to demonstrate the effectiveness of each module evaluated at both estimated transmission map and dehazed result. Extensive experiments demonstrate that the proposed method achieves significant improvements over the state-of-the-art methods. Code will be made available at: https://github.com/hezhangsprinter
Submitted 22 March, 2018;
originally announced March 2018.
-
Generating High Quality Visible Images from SAR Images Using CNNs
Authors:
Puyang Wang,
Vishal M. Patel
Abstract:
We propose a novel approach for generating high quality visible-like images from Synthetic Aperture Radar (SAR) images using Deep Convolutional Generative Adversarial Network (GAN) architectures. The proposed approach is based on a cascaded network of convolutional neural nets (CNNs) for despeckling and image colorization. The cascaded structure results in faster convergence during training and produces high quality visible images from the corresponding SAR images. Experimental results on both simulated and real SAR images show that the proposed method can produce visible-like images better compared to the recent state-of-the-art deep learning-based methods.
Submitted 27 February, 2018;
originally announced February 2018.
-
Perfect powers that are sums of squares in a three term arithmetic progression
Authors:
Angelos Koutsianas,
Vandita Patel
Abstract:
We determine primitive solutions to the equation $(x-r)^2 + x^2 + (x+r)^2 = y^n$ for $1 \le r \le 5,000$, making use of a factorization argument and the Primitive Divisors Theorem due to Bilu, Hanrot and Voutier.
Submitted 21 February, 2018;
originally announced February 2018.
-
Density-aware Single Image De-raining using a Multi-stream Dense Network
Authors:
He Zhang,
Vishal M. Patel
Abstract:
Single image rain streak removal is an extremely challenging problem due to the presence of non-uniform rain densities in images. We present a novel density-aware multi-stream densely connected convolutional neural network-based algorithm, called DID-MDN, for joint rain density estimation and de-raining. The proposed method enables the network itself to automatically determine the rain-density information and then efficiently remove the corresponding rain-streaks guided by the estimated rain-density label. To better characterize rain-streaks with different scales and shapes, a multi-stream densely connected de-raining network is proposed which efficiently leverages features from different scales. Furthermore, a new dataset containing images with rain-density labels is created and used to train the proposed density-aware network. Extensive experiments on synthetic and real datasets demonstrate that the proposed method achieves significant improvements over the recent state-of-the-art methods. In addition, an ablation study is performed to demonstrate the improvements obtained by different modules in the proposed method. Code can be found at: https://github.com/hezhangsprinter
Submitted 20 February, 2018;
originally announced February 2018.
-
Learning Deep Features for One-Class Classification
Authors:
Pramuditha Perera,
Vishal M. Patel
Abstract:
We propose a deep learning-based solution for the problem of feature learning in one-class classification. The proposed method operates on top of a Convolutional Neural Network (CNN) of choice and produces descriptive features while maintaining a low intra-class variance in the feature space for the given class. For this purpose two loss functions, compactness loss and descriptiveness loss are proposed along with a parallel CNN architecture. A template matching-based framework is introduced to facilitate the testing process. Extensive experiments on publicly available anomaly detection, novelty detection and mobile active authentication datasets show that the proposed Deep One-Class (DOC) classification method achieves significant improvements over the state-of-the-art.
Submitted 16 May, 2019; v1 submitted 16 January, 2018;
originally announced January 2018.
-
Robust Sparse Fourier Transform Based on The Fourier Projection-Slice Theorem
Authors:
Shaogang Wang,
Vishal M. Patel,
Athina Petropulu
Abstract:
State-of-the-art automotive radars employ multidimensional discrete Fourier transforms (DFT) in order to estimate various target parameters. The DFT is implemented using the fast Fourier transform (FFT), with sample and computational complexities of $O(N)$ and $O(N \log N)$, respectively, where $N$ is the number of samples in the signal space. We have recently proposed a sparse Fourier transform based on the Fourier projection-slice theorem (FPS-SFT), which applies to multidimensional signals that are sparse in the frequency domain. FPS-SFT achieves sample complexity of $O(K)$ and computational complexity of $O(K \log K)$ for a multidimensional, $K$-sparse signal. While FPS-SFT considers the ideal scenario, i.e., exactly sparse data that contains on-grid frequencies, in this paper, by extending FPS-SFT into a robust version (RFPS-SFT), we focus on addressing noisy signals that contain off-grid frequencies; such signals arise from radar applications. This is achieved by employing a windowing technique and a voting-based frequency decoding procedure; the former reduces the frequency leakage of the off-grid frequencies below the noise level to preserve the sparsity of the signal, while the latter significantly lowers the frequency localization error stemming from the noise. The performance of the proposed method is demonstrated both theoretically and numerically.
Submitted 10 December, 2017;
originally announced January 2018.
-
Face Synthesis from Visual Attributes via Sketch using Conditional VAEs and GANs
Authors:
Xing Di,
Vishal M. Patel
Abstract:
Automatic synthesis of faces from visual attributes is an important problem in computer vision and has wide applications in law enforcement and entertainment. With the advent of deep generative convolutional neural networks (CNNs), attempts have been made to synthesize face images from attributes and text descriptions. In this paper, we take a different approach, where we formulate the original problem as a stage-wise learning problem. We first synthesize the facial sketch corresponding to the visual attributes and then we reconstruct the face image based on the synthesized sketch. The proposed Attribute2Sketch2Face framework, which is based on a combination of deep Conditional Variational Autoencoder (CVAE) and Generative Adversarial Networks (GANs), consists of three stages: (1) Synthesis of facial sketch from attributes using a CVAE architecture, (2) Enhancement of coarse sketches to produce sharper sketches using a GAN-based framework, and (3) Synthesis of face from sketch using another GAN-based network. Extensive experiments and comparison with recent methods are performed to verify the effectiveness of the proposed attribute-based three stage face synthesis method.
Submitted 29 December, 2017;
originally announced January 2018.
-
SPRK: A Low-Cost Stewart Platform For Motion Study In Surgical Robotics
Authors:
Vatsal Patel,
Sanjay Krishnan,
Aimee Goncalves,
Ken Goldberg
Abstract:
To simulate body organ motion due to breathing, heartbeats, or peristaltic movements, we designed a low-cost, miniaturized SPRK (Stewart Platform Research Kit) to translate and rotate phantom tissue. This platform is 20cm x 20cm x 10cm to fit in the workspace of a da Vinci Research Kit (DVRK) surgical robot and costs $250, two orders of magnitude less than a commercial Stewart platform. The platform has a range of motion of +/- 1.27 cm in translation along x, y, and z directions and has motion modes for sinusoidal motion and breathing-inspired motion. Modular platform mounts were also designed for pattern cutting and debridement experiments. The platform's positional controller has a time-constant of 0.2 seconds and the root-mean-square error is 1.22 mm, 1.07 mm, and 0.20 mm in x, y, and z directions respectively. All the details, CAD models, and control software for the platform are available at github.com/BerkeleyAutomation/sprk.
Submitted 7 December, 2017;
originally announced December 2017.
-
Using Intermittent Synchronization to Compensate for Rhythmic Body Motion During Autonomous Surgical Cutting and Debridement
Authors:
Vatsal Patel,
Sanjay Krishnan,
Aimee Goncalves,
Carolyn Chen,
Walter Doug Boyd,
Ken Goldberg
Abstract:
Anatomical structures are rarely static during a surgical procedure due to breathing, heartbeats, and peristaltic movements. Inspired by observing an expert surgeon, we propose an intermittent synchronization with the extrema of the rhythmic motion (i.e., the lowest velocity windows). We performed 2 experiments: (1) pattern cutting, and (2) debridement. In (1), we found that the intermittent synchronization approach, while 1.8x slower than tracking motion, was significantly more robust to noise and control latency, and it reduced the max cutting error by 2.6x. In (2), a baseline approach with no synchronization achieves a 62% success rate for each removal, while intermittent synchronization achieves 80%.
Submitted 7 December, 2017;
originally announced December 2017.
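The idea in the abstract above, acting only in the low-velocity windows around the extrema of a rhythmic motion, can be illustrated with a minimal Python sketch. It assumes a purely sinusoidal motion; the function name, the threshold `fraction`, the sampling step, and the 0.5 Hz example frequency are illustrative assumptions and not the authors' controller.

```python
import math


def low_velocity_windows(freq_hz, duration_s, dt_s=0.01, fraction=0.2):
    """Find time windows where a sinusoidal motion is nearly stationary.

    Assumes position x(t) = sin(2*pi*freq_hz*t), so the normalised speed is
    |cos(2*pi*freq_hz*t)|. A sample belongs to a window when that speed is at
    most `fraction` of the peak speed (a hypothetical threshold).
    Returns a list of (start_time, end_time) tuples.
    """
    windows = []
    start = None   # start time of the window currently being built
    last = None    # time of the previous sample
    steps = int(round(duration_s / dt_s))
    for i in range(steps + 1):
        t = i * dt_s
        slow = abs(math.cos(2.0 * math.pi * freq_hz * t)) <= fraction
        if slow and start is None:
            start = t
        if not slow and start is not None:
            windows.append((start, last))
            start = None
        last = t
    if start is not None:
        # The motion was still slow when the recording ended.
        windows.append((start, last))
    return windows


windows = low_velocity_windows(freq_hz=0.5, duration_s=2.0)
for begin, end in windows:
    print(f"act between t={begin:.2f}s and t={end:.2f}s")
```

For this example the two windows bracket the extrema of the sine at t = 0.5 s and t = 1.5 s, which is where a tool would be allowed to act under this simplified model.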
-
FPS-SFT: A Multi-dimensional Sparse Fourier Transform Based on the Fourier Projection-slice Theorem
Authors:
Shaogang Wang,
Vishal M. Patel,
Athina Petropulu
Abstract:
We propose a multi-dimensional (M-D) sparse Fourier transform inspired by the idea of the Fourier projection-slice theorem, called FPS-SFT. FPS-SFT extracts samples along lines (1-dimensional slices from an M-D data cube), which are parameterized by random slopes and offsets. The discrete Fourier transform (DFT) along those lines represents projections of M-D DFT of the M-D data onto those lines. The M-D sinusoids that are contained in the signal can be reconstructed from the DFT along lines with a low sample and computational complexity provided that the signal is sparse in the frequency domain and the lines are appropriately designed. The performance of FPS-SFT is demonstrated both theoretically and numerically. A sparse image reconstruction application is illustrated, which shows the capability of the FPS-SFT in solving practical problems.
Submitted 1 November, 2017;
originally announced November 2017.
-
In2I : Unsupervised Multi-Image-to-Image Translation Using Generative Adversarial Networks
Authors:
Pramuditha Perera,
Mahdi Abavisani,
Vishal M. Patel
Abstract:
In unsupervised image-to-image translation, the goal is to learn the mapping between an input image and an output image using a set of unpaired training images. In this paper, we propose an extension of the unsupervised image-to-image translation problem to the multiple-input setting. Given a set of paired images from multiple modalities, a transformation is learned to translate the input into a specified domain. For this purpose, we introduce a Generative Adversarial Network (GAN) based framework along with a multi-modal generator structure and a new loss term, latent consistency loss. Through various experiments we show that leveraging multiple inputs generally improves the visual quality of the translated images. Moreover, we show that the proposed method outperforms current state-of-the-art unsupervised image-to-image translation methods.
Submitted 25 November, 2017;
originally announced November 2017.
-
Solving Graph Isomorphism Problem for a Special case
Authors:
Vaibhav Amit Patel
Abstract:
Graph isomorphism is an important computer science problem. The problem for the general case is not known to be solvable in polynomial time. The best known algorithm for the general case works in quasi-polynomial time. Polynomial-time solutions are known for some special classes of graphs. In this work, we consider one such special type of graph. We propose a method to represent these graphs and to find isomorphisms between them. The method uses a modified version of the degree list of a graph and the neighbourhood degree list. These special graphs have the property that the neighbourhood degree list of any two immediate neighbours is different for every vertex. The representation becomes invariant to the order in which the nodes were selected when building it, making the isomorphism problem trivial for this case. The algorithm works in $O(n^4)$ time, where $n$ is the number of vertices present in the graph. The proposed algorithm runs faster than quasi-polynomial time for the graphs used in the study.
Submitted 22 November, 2017;
originally announced November 2017.
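To make the terms "degree list" and "neighbourhood degree list" from the abstract above concrete, here is a minimal Python sketch of a label-independent graph signature built from them. This is not the paper's algorithm: the function name and the exact form of the signature are assumptions for illustration, and in general equal signatures are only a necessary condition for isomorphism, not a sufficient one.

```python
def neighbourhood_degree_signature(adjacency):
    """Return a label-independent signature of an undirected graph.

    `adjacency` maps each vertex to an iterable of its neighbours.
    For every vertex we record (its degree, sorted degrees of its neighbours),
    then sort the records so that vertex names do not matter.
    """
    degree = {v: len(set(nbrs)) for v, nbrs in adjacency.items()}
    records = [
        (degree[v], tuple(sorted(degree[u] for u in set(nbrs))))
        for v, nbrs in adjacency.items()
    ]
    return sorted(records)


# Two labelings of the same 4-vertex path, and a 4-vertex star.
path_a = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
path_b = {"w": ["y"], "y": ["w", "x"], "x": ["y", "z"], "z": ["x"]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

same = neighbourhood_degree_signature(path_a) == neighbourhood_degree_signature(path_b)
different = neighbourhood_degree_signature(path_a) != neighbourhood_degree_signature(star)
print(same, different)  # True True
```

The two paths share a signature because the records are sorted and never mention vertex names, while the star is told apart by its degree-3 centre.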
-
WAKE: Wavelet Decomposition Coupled with Adaptive Kalman Filtering for Pathological Tremor Extraction
Authors:
Soroosh Shahtalebi,
Seyed Farokh Atashzar,
Rajni V. Patel,
Arash Mohammadi
Abstract:
Pathological Hand Tremor (PHT) is among the common symptoms of several neurological movement disorders, which can significantly degrade the quality of life of affected individuals. Besides pharmaceutical and surgical therapies, mechatronic technologies have been utilized to control PHTs. Most of these technologies function based on estimation, extraction, and characterization of tremor movement signals. Real-time extraction of the tremor signal is of paramount importance because of its application in assistive and rehabilitative devices. In this paper, we propose a novel on-line adaptive method which can adjust the hyper-parameters of the filter to the variable characteristics of the tremor. The proposed framework, Wavelet decomposition coupled with Adaptive Kalman filtering technique for pathological tremor Extraction (WAKE), is composed of a new adaptive Kalman filter and a wavelet transform core to provide indirect prediction of the tremor, one sample ahead of time, to be used for its suppression. In this paper, the design, implementation and evaluation of WAKE are given. The performance is evaluated based on three different datasets: the first is a synthetic dataset, developed in this work, that simulates hand tremor under ten different conditions; the second and third are real datasets recorded from patients with PHTs. The results obtained from the proposed WAKE framework demonstrate significant improvements in the estimation accuracy in comparison with two well regarded techniques in the literature.
Submitted 10 October, 2018; v1 submitted 18 November, 2017;
originally announced November 2017.
-
On perfect powers that are sums of cubes of a three term arithmetic progression
Authors:
Alejandro Argáez-García,
Vandita Patel
Abstract:
Using only elementary arguments, Cassels and Uchiyama (independently) determined all squares that are sums of three consecutive cubes. Zhongfeng Zhang extended this result and determined all perfect powers that are sums of three consecutive cubes. Recently, the equation $(x-r)^k + x^k + (x+r)^k = y^n$ has been studied for $k=4$ by Zhongfeng Zhang and for $k=2$ by Koutsianas. In this paper, we complement the work of Cassels, Koutsianas and Zhang by considering the case when $k=3$ and showing that the equation $(x-r)^3+x^3+(x+r)^3=y^n$ with $n\geq 5$ a prime and $0 < r \leq 10^6$ only has trivial solutions $(x,y,n)$ which satisfy $xy=0$.
Submitted 16 November, 2017;
originally announced November 2017.
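As a small worked step (not taken from the paper), expanding the left-hand side of the $k=3$ equation in the abstract above shows why solutions with $x=0$ are automatically trivial. The odd powers of $r$ cancel between $(x-r)^3$ and $(x+r)^3$:

$$(x-r)^3 + x^3 + (x+r)^3 = 3x^3 + 6xr^2 = 3x\,(x^2 + 2r^2).$$

Hence $x=0$ forces $y^n = 0$, that is $y=0$, which is a solution with $xy=0$ in the sense used in the abstract.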
-
High-Quality Facial Photo-Sketch Synthesis Using Multi-Adversarial Networks
Authors:
Lidan Wang,
Vishwanath A. Sindagi,
Vishal M. Patel
Abstract:
Synthesizing face sketches from real photos and its inverse have many applications. However, photo/sketch synthesis remains a challenging problem because photos and sketches have different characteristics. In this work, we consider this task as an image-to-image translation problem and explore the recently popular generative models (GANs) to generate high-quality realistic photos from sketches and sketches from photos. Recent GAN-based methods have shown promising results on image-to-image translation problems and photo-to-sketch synthesis in particular; however, they are known to have limited abilities in generating high-resolution realistic images. To this end, we propose a novel synthesis framework called Photo-Sketch Synthesis using Multi-Adversarial Networks (PS2-MAN) that iteratively generates low resolution to high resolution images in an adversarial way. The hidden layers of the generator are supervised to first generate lower resolution images followed by implicit refinement in the network to generate higher resolution images. Furthermore, since photo-sketch synthesis is a coupled/paired translation problem, we leverage the pair information using the CycleGAN framework. Both Image Quality Assessment (IQA) and Photo-Sketch Matching experiments are conducted to demonstrate the superior performance of our framework in comparison to existing state-of-the-art solutions. Code available at: https://github.com/lidan1/PhotoSketchMAN.
Submitted 2 March, 2018; v1 submitted 27 October, 2017;
originally announced October 2017.
-
Design & development of position sensitive detector for hard X-ray using SiPM and new generation scintillators
Authors:
S. K. Goyal,
Amisha P. Naik,
Mithun N. P. S.,
S. V. Vadawale,
Neeraj K. Tiwari,
T. Chattopadhyay,
N. Nagrani,
S. Madhavi,
T. Ladiya,
A. R. Patel,
M. Shanmugam,
H. L. Adalja,
V. R. Patel,
G. P. Ubale
Abstract:
There is growing interest in the high-energy astrophysics community in the development of sensitive instruments in the hard X-ray energy range extending to a few hundred keV. This requires position sensitive detector modules with high efficiency in the hard X-ray energy range. Here, we present the development of a detector module, which consists of a 25 mm x 25 mm CeBr3 scintillation detector, read out by a custom designed two dimensional array of Silicon Photo-Multipliers (SiPM). Readout of the common cathode of the SiPMs provides the spectral measurement, whereas the readout of individual SiPM anodes provides measurement of the interaction position in the crystal. Preliminary results for spectral and position measurements with the detector module are presented here.
Submitted 19 October, 2017;
originally announced October 2017.
-
GP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks
Authors:
Xing Di,
Vishwanath A. Sindagi,
Vishal M. Patel
Abstract:
Facial landmarks constitute the most compressed representation of faces and are known to preserve information such as pose, gender and facial structure present in the faces. Several works exist that attempt to perform high-level face-related analysis tasks based on landmarks. In contrast, in this work, an attempt is made to tackle the inverse problem of synthesizing faces from their respective landmarks. The primary aim of this work is to demonstrate that information preserved by landmarks (gender in particular) can be further accentuated by leveraging generative models to synthesize corresponding faces. Though the problem is particularly challenging due to its ill-posed nature, we believe that successful synthesis will enable several applications such as boosting performance of high-level face related tasks using landmark points and performing dataset augmentation. To this end, a novel face-synthesis method known as Gender Preserving Generative Adversarial Network (GP-GAN) that is guided by adversarial loss, perceptual loss and a gender preserving loss is presented. Further, we propose a novel generator sub-network UDeNet for GP-GAN that leverages advantages of U-Net and DenseNet architectures. Extensive experiments and comparison with recent methods are performed to verify the effectiveness of the proposed method.
Submitted 25 April, 2018; v1 submitted 2 October, 2017;
originally announced October 2017.
-
The Impact of Local Geometry and Batch Size on Stochastic Gradient Descent for Nonconvex Problems
Authors:
Vivak Patel
Abstract:
In several experimental reports on nonconvex optimization problems in machine learning, stochastic gradient descent (SGD) was observed to prefer minimizers with flat basins in comparison to more deterministic methods, yet there is very little rigorous understanding of this phenomenon. In fact, the lack of such work has led to an unverified, but widely-accepted stochastic mechanism describing why SGD prefers flatter minimizers to sharper minimizers. However, as we demonstrate, the stochastic mechanism fails to explain this phenomenon. Here, we propose an alternative deterministic mechanism that can accurately explain why SGD prefers flatter minimizers to sharper minimizers. We derive this mechanism based on a detailed analysis of a generic stochastic quadratic problem, which generalizes known results for classical gradient descent. Finally, we verify the predictions of our deterministic mechanism on two nonconvex problems.
Submitted 5 May, 2018; v1 submitted 14 September, 2017;
originally announced September 2017.
-
Solutions to the Cosmic Initial Entropy Problem without Equilibrium Initial Conditions
Authors:
Vihan M. Patel,
Charles H. Lineweaver
Abstract:
The entropy of the observable universe is increasing. Thus, at earlier times the entropy was lower. However, the cosmic microwave background radiation reveals an apparently high entropy universe close to thermal and chemical equilibrium. A two-part solution to this cosmic initial entropy problem is proposed. Following Penrose, we argue that the evenly distributed matter of the early universe is equivalent to low gravitational entropy. There are two competing explanations for how this initial low gravitational entropy comes about. (1) Inflation and baryogenesis produce a virtually homogeneous distribution of matter with a low gravitational entropy. (2) Dissatisfied with explaining a low gravitational entropy as the product of a 'special' scalar field, some theorists argue (following Boltzmann) for a 'more natural' initial condition in which the entire universe is in an initial equilibrium state of maximum entropy. In this equilibrium model, our observable universe is an unusual low entropy fluctuation embedded in a high entropy universe. The anthropic principle and the fluctuation theorem suggest that this low entropy region should be as small as possible and have as large an entropy as possible, consistent with our existence. However, our low entropy universe is much larger than needed to produce observers, and we see no evidence for an embedding in a higher entropy background. The initial conditions of inflationary models are as natural as the equilibrium background favored by many theorists.
Submitted 10 August, 2017;
originally announced August 2017.
-
Generative Adversarial Network-based Synthesis of Visible Faces from Polarimetric Thermal Faces
Authors:
He Zhang,
Vishal M. Patel,
Benjamin S. Riggan,
Shuowen Hu
Abstract:
The large domain discrepancy between faces captured in the polarimetric (or conventional) thermal and visible domains makes cross-domain face recognition quite a challenging problem for both human examiners and computer vision algorithms. Previous approaches utilize a two-step procedure (visible feature estimation and visible image reconstruction) to synthesize the visible image given the corresponding polarimetric thermal image. However, these are regarded as two disjoint steps and hence may hinder the performance of visible face reconstruction. We argue that joint optimization would be a better way to reconstruct more photo-realistic images for both computer vision algorithms and human examiners to examine. To this end, this paper proposes a Generative Adversarial Network-based Visible Face Synthesis (GAN-VFS) method to synthesize more photo-realistic visible face images from their corresponding polarimetric images. To ensure that the encoded visible features contain more semantically meaningful information for reconstructing the visible face image, a guidance sub-network is incorporated into the training procedure. To achieve photo-realism while preserving discriminative characteristics in the reconstructed outputs, an identity loss combined with the perceptual loss is optimized in the framework. Multiple experiments evaluated on different experimental protocols demonstrate that the proposed method achieves state-of-the-art performance.
Submitted 8 August, 2017;
originally announced August 2017.
-
Generating High-Quality Crowd Density Maps using Contextual Pyramid CNNs
Authors:
Vishwanath A. Sindagi,
Vishal M. Patel
Abstract:
We present a novel method called Contextual Pyramid CNN (CP-CNN) for generating high-quality crowd density and count estimation by explicitly incorporating global and local contextual information of crowd images. The proposed CP-CNN consists of four modules: Global Context Estimator (GCE), Local Context Estimator (LCE), Density Map Estimator (DME) and a Fusion-CNN (F-CNN). GCE is a VGG-16 based CNN that encodes global context and it is trained to classify input images into different density classes, whereas LCE is another CNN that encodes local context information and it is trained to perform patch-wise classification of input images into different density classes. DME is a multi-column architecture-based CNN that aims to generate high-dimensional feature maps from the input image which are fused with the contextual information estimated by GCE and LCE using F-CNN. To generate high resolution and high-quality density maps, F-CNN uses a set of convolutional and fractionally-strided convolutional layers and it is trained along with the DME in an end-to-end fashion using a combination of adversarial loss and pixel-level Euclidean loss. Extensive experiments on highly challenging datasets show that the proposed method achieves significant improvements over the state-of-the-art methods.
Submitted 2 August, 2017;
originally announced August 2017.
-
Joint Transmission Map Estimation and Dehazing using Deep Networks
Authors:
He Zhang,
Vishwanath Sindagi,
Vishal M. Patel
Abstract:
Single image haze removal is an extremely challenging problem due to its inherent ill-posed nature. Several prior-based and learning-based methods have been proposed in the literature to solve this problem and they have achieved superior results. However, most of the existing methods assume a constant atmospheric light model and tend to follow a two-step procedure involving prior-based methods for estimating the transmission map followed by calculation of the dehazed image using the closed form solution. In this paper, we relax the constant atmospheric light assumption and propose a novel unified single image dehazing network that jointly estimates the transmission map and performs dehazing. In other words, our new approach provides an end-to-end learning framework, where the inherent transmission map and dehazed result are learned directly from the loss function. Extensive experiments on synthetic and real datasets with challenging hazy images demonstrate that the proposed method achieves significant improvements over the state-of-the-art methods.
Submitted 20 April, 2019; v1 submitted 1 August, 2017;
originally announced August 2017.
-
CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting
Authors:
Vishwanath A. Sindagi,
Vishal M. Patel
Abstract:
Estimating crowd count in densely crowded scenes is an extremely challenging task due to non-uniform scale variations. In this paper, we propose a novel end-to-end cascaded network of CNNs to jointly learn crowd count classification and density map estimation. Classifying crowd count into various groups is tantamount to coarsely estimating the total count in the image thereby incorporating a high-level prior into the density estimation network. This enables the layers in the network to learn globally relevant discriminative features which aid in estimating highly refined density maps with lower count error. The joint training is performed in an end-to-end fashion. Extensive experiments on highly challenging publicly available datasets show that the proposed method achieves lower count error and better quality density maps as compared to the recent state-of-the-art methods.
Submitted 16 August, 2017; v1 submitted 30 July, 2017;
originally announced July 2017.
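The relation between a density map, the total count, and a coarse count group described in the abstract above can be sketched in a few lines of plain Python. This is only an illustration of the data representation, not the proposed network: the Gaussian-blob construction is a common convention assumed here, and the function names, `sigma`, and the group `boundaries` are hypothetical.

```python
import math


def density_map(points, height, width, sigma=2.0):
    """Build a density map with one unit-mass Gaussian blob per annotated head.

    `points` is a list of (row, col) head positions. Each blob is normalised
    over the image grid so that it sums to 1, hence the whole map sums to the
    number of annotated heads.
    """
    density = [[0.0] * width for _ in range(height)]
    for row0, col0 in points:
        blob = [
            [math.exp(-((r - row0) ** 2 + (c - col0) ** 2) / (2.0 * sigma ** 2))
             for c in range(width)]
            for r in range(height)
        ]
        total = sum(sum(line) for line in blob)
        for r in range(height):
            for c in range(width):
                density[r][c] += blob[r][c] / total
    return density


def count_class(count, boundaries=(5, 20, 50)):
    """Map a count to a coarse group index (the boundaries are hypothetical)."""
    return sum(1 for b in boundaries if count >= b)


heads = [(4, 4), (10, 12), (15, 3)]
dmap = density_map(heads, height=20, width=20)
estimated_count = sum(sum(line) for line in dmap)
print(round(estimated_count, 6), count_class(estimated_count))
```

Summing the map recovers the number of annotated heads, and the group index is the kind of coarse, image-level label that a count classifier could be trained on alongside the density estimator.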
-
Perfect powers that are sums of consecutive squares
Authors:
Vandita Patel
Abstract:
We determine all perfect powers that can be written as the sum of at most 10 consecutive squares.
Submitted 20 July, 2017;
originally announced July 2017.
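For orientation, the quantity studied in the abstract above has a simple closed form. Writing $x$ for the first term and $d$ for the number of consecutive squares (notation introduced here, not taken from the paper), and using $\sum_{i=0}^{d-1} i = d(d-1)/2$ and $\sum_{i=0}^{d-1} i^2 = (d-1)d(2d-1)/6$:

$$\sum_{i=0}^{d-1} (x+i)^2 = d\,x^2 + d(d-1)\,x + \frac{(d-1)\,d\,(2d-1)}{6}.$$

For example, $d=2$ and $x=3$ give $18 + 6 + 1 = 25 = 5^2$, i.e. $3^2 + 4^2 = 5^2$, a perfect power that is a sum of two consecutive squares.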
-
Computing the number of induced copies of a fixed graph in a bounded degree graph
Authors:
Viresh Patel,
Guus Regts
Abstract:
In this paper we show that for any graph $H$ of order $m$ and any graph $G$ of order $n$ and maximum degree $Δ$ one can compute the number of subsets $S$ of $V(G)$ that induce a graph isomorphic to $H$ in time $O(c^m\cdot n)$ for some constant $c = c(Δ) > 0$. This is essentially best possible.
Submitted 20 September, 2017; v1 submitted 14 July, 2017;
originally announced July 2017.
-
Synthesis-based Robust Low Resolution Face Recognition
Authors:
Sumit Shekhar,
Vishal M. Patel,
Rama Chellappa
Abstract:
Recognition of low resolution face images is a challenging problem in many practical face recognition systems. Methods have been proposed in the face recognition literature for the problem which assume that the probe is low resolution, but a high resolution gallery is available for recognition. These attempts have been aimed at modifying the probe image such that the resultant image provides better discrimination. We formulate the problem differently by leveraging the information available in the high resolution gallery image and propose a dictionary learning approach for classifying the low-resolution probe image. An important feature of our algorithm is that it can handle resolution change along with illumination variations. Furthermore, we also kernelize the algorithm to handle non-linearity in data and present a joint dictionary learning technique for robust recognition at low resolutions. The effectiveness of the proposed method is demonstrated using standard datasets and a challenging outdoor face dataset. It is shown that our method is efficient and can perform significantly better than many competitive low resolution face recognition algorithms.
Submitted 10 July, 2017;
originally announced July 2017.
-
A Survey of Recent Advances in CNN-based Single Image Crowd Counting and Density Estimation
Authors:
Vishwanath A. Sindagi,
Vishal M. Patel
Abstract:
Estimating count and density maps from crowd images has a wide range of applications such as video surveillance, traffic monitoring, public safety and urban planning. In addition, techniques developed for crowd counting can be applied to related tasks in other fields of study such as cell microscopy, vehicle counting and environmental survey. The task of crowd counting and density map estimation is riddled with many challenges such as occlusions, non-uniform density, and intra-scene and inter-scene variations in scale and perspective. Nevertheless, over the last few years, crowd count analysis has evolved from earlier methods that are often limited to small variations in crowd density and scales to the current state-of-the-art methods that have developed the ability to perform successfully on a wide range of scenarios. The success of crowd counting methods in recent years can be largely attributed to deep learning and the publication of challenging datasets. In this paper, we provide a comprehensive survey of recent Convolutional Neural Network (CNN) based approaches that have demonstrated significant improvements over earlier methods that rely largely on hand-crafted representations. First, we briefly review the pioneering methods that use hand-crafted representations and then we delve in detail into the deep learning-based approaches and recently published datasets. Furthermore, we discuss the merits and drawbacks of existing CNN-based approaches and identify promising avenues of research in this rapidly evolving field.
Submitted 4 July, 2017;
originally announced July 2017.
-
On perfect powers that are sums of two Fibonacci numbers
Authors:
Florian Luca,
Vandita Patel
Abstract:
We study the equation $F_n + F_m = y^p$, where $F_n$ and $F_m$ are respectively the $n$-th and $m$-th Fibonacci numbers and $p \ge 2$. We find all solutions under the assumption $n \equiv m \pmod{2}$.
Submitted 30 June, 2017;
originally announced June 2017.
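The equation $F_n + F_m = y^p$ in the entry above can be explored numerically for small indices. The following is a minimal brute-force sketch, not the method of the paper: the function names (`fib`, `iroot`, `perfect_power`, `search`) and the search bound are hypothetical, the trivial value 0 is skipped, and only index pairs with $n \equiv m \pmod{2}$ and $m \le n$ are enumerated.

```python
def fib(n: int) -> int:
    """Return the n-th Fibonacci number, with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def iroot(v: int, p: int) -> int:
    """Return the largest integer r with r**p <= v, for v >= 1 (binary search)."""
    lo, hi = 1, v
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** p <= v:
            lo = mid
        else:
            hi = mid - 1
    return lo


def perfect_power(v: int):
    """Return (y, p) with y**p == v and the smallest p >= 2, or None."""
    if v < 1:
        return None  # 0 and negatives are skipped in this sketch
    if v == 1:
        return (1, 2)
    p = 2
    while 2 ** p <= v:
        y = iroot(v, p)
        if y ** p == v:
            return (y, p)
        p += 1
    return None


def search(limit: int):
    """List (n, m, y, p) with F_n + F_m = y**p, m <= n <= limit, n and m of equal parity."""
    hits = []
    for n in range(limit + 1):
        for m in range(n % 2, n + 1, 2):
            hit = perfect_power(fib(n) + fib(m))
            if hit is not None:
                hits.append((n, m, *hit))
    return hits
```

Calling `search(30)` lists the small solutions within that bound; a finite search of this kind only illustrates the equation and proves nothing about larger indices.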
-
SAR Image Despeckling Using a Convolutional Neural Network
Authors:
Puyang Wang,
He Zhang,
Vishal M. Patel
Abstract:
Synthetic Aperture Radar (SAR) images are often contaminated by a multiplicative noise known as speckle. Speckle makes the processing and interpretation of SAR images difficult. We propose a deep learning-based approach, called Image Despeckling Convolutional Neural Network (ID-CNN), for automatically removing speckle from noisy input images. In particular, ID-CNN uses a set of convolutional layers with batch normalization and rectified linear unit (ReLU) activations, together with a component-wise division residual layer, to estimate speckle; it is trained end-to-end using a combination of Euclidean loss and Total Variation (TV) loss. Extensive experiments on synthetic and real SAR images show that the proposed method achieves significant improvements over state-of-the-art speckle reduction methods.
Submitted 25 June, 2018; v1 submitted 2 June, 2017;
originally announced June 2017.
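The two ingredients named in the SAR despeckling entry above, a component-wise division residual and a combined Euclidean plus Total Variation loss, can be sketched in NumPy. This is only an illustration under stated assumptions: the network that produces the speckle estimate is omitted, and the anisotropic L1 form of TV, the weight `tv_weight` and the stabilizer `eps` are hypothetical choices that the abstract does not specify.

```python
import numpy as np


def despeckle(noisy, speckle_estimate, eps=1e-6):
    """Component-wise division residual: divide the noisy image by the estimated speckle.

    eps is an assumed stabilizer against division by zero.
    """
    return noisy / (speckle_estimate + eps)


def total_variation(img):
    """Anisotropic TV: sum of absolute differences between neighboring pixels."""
    dh = np.diff(img, axis=0)  # vertical neighbor differences
    dw = np.diff(img, axis=1)  # horizontal neighbor differences
    return np.abs(dh).sum() + np.abs(dw).sum()


def combined_loss(pred, clean, tv_weight=1e-3):
    """Mean squared (Euclidean) error plus a weighted TV penalty on the prediction."""
    euclidean = np.mean((pred - clean) ** 2)
    return euclidean + tv_weight * total_variation(pred)
```

In this sketch the TV term penalizes residual pixel-to-pixel fluctuations in the despeckled output, while the Euclidean term keeps it close to the clean reference.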
-
Sparse Representation-based Open Set Recognition
Authors:
He Zhang,
Vishal M. Patel
Abstract:
We propose a generalized Sparse Representation-based Classification (SRC) algorithm for open set recognition where not all classes presented during testing are known during training. The SRC algorithm uses class reconstruction errors for classification. As most of the discriminative information for open set recognition is hidden in the tail part of the matched and sum of non-matched reconstruction error distributions, we model the tail of those two error distributions using the statistical Extreme Value Theory (EVT). Then we simplify the open set recognition problem into a set of hypothesis testing problems. The confidence scores corresponding to the tail distributions of a novel test sample are then fused to determine its identity. The effectiveness of the proposed method is demonstrated using four publicly available image and object classification datasets and it is shown that this method can perform significantly better than many competitive open set recognition algorithms. Code is publicly available: https://github.com/hezhangsprinter/SROSR
Submitted 5 May, 2017;
originally announced May 2017.
-
Machine Vision System for 3D Plant Phenotyping
Authors:
Ayan Chaudhury,
Christopher Ward,
Ali Talasaz,
Alexander G. Ivanov,
Mark Brophy,
Bernard Grodzinski,
Norman P. A. Huner,
Rajni V. Patel,
John L. Barron
Abstract:
Machine vision for plant phenotyping is an emerging research area for producing high throughput in agriculture and crop science applications. Since 2D based approaches have their inherent limitations, 3D plant analysis is becoming state of the art for current phenotyping technologies. We present an automated system for analyzing plant growth in indoor conditions. A gantry robot system is used to perform scanning tasks in an automated manner throughout the lifetime of the plant. A 3D laser scanner mounted as the robot's payload captures the surface point cloud data of the plant from multiple views. The plant is monitored from the vegetative to reproductive stages in light/dark cycles inside a controllable growth chamber. An efficient 3D reconstruction algorithm is used, by which multiple scans are aligned together to obtain a 3D mesh of the plant, followed by surface area and volume computations. The whole system, including the programmable growth chamber, robot, scanner, data transfer and analysis is fully automated in such a way that a naive user can, in theory, start the system with a mouse click and get back the growth analysis results at the end of the lifetime of the plant with no intermediate intervention. As evidence of its functionality, we show and analyze quantitative results of the rhythmic growth patterns of the dicot Arabidopsis thaliana (L.), and the monocot barley (Hordeum vulgare L.) plants under their diurnal light/dark cycles.
Submitted 27 April, 2017;
originally announced May 2017.
-
Testing nonlocal models of electron thermal conduction for magnetic and inertial confinement fusion applications
Authors:
Jonathan Peter Brodrick,
Robert J. Kingham,
Michael M. Marinak,
Mehul V. Patel,
Alex V. Chankin,
John Omotani,
Maxim Umansky,
Dario Del Sorbo,
Ben Dudson,
Joseph Thomas Parker,
Gary D. Kerbel,
Mark Sherlock,
Christopher P Ridgers
Abstract:
Three models for nonlocal electron thermal transport are here compared against Vlasov-Fokker-Planck (VFP) codes to assess their accuracy in situations relevant to both inertial fusion hohlraums and tokamak scrape-off layers. The models tested are (i) a moment-based approach using an eigenvector integral closure (EIC) originally developed by Ji, Held and Sovinec; (ii) the non-Fourier Landau-fluid (NFLF) model of Dimits, Joseph and Umansky; and (iii) Schurtz, Nicolaï and Busquet's multigroup diffusion model (SNB). We find that while the EIC and NFLF models accurately predict the damping rate of a small-amplitude temperature perturbation (within 10% at moderate collisionalities), they overestimate the peak heat flow by as much as 35% and do not predict preheat in the more relevant case where there is a large temperature difference. The SNB model, however, agrees better with VFP results for the latter problem if care is taken with the definition of the mean free path. Additionally, we present for the first time a comparison of the SNB model against a VFP code for a hohlraum-relevant problem with inhomogeneous ionisation and show that the model overestimates the heat flow in the helium gas-fill by a factor of ~2 despite predicting the peak heat flux to within 16%.
Submitted 6 September, 2017; v1 submitted 28 April, 2017;
originally announced April 2017.
-
On the difference between permutation polynomials over finite fields
Authors:
Nurdagül Anbar,
Almasa Oduzak,
Vandita Patel,
Luciane Quoos,
Anna Somoza,
Alev Topuzoğlu
Abstract:
The well-known Chowla and Zassenhaus conjecture, proven by Cohen in 1990, states that if $p>(d^2-3d+4)^2$, then there is no complete mapping polynomial $f$ in $\mathbb{F}_p[x]$ of degree $d\ge 2$. For arbitrary finite fields $\mathbb{F}_q$, a similar non-existence result was obtained recently by Işık, Topuzoğlu and Winterhof in terms of the Carlitz rank of $f$.
Cohen, Mullen and Shiue generalized the Chowla-Zassenhaus-Cohen Theorem significantly in 1995, by considering differences of permutation polynomials. More precisely, they showed that if $f$ and $f+g$ are both permutation polynomials of degree $d\ge 2$ over $\mathbb{F}_p$, with $p>(d^2-3d+4)^2$, then the degree $k$ of $g$ satisfies $k \geq 3d/5$, unless $g$ is constant. In this article, assuming $f$ and $f+g$ are permutation polynomials in $\mathbb{F}_q[x]$, we give lower bounds for $k$ in terms of the Carlitz rank of $f$ and $q$. Our results generalize the above-mentioned result of Işık et al. We also show for a special class of polynomials $f$ of Carlitz rank $n \geq 1$ that if $f+x^k$ is a permutation of $\mathbb{F}_q$, with $\gcd(k+1, q-1)=1$, then $k\geq (q-n)/(n+3)$.
Submitted 23 March, 2017;
originally announced March 2017.
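The notions used in the entry above can be checked by brute force for small primes. The sketch below assumes the standard definitions: a polynomial is a permutation polynomial of $\mathbb{F}_p$ if it induces a bijection on $\mathbb{F}_p$, and $f$ is a complete mapping polynomial if both $f(x)$ and $f(x)+x$ are permutation polynomials. The function names are hypothetical, and only prime fields $\mathbb{F}_p$ are handled, not general $\mathbb{F}_q$.

```python
def evaluate(coeffs, x, p):
    """Evaluate a polynomial at x modulo p by Horner's rule.

    coeffs[i] is the coefficient of x**i.
    """
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def is_permutation(coeffs, p):
    """True if x -> f(x) is a bijection on F_p (p prime)."""
    return len({evaluate(coeffs, x, p) for x in range(p)}) == p


def is_complete_mapping(coeffs, p):
    """True if both f(x) and f(x) + x are permutation polynomials of F_p."""
    # Pad to at least degree 1 so the linear coefficient exists, then add x.
    shifted = list(coeffs) + [0] * max(0, 2 - len(coeffs))
    shifted[1] += 1
    return is_permutation(coeffs, p) and is_permutation(shifted, p)
```

For example, `is_permutation([0, 0, 0, 1], 5)` tests whether $x^3$ permutes $\mathbb{F}_5$. The exhaustive check costs $O(p \cdot d)$ operations, so it is useful only for experimenting with small cases.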
-
Learning from Ambiguously Labeled Face Images
Authors:
Ching-Hui Chen,
Vishal M. Patel,
Rama Chellappa
Abstract:
Learning a classifier from ambiguously labeled face images is challenging since training images are not always explicitly labeled. For instance, face images of two persons in a news photo are not explicitly labeled by their names in the caption. We propose a Matrix Completion for Ambiguity Resolution (MCar) method for predicting the actual labels from ambiguously labeled images. This step is followed by learning a standard supervised classifier from the disambiguated labels to classify new images. To prevent the majority labels from dominating the result of MCar, we generalize MCar to a weighted MCar (WMCar) that handles label imbalance. Since WMCar outputs a soft labeling vector of reduced ambiguity for each instance, we can iteratively refine it by feeding it as the input to WMCar. Nevertheless, such an iterative implementation can be affected by the noisy soft labeling vectors, and thus the performance may degrade. Our proposed Iterative Candidate Elimination (ICE) procedure makes the iterative ambiguity resolution possible by gradually eliminating a portion of the least likely candidates in ambiguously labeled faces. We further extend MCar to incorporate the labeling constraints between instances when such prior knowledge is available. Compared to existing methods, our approach demonstrates improvement on several ambiguously labeled datasets.
Submitted 1 July, 2017; v1 submitted 14 February, 2017;
originally announced February 2017.
-
On SGD's Failure in Practice: Characterizing and Overcoming Stalling
Authors:
Vivak Patel
Abstract:
Stochastic Gradient Descent (SGD) is widely used in machine learning problems to efficiently perform empirical risk minimization, yet, in practice, SGD is known to stall before reaching the actual minimizer of the empirical risk. SGD stalling has often been attributed to its sensitivity to the conditioning of the problem; however, as we demonstrate, SGD will stall even when applied to a simple linear regression problem with unity condition number for standard learning rates. Thus, in this work, we numerically demonstrate and mathematically argue that stalling is a crippling and generic limitation of SGD and its variants in practice. Once we have established the problem of stalling, we generalize an existing framework for hedging against its effects, which (1) deters SGD and its variants from stalling, (2) still provides convergence guarantees, and (3) makes SGD and its variants more practical methods for minimization.
Submitted 7 February, 2017; v1 submitted 1 February, 2017;
originally announced February 2017.
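The setting described in the SGD entry above, linear regression with unity condition number, can be set up for experimentation as follows. This is a minimal sketch and does not reproduce the paper's experiments: the problem sizes, noise level, constant learning rate `lr` and the function names are all assumptions chosen for illustration. It records the distance between the SGD iterate and the least-squares minimizer so that the behavior over time can be inspected.

```python
import numpy as np


def make_problem(n=200, d=5, noise=0.1, seed=0):
    """Build a regression problem whose design matrix has condition number 1."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal columns
    X = np.sqrt(n) * q                                # so that X.T @ X == n * I
    y = X @ rng.standard_normal(d) + noise * rng.standard_normal(n)
    w_star = X.T @ y / n                              # least-squares minimizer
    return X, y, w_star


def sgd_errors(X, y, w_star, lr=0.05, steps=2000, seed=1):
    """Run single-sample SGD from w = 0 and return ||w - w_star|| after each step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    errors = [np.linalg.norm(w - w_star)]
    for _ in range(steps):
        i = rng.integers(X.shape[0])        # sample one observation uniformly
        grad = (X[i] @ w - y[i]) * X[i]     # gradient of 0.5 * (x_i . w - y_i)**2
        w -= lr * grad
        errors.append(np.linalg.norm(w - w_star))
    return errors
```

Plotting the returned list for several values of `lr` shows how far the iterates remain from `w_star`; whether and where the error levels off depends on the assumed noise level and learning rate.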
-
Image De-raining Using a Conditional Generative Adversarial Network
Authors:
He Zhang,
Vishwanath Sindagi,
Vishal M. Patel
Abstract:
Severe weather conditions such as rain and snow adversely affect the visual quality of images captured under such conditions, thus rendering them useless for further usage and sharing. In addition, such degraded images drastically affect the performance of vision systems. Hence, it is important to solve the problem of single image de-raining/de-snowing. However, this is a difficult problem to solve due to its inherent ill-posed nature. Existing approaches attempt to introduce prior information to convert it into a well-posed problem. In this paper, we investigate a new point of view in addressing the single image de-raining problem. Instead of focusing only on deciding what is a good prior or a good framework to achieve good quantitative and qualitative performance, we also ensure that the de-rained image itself does not degrade the performance of a given computer vision algorithm such as detection and classification. In other words, the de-rained result should be indistinguishable from its corresponding clear image to a given discriminator. This criterion can be directly incorporated into the optimization framework by using the recently introduced conditional generative adversarial networks (GANs). To minimize artifacts introduced by GANs and ensure better visual quality, a new refined loss function is introduced. Based on this, we propose a novel single image de-raining method called Image De-raining Conditional Generative Adversarial Network (ID-CGAN), which incorporates quantitative, visual and discriminative performance into the objective function. Experiments on synthetic and real images show that the proposed method outperforms many recent state-of-the-art single image de-raining methods in terms of quantitative and visual performance.
Submitted 2 June, 2019; v1 submitted 20 January, 2017;
originally announced January 2017.
-
Bi-modal First Impressions Recognition using Temporally Ordered Deep Audio and Stochastic Visual Features
Authors:
Arulkumar Subramaniam,
Vismay Patel,
Ashish Mishra,
Prashanth Balasubramanian,
Anurag Mittal
Abstract:
We propose a novel approach for First Impressions Recognition in terms of the Big Five personality traits from short videos. The Big Five model describes human personality using five broad categories: Extraversion, Agreeableness, Conscientiousness, Neuroticism and Openness. We train two bi-modal end-to-end deep neural network architectures using temporally ordered audio and novel stochastic visual features from a few frames, without over-fitting. We empirically show that the trained models perform exceptionally well, even when trained on small sub-portions of the inputs. Our method was evaluated in the ChaLearn LAP 2016 Apparent Personality Analysis (APA) competition using the ChaLearn LAP APA2016 dataset and achieved excellent performance.
Submitted 31 October, 2016;
originally announced October 2016.
-
Active User Authentication for Smartphones: A Challenge Data Set and Benchmark Results
Authors:
Upal Mahbub,
Sayantan Sarkar,
Vishal M. Patel,
Rama Chellappa
Abstract:
In this paper, automated user verification techniques for smartphones are investigated. A unique non-commercial dataset, the University of Maryland Active Authentication Dataset 02 (UMDAA-02) for multi-modal user authentication research is introduced. This paper focuses on three sensors - front camera, touch sensor and location service while providing a general description for other modalities. Benchmark results for face detection, face verification, touch-based user identification and location-based next-place prediction are presented, which indicate that more robust methods fine-tuned to the mobile platform are needed to achieve satisfactory verification accuracy. The dataset will be made available to the research community for promoting additional research.
Submitted 25 October, 2016;
originally announced October 2016.