
arXiv:2006.07794 [pdf, other]

doi 10.1609/aaai.v36i1.19938

PatchUp: A Feature-Space Block-Level Regularization Technique for Convolutional Neural Networks

Authors: Mojtaba Faramarzi, Mohammad Amini, Akilesh Badrinaaraayanan, Vikas Verma, Sarath Chandar

Abstract: Large capacity deep learning models are often prone to a high generalization gap when trained with a limited amount of labeled training data. A recent class of methods to address this problem uses various ways to construct a new training sample by mixing a pair (or more) of training samples. We propose PatchUp, a hidden state block-level regularization technique for Convolutional Neural Networks (CNNs), that is applied on selected contiguous blocks of feature maps from a random pair of samples. Our approach improves the robustness of CNN models against the manifold intrusion problem that may occur in other state-of-the-art mixing approaches. Moreover, since we are mixing the contiguous block of features in the hidden space, which has more dimensions than the input space, we obtain more diverse samples for training towards different dimensions. Our experiments on CIFAR10/100, SVHN, Tiny-ImageNet, and ImageNet using ResNet architectures including PreActResnet18/34, WRN-28-10, ResNet101/152 models show that PatchUp improves upon, or equals, the performance of current state-of-the-art regularizers for CNNs. We also show that PatchUp can provide a better generalization to deformed samples and is more robust against adversarial attacks.

Submitted 7 January, 2023; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: AAAI - 2022

Journal ref: AAAI, vol. 36, no. 1, pp. 589-597, Jun. 2022
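
The block-level feature mixing described in the PatchUp abstract can be illustrated with a simplified sketch. It assumes a single square block at a random position, copied across all channels from the second sample's feature map into a copy of the first, with the one-hot targets interpolated by the fraction of spatial positions each sample contributes. The names `swap_block` and `mix_targets`, the single-block choice, and this label weighting are illustrative assumptions, not a reproduction of the paper's block-selection procedure.

```python
import random
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # channels x height x width


def swap_block(feat_a: FeatureMap, feat_b: FeatureMap, block_size: int,
               rng: random.Random) -> Tuple[FeatureMap, float]:
    """Copy one square block (all channels) from feat_b into a copy of feat_a.

    Returns the mixed map and lam, the fraction of positions still from feat_a.
    """
    height, width = len(feat_a[0]), len(feat_a[0][0])
    if block_size > min(height, width):
        raise ValueError("block_size exceeds the feature-map size")
    top = rng.randrange(height - block_size + 1)
    left = rng.randrange(width - block_size + 1)
    mixed = [[row[:] for row in channel] for channel in feat_a]  # leave feat_a untouched
    for c, channel in enumerate(mixed):
        for i in range(top, top + block_size):
            for j in range(left, left + block_size):
                channel[i][j] = feat_b[c][i][j]
    lam = 1.0 - (block_size * block_size) / (height * width)
    return mixed, lam


def mix_targets(y_a: List[float], y_b: List[float], lam: float) -> List[float]:
    """Interpolate targets in proportion to the area kept from each sample."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]


zeros = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
ones = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
mixed, lam = swap_block(zeros, ones, block_size=2, rng=random.Random(0))
print(lam)                                      # 0.75
print(mix_targets([1.0, 0.0], [0.0, 1.0], lam))  # [0.75, 0.25]
```

Mixing in a hidden layer rather than on pixels is what the abstract credits for sample diversity; in a real network `feat_a` and `feat_b` would be intermediate activations of two samples in the same batch.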

arXiv:2006.02158 [pdf, other]

Interpolation-based semi-supervised learning for object detection

Authors: Jisoo Jeong, Vikas Verma, Minsung Hyun, Juho Kannala, Nojun Kwak

Abstract: Despite the data labeling cost for the object detection tasks being substantially more than that of the classification tasks, semi-supervised learning methods for object detection have not been studied much. In this paper, we propose an Interpolation-based Semi-supervised learning method for object Detection (ISD), which considers and solves the problems caused by applying conventional Interpolation Regularization (IR) directly to object detection. We divide the output of the model into two types according to the objectness scores of both original patches that are mixed in IR. Then, we apply a separate loss suitable for each type in an unsupervised manner. The proposed losses dramatically improve the performance of semi-supervised learning as well as supervised learning. In the supervised learning setting, our method improves the baseline methods by a significant margin. In the semi-supervised learning setting, our algorithm improves the performance on benchmark datasets (PASCAL VOC and MSCOCO) with a benchmark architecture (SSD).

Submitted 29 December, 2020; v1 submitted 3 June, 2020; originally announced June 2020.
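
The conventional Interpolation Regularization that ISD starts from can be sketched as a consistency penalty: the prediction on a mixed input should match the same mix of the predictions on the two originals. This is a generic sketch under stated assumptions (a mixing coefficient drawn from Beta(alpha, alpha), squared error, plain vectors in place of images and detector outputs); it does not include ISD's objectness-based split into two output types.

```python
import random
from typing import Callable, List

Vector = List[float]


def interpolate(a: Vector, b: Vector, lam: float) -> Vector:
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def interpolation_consistency_loss(model: Callable[[Vector], Vector],
                                   x1: Vector, x2: Vector, alpha: float,
                                   rng: random.Random) -> float:
    """Mean squared gap between f(mix(x1, x2)) and mix(f(x1), f(x2)).

    No labels are needed, which is why this kind of penalty suits unlabeled data.
    """
    lam = rng.betavariate(alpha, alpha)
    pred_of_mix = model(interpolate(x1, x2, lam))
    mix_of_preds = interpolate(model(x1), model(x2), lam)
    return sum((p - q) ** 2 for p, q in zip(pred_of_mix, mix_of_preds)) / len(pred_of_mix)


def linear(v: Vector) -> Vector:
    return [2.0 * v[0] + v[1], -v[0]]


print(interpolation_consistency_loss(linear, [1.0, 2.0], [3.0, -1.0], 1.0, random.Random(0)))
```

A linear model satisfies the constraint exactly, so its loss is zero up to rounding; the penalty only bites on non-linear models, which is where it acts as a regularizer.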

arXiv:2005.09111 [pdf, other]

Topology optimization of nonlinear periodically microstructured materials for tailored homogenized constitutive properties

Authors: Reza Behrou, Maroun Abi Ghanem, Brianna C. Macnider, Vimarsh Verma, Ryan Alvey, **ho Hong, Ashley F. Emery, Hyunsun Alicia Kim, Nicholas Boechler

Abstract: A topology optimization method is presented for the design of periodic microstructured materials with prescribed homogenized nonlinear constitutive properties over finite strain ranges. The mechanical model assumes linear elastic isotropic materials, geometric nonlinearity at finite strain, and a quasi-static response. The optimization problem is solved by a nonlinear programming method and the sensitivities computed via the adjoint method. Two-dimensional structures identified using this optimization method are additively manufactured and their uniaxial tensile strain response compared with the numerically predicted behavior. The optimization approach herein enables the design and development of lattice-like materials with prescribed nonlinear effective properties, for use in myriad potential applications, ranging from stress wave and vibration mitigation to soft robotics.

Submitted 18 May, 2020; originally announced May 2020.

arXiv:2004.10098 [pdf, other]

Continual Learning using a Bayesian Nonparametric Dictionary of Weight Factors

Authors: Nikhil Mehta, Kevin J Liang, Vinay K Verma, Lawrence Carin

Abstract: Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings, where data from previous tasks are unavailable. A number of methods, using various model expansion strategies, have been proposed recently as possible solutions. However, determining how much to expand the model is left to the practitioner, and often a constant schedule is chosen for simplicity, regardless of how complex the incoming task is. Instead, we propose a principled Bayesian nonparametric approach based on the Indian Buffet Process (IBP) prior, letting the data determine how much to expand the model complexity. We pair this with a factorization of the neural network's weight matrices. Such an approach allows the number of factors of each weight matrix to scale with the complexity of the task, while the IBP prior encourages sparse weight factor selection and factor reuse, promoting positive knowledge transfer between tasks. We demonstrate the effectiveness of our method on a number of continual learning benchmarks and analyze how weight factors are allocated and reused throughout the training.

Submitted 27 April, 2021; v1 submitted 21 April, 2020; originally announced April 2020.

Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021. Post-conference updates: fixed typo in equation (11) and updated references
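
The Indian Buffet Process prior mentioned in this abstract can be made concrete by sampling from it with the standard "customers and dishes" construction: row i reuses each existing column k with probability m_k / i (m_k being the number of earlier rows that used k) and then opens Poisson(alpha / i) new columns. Read rows as tasks and columns as weight factors to connect this to factor reuse and expansion. This sketch shows only prior sampling; the function names are hypothetical, and the paper's inference over weight factors is not reproduced.

```python
import math
import random
from typing import List


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(num_rows: int, alpha: float, rng: random.Random) -> List[List[int]]:
    """Draw a binary matrix Z ~ IBP(alpha) row by row."""
    counts: List[int] = []  # m_k: how many earlier rows use column k
    rows: List[List[int]] = []
    for i in range(1, num_rows + 1):
        row = [1 if rng.random() < m / i else 0 for m in counts]  # reuse existing columns
        new = sample_poisson(alpha / i, rng)                      # expand with new columns
        row.extend([1] * new)
        counts.extend([0] * new)
        for k, z in enumerate(row):
            counts[k] += z
        rows.append(row)
    width = len(counts)
    return [r + [0] * (width - len(r)) for r in rows]  # pad earlier rows with zeros


for row in sample_ibp(5, alpha=2.0, rng=random.Random(0)):
    print(row)
```

The number of new columns per row shrinks like alpha / i, so later rows mostly reuse popular columns, which is the sparsity-plus-reuse behaviour the abstract attributes to the prior.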

arXiv:2003.11041 [pdf, other]

doi 10.1126/sciadv.aba4508

Connecting heterogeneous quantum networks by hybrid entanglement swapping

Authors: G. Guccione, T. Darras, H. Le Jeannic, V. B. Verma, S. W. Nam, A. Cavaillès, J. Laurat

Abstract: Recent advances in quantum technologies are rapidly stimulating the building of quantum networks. With the parallel development of multiple physical platforms and different types of encodings, a challenge for present and future networks is to uphold a heterogeneous structure for full functionality and therefore support modular systems that are not necessarily compatible with one another. Central to this endeavor is the capability to distribute and interconnect optical entangled states relying on different discrete and continuous quantum variables. Here we report an entanglement swapping protocol connecting such entangled states. We generate single-photon entanglement and hybrid entanglement between particle-like and wave-like optical qubits, and then demonstrate the heralded creation of hybrid entanglement at a distance by using a specific Bell-state measurement. This ability opens up the prospect of connecting heterogeneous nodes of a network, with the promise of increased integration and novel functionalities.

Submitted 13 April, 2021; v1 submitted 24 March, 2020; originally announced March 2020.

Journal ref: Science Advances 6, eaba4508 (2020)

arXiv:2003.09393 [pdf, other]

Block-level Double JPEG Compression Detection for Image Forgery Localization

Authors: Vinay Verma, Deepak Singh, Nitin Khanna

Abstract: Forged images have a ubiquitous presence in today's world due to the easy availability of image manipulation tools. In this letter, we propose a novel deep-learning-based approach which utilizes the inherent relationship between DCT coefficient histograms and the corresponding quantization step sizes to distinguish between original and forged regions in a JPEG image, based on the detection of single and double compressed blocks, without fully decompressing the JPEG image. For training, testing, and creating realistic forgeries, we consider a diverse set of 1,120 quantization matrices collected in a recent study, as compared to the standard 100 quantization matrices. In particular, we carefully design the input to DenseNet with a specific combination of quantization step sizes and the respective histograms for a JPEG block. Using this input to learn the compression artifacts produces state-of-the-art results for the detection of single and double compressed blocks of size $256 \times 256$ and gives better results for smaller blocks of sizes $128 \times 128$ and $64 \times 64$. Consequently, improved forgery localization performance is obtained on realistic forged images. Also, when test blocks are compressed with quantization matrices completely different from those used in training, the proposed method outperforms the current state-of-the-art.

Submitted 20 March, 2020; originally announced March 2020.
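
The reason DCT coefficient histograms paired with quantization step sizes can separate single from double compression is the well-known double-quantization artifact: quantizing with step q1, dequantizing, and requantizing with q2 leaves periodic empty (or over-filled) bins that a single quantization with q2 does not produce. The sketch below shows this on synthetic integer coefficients; the step sizes are hypothetical, round-half-up is assumed, and this illustrates the cue itself, not the letter's DenseNet input design.

```python
import math
from collections import Counter
from typing import Iterable, List


def quantize(value: float, step: int) -> int:
    """Round value / step to the nearest integer (ties rounded up)."""
    return math.floor(value / step + 0.5)


def single_compressed(coeffs: Iterable[int], q2: int) -> List[int]:
    return [quantize(c, q2) for c in coeffs]


def double_compressed(coeffs: Iterable[int], q1: int, q2: int) -> List[int]:
    # first compression: quantize with q1, then dequantize (multiply back by q1);
    # second compression: requantize the dequantized values with q2
    return [quantize(quantize(c, q1) * q1, q2) for c in coeffs]


def empty_bins(values: List[int]) -> List[int]:
    """Histogram bins between the min and max value that received no coefficient."""
    hist = Counter(values)
    return [b for b in range(min(values), max(values) + 1) if hist[b] == 0]


coeffs = range(0, 60)
print(empty_bins(single_compressed(coeffs, q2=2)))        # []
print(empty_bins(double_compressed(coeffs, q1=3, q2=2)))  # [1, 4, 7, 10, 13, 16, 19, 22, 25, 28]
```

The gaps repeat with a period set by q1 and q2, which is why feeding the step size alongside the histogram, as the abstract describes, gives a learner the information needed to read the pattern.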

arXiv:2002.12858 [pdf, other]

doi 10.1063/5.0006221

Superconducting microwire detectors with single-photon sensitivity in the near-infrared

Authors: Jeff Chiles, Sonia M. Buckley, Adriana Lita, Varun B. Verma, Jason Allmaras, Boris Korzh, Matthew D. Shaw, Jeffrey M. Shainline, Richard P. Mirin, Sae Woo Nam

Abstract: We report on the fabrication and characterization of single-photon-sensitive WSi superconducting detectors with wire widths from 1 μm to 3 μm. The devices achieve saturated internal detection efficiency at 1.55 μm wavelength and exhibit maximum count rates in excess of 10^5 s^-1. We also investigate the material properties of the silicon-rich WSi films used for these devices. We find that many devices with active lengths of several hundred microns exhibit critical currents in excess of 50% of the depairing current. A meandered detector with 2.0 μm wire width is demonstrated over a surface area of 362x362 μm^2, showcasing the material and device quality achieved.

Submitted 28 February, 2020; originally announced February 2020.

arXiv:2001.06657 [pdf, other]

Stacked Adversarial Network for Zero-Shot Sketch based Image Retrieval

Authors: Anubha Pandey, Ashish Mishra, Vinay Kumar Verma, Anurag Mittal, Hema A. Murthy

Abstract: Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that the data of all the classes are available during training. The assumption may not always be practical since the data of a few classes may be unavailable, or the classes may not appear at the time of training. Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to handle previously unseen classes during testing. This paper proposes a generative approach for ZS-SBIR based on the Stacked Adversarial Network (SAN) that also leverages a Siamese Network (SN). While SAN generates high-quality samples, SN learns a better distance metric than nearest neighbor search. The capability of the generative model to synthesize image features based on the sketch reduces the SBIR problem to an image-to-image retrieval problem. We evaluate the efficacy of our proposed approach on the TU-Berlin and Sketchy databases in both the standard ZSL and generalized ZSL settings. The proposed method yields a significant improvement in the standard ZSL setting as well as in the more challenging generalized ZSL setting (GZSL) for SBIR.

Submitted 18 January, 2020; originally announced January 2020.

Comments: Accepted in WACV'2020

arXiv:2001.05545 [pdf, other]

A "Network Pruning Network" Approach to Deep Model Compression

Authors: Vinay Kumar Verma, Pravendra Singh, Vinay P. Namboodiri, Piyush Rai

Abstract: We present a filter pruning approach for deep model compression, using a multitask network. Our approach is based on learning a pruner network to prune a pre-trained target network. The pruner is essentially a multitask deep neural network with binary outputs that help identify the filters from each layer of the original network that do not have any significant contribution to the model and can therefore be pruned. The pruner network has the same architecture as the original network except that it has a multitask/multi-output last layer containing binary-valued outputs (one per filter), which indicate which filters have to be pruned. The pruner's goal is to minimize the number of filters from the original network by assigning zero weights to the corresponding output feature maps. In contrast to most existing methods, which rely on iterative pruning, our approach can prune the original network in one go and, moreover, does not require specifying the degree of pruning for each layer (it can learn it instead). The compressed model produced by our approach is generic and does not need any special hardware/software support. Moreover, augmenting it with other methods such as knowledge distillation, quantization, and connection pruning can increase the degree of compression for the proposed approach. We show the efficacy of our proposed approach for classification and object detection tasks.

Submitted 15 January, 2020; originally announced January 2020.

Comments: Accepted in WACV'20
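
Why zeroing whole filters yields a smaller model with no special hardware or software support can be seen from a parameter count: dropping output filter j of one convolution also removes input channel j of the next layer, so both layers shrink as ordinary dense convolutions. In this sketch, under stated assumptions, the binary masks stand in for the pruner's per-filter outputs, and the network is a plain sequential stack of convolutions (no residual connections, biases ignored); the layer sizes are hypothetical.

```python
from typing import List, Sequence, Tuple

ConvSpec = Tuple[int, int, int]  # (in_channels, out_channels, kernel_size)


def conv_params(specs: Sequence[ConvSpec]) -> int:
    """Weight count of a plain conv stack (biases ignored)."""
    return sum(cin * cout * k * k for cin, cout, k in specs)


def prune_stack(specs: Sequence[ConvSpec], masks: Sequence[Sequence[int]]) -> List[ConvSpec]:
    """Drop filters whose binary mask entry is 0.

    Removing output filter j of layer l also removes input channel j of
    layer l + 1, because that feature map no longer exists.
    """
    pruned: List[ConvSpec] = []
    cin = specs[0][0]  # the network input itself is never pruned
    for (_, cout, k), mask in zip(specs, masks):
        if len(mask) != cout:
            raise ValueError("one mask entry per output filter is required")
        kept = sum(mask)
        pruned.append((cin, kept, k))
        cin = kept
    return pruned


specs = [(3, 16, 3), (16, 32, 3)]
masks = [[1] * 8 + [0] * 8, [1] * 32]
small = prune_stack(specs, masks)
print(small)                                   # [(3, 8, 3), (8, 32, 3)]
print(conv_params(specs), conv_params(small))  # 5040 2520
```

Pruning half the filters of the first layer here halves the total weight count, because the saving propagates into the next layer's input channels.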

arXiv:2001.03052 [pdf]

Methodical engineering of defects in Mn$_x$Zn$_{1-x}$O ($x$ = 0.03 and 0.05) nanostructures by electron beam for nonlinear optical applications: A new insight

Authors: Albin Antony, P. Poornesh, I. V. Kityk, K. Ozga, J. Jedryka, Reji Philip, Ganesh Sanjeev, Vikash Chandra Petwal, Vijay Pal Verma, Jishnu Dwivedi

Abstract: A series of Mn$_x$Zn$_{1-x}$O ($x$ = 0.03, 0.05) nanostructures have been grown via the solution-based chemical spray pyrolysis technique. Electron-beam-induced modifications of the structural, linear and nonlinear optical, and surface morphological properties have been studied in detail. GXRD (glancing angle X-ray diffraction) patterns show sharp diffraction peaks matching the hexagonal wurtzite structure of ZnO thin films. Increasing the e-beam dose shifted the (101) and (002) XRD peaks towards lower angles and increased their FWHM. Gaussian deconvolution of the PL spectra reveals the quenching of defect centers, implying that electron beam irradiation regulates luminescence and defect centers in the nanostructures. Irradiation-induced spatial confinement and phonon localization effects have been observed in the films via micro-Raman studies; the latter are evident from spectral peak shifts and broadening. The effect of electron beam irradiation on third-order nonlinear optical properties is investigated in detail under both continuous and pulsed laser operation. The third-order absorptive nonlinearity of the nanostructures, evaluated using the open-aperture Z-scan technique in both continuous and pulsed laser regimes, shows a strong effective nonlinear absorption coefficient $\beta_{\mathrm{eff}}$ of the order of $10^{-4}$ cm/W, confirming their suitability for passive optical limiting applications under intense radiation environments.
Laser-induced third harmonic generation (LITHG) results support the significant variation in nonlinearities upon electron beam irradiation, and the effect can be utilized for frequency conversion in high-power laser sources and UV light emitters.

Submitted 8 January, 2020; originally announced January 2020.

arXiv:1912.11570 [pdf, other]

SketchTransfer: A Challenging New Task for Exploring Detail-Invariance and the Abstractions Learned by Deep Networks

Authors: Alex Lamb, Sherjil Ozair, Vikas Verma, David Ha

Abstract: Deep networks have achieved excellent results in perceptual tasks, yet their ability to generalize to variations not seen during training has come under increasing scrutiny. In this work we focus on their ability to have invariance towards the presence or absence of details. For example, humans are able to watch cartoons, which are missing many visual details, without being explicitly trained to do so. As another example, 3D rendering software is a relatively recent development, yet people are able to understand such rendered scenes even though they are missing details (consider a film like Toy Story). The failure of machine learning algorithms to do this indicates a significant gap in generalization between human abilities and the abilities of deep networks. We propose a dataset that will make it easier to study the detail-invariance problem concretely. We produce a concrete task for this: SketchTransfer, and we show that state-of-the-art domain transfer algorithms still struggle with this task. The state-of-the-art technique, which achieves over 95% on MNIST → SVHN transfer, only achieves 59% accuracy on the SketchTransfer task, which is much better than random (11% accuracy) but falls short of the 87% accuracy of a classifier trained directly on labeled sketches. This indicates that this task is approachable with today's best methods but has substantial room for improvement.

Submitted 24 December, 2019; originally announced December 2019.

Comments: Accepted WACV 2020

arXiv:1910.07625 [pdf, other]

doi 10.1016/j.ocemod.2019.05.004

The submesoscale, the finescale and their interaction at a mixed layer front

Authors: Vicky Verma, Hieu T. Pham, Sutanu Sarkar

Abstract: The spindown of a geostrophically balanced density front in an upper-ocean mixed layer is simulated with a large eddy simulation (LES) model that resolves scales from O(1000) m down to O(1) m. Our goal is to examine the interaction between the submesoscale and the turbulent finescale, and better characterize vertical transport, frontogenesis and dissipative processes. The flow passes through symmetric and baroclinic instabilities, spawns vortex filaments of O(100) m thickness as well as larger eddies, and develops turbulence that is spatially localized and organized. An O(100) m physical-space filter is applied to the simulated flow to separate the coherent submesoscale from the finescale in a decomposition that preserves their spatial organization, unlike the typical practice of along-front averaging. Analysis of the submesoscale vertical velocity (as large as 5 mm/s) reveals that downwelling is limited to the thin vortex filaments while upwelling occurs over spatially extensive regions in the eddies, resulting in an overall buoyancy flux that is restratifying. The kinetic energy (KE) transport equations are evaluated separately at both scales to understand energy pathways in this problem. The buoyancy flux associated with coherent motions acts as the primary source of submesoscale KE, which is then transported across the front with a fraction transferred to the finescale.
The transfer, limited to thin regions of O(100) m horizontal width, is accomplished primarily by horizontal strain in the upper 10 m and by vertical shear in the rest of the 50-m-deep mixed layer. Frontogenetic mechanisms are diagnosed through analysis of the transport equation for the squared buoyancy gradient. Horizontal strain is the primary frontogenetic term and is counteracted primarily by horizontal diffusion in the top 10 m and by the horizontal gradient of vertical velocity further below.

Submitted 16 October, 2019; originally announced October 2019.

Journal ref: Ocean Modelling 140 (2019) 101400
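
The physical-space decomposition described above (a filter of about O(100) m splitting the flow into a coherent submesoscale part and a finescale residual, u = ū + u′) can be sketched in one dimension. The kernel shape, grid spacing, and periodic boundaries below are assumptions chosen for illustration; the abstract specifies only the filter scale, and the simulation itself is three-dimensional.

```python
import math
from typing import List, Tuple


def box_filter_periodic(field: List[float], width: int) -> List[float]:
    """Moving average over `width` grid points (odd) with periodic boundaries."""
    if width % 2 == 0 or width > len(field):
        raise ValueError("width must be odd and no larger than the field")
    n, half = len(field), width // 2
    return [sum(field[(i + s) % n] for s in range(-half, half + 1)) / width
            for i in range(n)]


def decompose(field: List[float], width: int) -> Tuple[List[float], List[float]]:
    """Split a field into its filtered (coherent) part and the finescale residual."""
    coarse = box_filter_periodic(field, width)
    fine = [f - c for f, c in zip(field, coarse)]
    return coarse, fine


dx = 10.0    # hypothetical grid spacing in metres
width = 11   # about 100 m of filter support at this spacing (assumption)
x = [i * dx for i in range(200)]
u = [math.sin(2 * math.pi * xi / 2000.0) + 0.2 * math.sin(2 * math.pi * xi / 40.0) for xi in x]
u_sub, u_fine = decompose(u, width)
```

Unlike along-front averaging, the filtered field keeps its dependence on position, so filaments and eddies remain localized in `u_sub`, and by construction `u_sub + u_fine` reproduces the original field.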

arXiv:1909.11715 [pdf, other]

GraphMix: Improved Training of GNNs for Semi-Supervised Learning

Authors: Vikas Verma, Meng Qu, Kenji Kawaguchi, Alex Lamb, Yoshua Bengio, Juho Kannala, Jian Tang

Abstract: We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization. Further, we provide a theoretical analysis of how GraphMix improves the generalization bounds of the underlying graph neural network, without making any assumptions about the "aggregation" layer or the depth of the graph neural networks. We experimentally validate this analysis by applying GraphMix to various architectures such as Graph Convolutional Networks, Graph Attention Networks and Graph-U-Net. Despite its simplicity, we demonstrate that GraphMix can consistently improve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics.

Submitted 8 October, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

Comments: https://github.com/vikasverma1077/GraphMix
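
The parameter sharing between a fully-connected network and a graph neural network can be pictured with a single weight matrix W used in two ways: the fully-connected path computes XW and ignores the graph, while a GCN-style path computes ÂXW with Â = D^{-1/2}(A + I)D^{-1/2}. This sketch assumes one linear layer and the usual GCN normalization; the interpolation-based regularization applied to the fully-connected path is not shown, and the helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """A_hat = D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    n = len(adj)
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    return [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def fcn_layer(x: Matrix, w: Matrix) -> Matrix:
    return matmul(x, w)  # per-node transform, graph ignored


def gcn_layer(a_hat: Matrix, x: Matrix, w: Matrix) -> Matrix:
    return matmul(a_hat, matmul(x, w))  # same W, plus neighbourhood aggregation


adj = [[0.0, 1.0], [1.0, 0.0]]  # two nodes joined by one edge
x = [[1.0, 0.0], [0.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]    # shared by both paths
print(fcn_layer(x, w))                             # [[1.0, 0.0], [0.0, 1.0]]
print(gcn_layer(normalized_adjacency(adj), x, w))  # [[0.5, 0.5], [0.5, 0.5]]
```

Because both paths read the same `w`, any regularization signal applied through the fully-connected path also updates the weights the graph network uses.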

arXiv:1909.04344 [pdf, other]

A Meta-Learning Framework for Generalized Zero-Shot Learning

Authors: Vinay Kumar Verma, Dhanajit Brahma, Piyush Rai

Abstract: Learning to classify unseen class samples at test time is popularly referred to as zero-shot learning (ZSL). If test samples can come from training (seen) as well as unseen classes, the problem is more challenging due to a strong bias towards seen classes. This problem is generally known as generalized zero-shot learning (GZSL). Thanks to recent advances in generative models such as VAEs and GANs, sample-synthesis-based approaches have gained considerable attention for solving this problem. These approaches are able to handle the problem of class bias by synthesizing unseen class samples. However, these ZSL/GZSL models suffer from the following key limitations: (i) their training stage learns a class-conditioned generator using only seen class data and does not explicitly learn to generate the unseen class samples; (ii) they do not learn a generic optimal parameter which can easily generalize for both seen and unseen class generation; and (iii) if we only have access to very few samples per seen class, these models tend to perform poorly. In this paper, we propose a meta-learning based generative model that naturally handles these limitations. The proposed model is based on integrating model-agnostic meta-learning with a Wasserstein GAN (WGAN) to handle (i) and (iii), and uses a novel task distribution to handle (ii). Our proposed model yields significant improvements in the standard ZSL as well as the more challenging GZSL setting.
In the ZSL setting, our model yields 4.5%, 6.0%, 9.8%, and 27.9% relative improvements over the current state-of-the-art on the CUB, AWA1, AWA2, and aPY datasets, respectively.

Submitted 10 September, 2019; originally announced September 2019.

Comments: Under Submission
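
Model-agnostic meta-learning, one ingredient of the proposed model, can be illustrated on toy tasks where every gradient is available in closed form. Each hypothetical task t has loss L_t(θ) = (θ - c_t)^2; MAML takes one inner gradient step per task and then updates θ with the gradient of the post-adaptation loss, differentiating through the inner step. This is a generic MAML sketch under those assumptions; the paper's coupling with a WGAN and its task distribution are not represented.

```python
from typing import List


def maml_quadratic(centers: List[float], theta: float, inner_lr: float,
                   outer_lr: float, steps: int) -> float:
    """MAML with one inner step on tasks L_t(theta) = (theta - c_t)^2.

    Inner step:   adapted = theta - inner_lr * 2 * (theta - c_t)
    Meta-grad:    d/dtheta L_t(adapted) = 2 * (adapted - c_t) * (1 - 2 * inner_lr)
                  (the factor (1 - 2 * inner_lr) is d(adapted)/d(theta))
    """
    for _ in range(steps):
        meta_grad = 0.0
        for c in centers:
            adapted = theta - inner_lr * 2.0 * (theta - c)
            meta_grad += 2.0 * (adapted - c) * (1.0 - 2.0 * inner_lr)
        theta -= outer_lr * meta_grad / len(centers)
    return theta


print(maml_quadratic([1.0, 2.0, 6.0], theta=0.0, inner_lr=0.1, outer_lr=0.1, steps=500))
```

For these symmetric quadratics the meta-learned initialization settles at the mean of the task optima (3.0 here), the point from which one inner step helps all tasks equally.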

arXiv:1909.02915 [pdf]

doi 10.1103/PhysRevB.101.060508

Strong suppression of the resistivity near the transition to superconductivity in narrow micro-bridges in external magnetic fields

Authors: Xiaofu Zhang, Adriana E. Lita, Konstantin Smirnov, HuanLong Liu, Dong Zhu, Varun B. Verma, Sae Woo Nam, Andreas Schilling

Abstract: We have investigated a series of superconducting bridges based on homogeneous amorphous WSi and MoSi films, with bridge widths w ranging from 2 μm to 1000 μm and film thicknesses d ~ 4-6 nm and 100 nm. Upon decreasing the bridge widths below the respective Pearl lengths, we observe in all cases distinct changes in the characteristics of the resistive transitions to superconductivity. For each of the films, the resistivity curves R(B,T) separate at a well-defined, field-dependent temperature T*(B) as the temperature decreases, resulting in a dramatic suppression of the resistivity and a sharpening of the transitions with decreasing bridge width w. The associated excess conductivity in all the bridges scales as 1/w, which may suggest the presence of a highly conducting region that dominates the electric transport in narrow bridges. We argue that this effect can only be observed in materials with sufficiently weak vortex pinning.

Submitted 21 February, 2020; v1 submitted 6 September, 2019; originally announced September 2019.

Journal ref: Phys. Rev. B 101, 060508 (2020)

arXiv:1908.10520 [pdf, other]

doi 10.1364/OE.27.035279

A kilopixel array of superconducting nanowire single-photon detectors

Authors: Emma E. Wollman, Varun B. Verma, Adriana E. Lita, William H. Farr, Matthew D. Shaw, Richard P. Mirin, Sae Woo Nam

Abstract: We present a 1024-element imaging array of superconducting nanowire single photon detectors (SNSPDs) using a 32x32 row-column multiplexing architecture. Large arrays are desirable for applications such as imaging, spectroscopy, or particle detection.

Submitted 27 August, 2019; originally announced August 2019.
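
The appeal of the 32x32 row-column architecture is easy to quantify: reading each row and each column once needs 32 + 32 = 64 lines instead of one per pixel. The decoding rule below is a deliberately simplified assumption (a photon is attributed to a pixel only when exactly one row and one column fire in the same time window); the abstract does not describe the actual readout electronics, and the function names are hypothetical.

```python
from typing import Optional, Set, Tuple


def readout_lines(rows: int, cols: int) -> Tuple[int, int]:
    """Return (lines with row-column multiplexing, lines with one readout per pixel)."""
    return rows + cols, rows * cols


def decode_event(row_hits: Set[int], col_hits: Set[int]) -> Optional[Tuple[int, int]]:
    """Map the row/column pulses seen in one time window to a pixel.

    Exactly one row and one column firing identifies the pixel; no hit, or
    several simultaneous photons, is treated as ambiguous in this sketch.
    """
    if len(row_hits) == 1 and len(col_hits) == 1:
        return next(iter(row_hits)), next(iter(col_hits))
    return None


print(readout_lines(32, 32))           # (64, 1024)
print(decode_event({5}, {17}))         # (5, 17)
print(decode_event({5, 9}, {17, 2}))   # None
```

The ambiguous case is the price of multiplexing: two coincident photons light up two rows and two columns, which match four candidate pixels.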

arXiv:1908.01000 [pdf, other]

InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization

Authors: Fan-Yun Sun, Jordan Hoffmann, Vikas Verma, Jian Tang

Abstract: This paper studies learning the representations of whole graphs in both unsupervised and semi-supervised scenarios. Graph-level representations are critical in a variety of real-world applications such as predicting the properties of molecules and community analysis in social networks. Traditional graph-kernel-based methods are simple yet effective for obtaining fixed-length representations of graphs, but they suffer from poor generalization due to hand-crafted designs. There are also some recent methods based on language models (e.g., graph2vec), but they tend to only consider certain substructures (e.g., subtrees) as graph representatives. Inspired by recent progress in unsupervised representation learning, in this paper we propose a novel method called InfoGraph for learning graph-level representations. We maximize the mutual information between the graph-level representation and the representations of substructures of different scales (e.g., nodes, edges, triangles). By doing so, the graph-level representations encode aspects of the data that are shared across different scales of substructures. Furthermore, we propose InfoGraph*, an extension of InfoGraph for semi-supervised scenarios. InfoGraph* maximizes the mutual information between unsupervised graph representations learned by InfoGraph and the representations learned by existing supervised methods. As a result, the supervised encoder learns from unlabeled data while preserving the latent semantic space favored by the current supervised task.
Experimental results on the tasks of graph classification and molecular property prediction show that InfoGraph is superior to state-of-the-art baselines and InfoGraph* can achieve performance competitive with state-of-the-art semi-supervised models.

Submitted 17 January, 2020; v1 submitted 31 July, 2019; originally announced August 2019.

Comments: ICLR 2020 (spotlight)
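The mutual-information objective described above can be illustrated with a minimal sketch. It assumes a Jensen-Shannon MI estimator with a dot-product discriminator, where each node paired with its own graph is a positive pair and each node paired with any other graph in the batch is a negative pair; the function name `jsd_mi_estimate` and the sum-pooling readout are illustrative assumptions, not the authors' code.

```python
import numpy as np


def softplus(x):
    # numerically stable log(1 + exp(x))
    return np.logaddexp(0.0, x)


def jsd_mi_estimate(node_emb, graph_emb, node_to_graph):
    """Jensen-Shannon MI estimate between node and graph representations.

    node_emb:      (N, d) node-level ("substructure") representations
    graph_emb:     (G, d) graph-level representations
    node_to_graph: (N,) index of the graph each node belongs to
    """
    scores = node_emb @ graph_emb.T  # (N, G) dot-product discriminator scores
    pos_mask = np.zeros_like(scores, dtype=bool)
    pos_mask[np.arange(len(node_to_graph)), node_to_graph] = True
    # positives: node with its own graph; negatives: node with every other graph
    e_pos = np.mean(-softplus(-scores[pos_mask]))
    e_neg = np.mean(softplus(scores[~pos_mask]))
    return e_pos - e_neg  # training would maximize this quantity
```

With a sum-pooling readout, for example `graph_emb = np.stack([node_emb[assign == g].sum(0) for g in range(G)])`, the estimate is higher when nodes are paired with the graphs they actually came from than under a shuffled assignment.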

arXiv:1907.07287 [pdf, other]

Towards Understanding Generalization in Gradient-Based Meta-Learning

Authors: Simon Guiroy, Vikas Verma, Christopher Pal

Abstract: In this work we study generalization of neural networks in gradient-based meta-learning by analyzing various properties of the objective landscapes. We experimentally demonstrate that as meta-training progresses, the meta-test solutions, obtained by adapting the meta-train solution of the model to new tasks via a few steps of gradient-based fine-tuning, become flatter, lower in loss, and further away from the meta-train solution. We also show that those meta-test solutions become flatter even as generalization starts to degrade, thus providing experimental evidence against the correlation between generalization and flat minima in the paradigm of gradient-based meta-learning. Furthermore, we provide empirical evidence that generalization to new tasks is correlated with the coherence between their adaptation trajectories in parameter space, measured by the average cosine similarity between task-specific trajectory directions, starting from the same meta-train solution. We also show that coherence of meta-test gradients, measured by the average inner product between the task-specific gradient vectors evaluated at the meta-train solution, is also correlated with generalization. Based on these observations, we propose a novel regularizer for MAML and provide experimental evidence for its effectiveness.

Submitted 16 July, 2019; originally announced July 2019.
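The two coherence measures named in this abstract can be written down directly. The sketch below assumes flattened parameter vectors and averages over distinct task pairs only (excluding each task paired with itself); that averaging choice and the function names are assumptions, since the abstract does not specify them.

```python
import numpy as np


def trajectory_coherence(theta0, adapted):
    """Average pairwise cosine similarity between adaptation directions.

    theta0:  (P,) meta-train solution (flattened parameters)
    adapted: (T, P) task-specific solutions after fine-tuning from theta0
    """
    dirs = adapted - theta0
    dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    sims = dirs @ dirs.T
    t = len(dirs)
    # drop the T diagonal entries (each equal to 1) and average the rest
    return (sims.sum() - t) / (t * (t - 1))


def gradient_coherence(grads):
    """Average pairwise inner product between task gradients at theta0.

    grads: (T, P) task-specific gradient vectors evaluated at the meta-train solution
    """
    inner = grads @ grads.T
    t = len(grads)
    return (inner.sum() - np.trace(inner)) / (t * (t - 1))
```

Trajectories pointing the same way give a coherence of 1 and opposite trajectories give -1; note that the gradient measure is unnormalized, so it also grows with gradient magnitude.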

arXiv:1906.06784 [pdf]

doi 10.1016/j.neunet.2022.07.012

Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Too Much Accuracy

Authors: Alex Lamb, Vikas Verma, Kenji Kawaguchi, Alexander Matyasko, Savya Khosla, Juho Kannala, Yoshua Bengio

Abstract: Adversarial robustness has become a central goal in deep learning, both in theory and in practice. However, successful methods to improve adversarial robustness (such as adversarial training) greatly hurt generalization performance on unperturbed data. This could have a major impact on how adversarial robustness affects real-world systems (i.e., many may opt to forego robustness if it can improve accuracy on unperturbed data). We propose Interpolated Adversarial Training, which employs recently proposed interpolation-based training methods in the framework of adversarial training. On CIFAR-10, adversarial training increases the standard test error (when there is no adversary) from 4.43% to 12.32%, whereas with our Interpolated Adversarial Training we retain the adversarial robustness while achieving a standard test error of only 6.45%. With our technique, the relative increase in the standard error for the robust model is reduced from 178.1% to just 45.5%. Moreover, we provide a mathematical analysis of Interpolated Adversarial Training to confirm its efficiency and demonstrate its advantages in terms of robustness and generalization.

Submitted 19 October, 2022; v1 submitted 16 June, 2019; originally announced June 2019.

Comments: This is the latest version, published in the journal "Neural Networks" in 2022. All previous results are unchanged. The first two authors contributed equally.

Journal ref: Neural Networks, volume 154, pages 218-233 (2022)
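To make "interpolation-based training in the framework of adversarial training" concrete, here is a toy sketch on binary logistic regression. It assumes FGSM as the attack and input mixup as the interpolation method, applied separately to the clean and the adversarial batch with equal loss weights. The model, attack, and the hypothetical function `iat_step` are illustrative choices, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fgsm(x, y, w, b, eps):
    """One-step FGSM attack on binary logistic regression (labels in {0, 1})."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]  # gradient of the BCE loss w.r.t. x
    return x + eps * np.sign(grad_x)


def mixup(x, y, alpha, rng):
    """Mix each example with a shuffled partner; lam ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]


def iat_step(x, y, w, b, eps=0.1, alpha=1.0, lr=0.1, rng=rng):
    """Hypothetical IAT-style update: mixup on clean and on adversarial batches,
    with the two losses weighted equally."""
    x_adv = fgsm(x, y, w, b, eps)
    x_clean_mix, y_clean_mix = mixup(x, y, alpha, rng)
    x_adv_mix, y_adv_mix = mixup(x_adv, y, alpha, rng)
    xm = np.concatenate([x_clean_mix, x_adv_mix])
    ym = np.concatenate([y_clean_mix, y_adv_mix])
    err = sigmoid(xm @ w + b) - ym  # BCE gradient w.r.t. the logits (soft labels)
    w = w - lr * xm.T @ err / len(xm)
    b = b - lr * err.mean()
    return w, b
```

Calling `iat_step` repeatedly on a labeled batch trains the classifier on interpolations of both clean and perturbed inputs; in the deep-network setting the same structure would apply with a stronger attack (e.g., PGD) and mixup or manifold mixup inside the network.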

arXiv:1906.03038 [pdf, other]

A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation

Authors: Varun Khare, Divyat Mahajan, Homanga Bharadhwaj, Vinay Verma, Piyush Rai

Abstract: We present a domain-adaptation-based generative framework for zero-shot learning. Our framework addresses the problem of domain shift between the seen and unseen class distributions in zero-shot learning and minimizes the shift by developing a generative model trained via adversarial domain adaptation. Our approach is based on end-to-end learning of the class distributions of seen classes and unseen classes. To enable the model to learn the class distributions of unseen classes, we parameterize these class distributions in terms of the class attribute information (which is available for both seen and unseen classes). This provides a very simple way to learn the class distribution of any unseen class, given only its class attribute information and no labeled training data. Training this model with adversarial domain adaptation further provides robustness against the distribution mismatch between the data from seen and unseen classes. Our approach also provides a novel way of training neural-net-based classifiers to overcome the hubness problem in zero-shot learning. Through a comprehensive set of experiments, we show that our model yields superior accuracy compared to various state-of-the-art zero-shot learning models on a variety of benchmark datasets. Code for the experiments is available at github.com/vkkhare/ZSL-ADA

Submitted 22 February, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

Comments: Proceedings of the Winter Conference on Applications of Computer Vision (WACV) 2020

arXiv:1905.08184 [pdf, other]

doi 10.1103/PhysRevResearch.2.013039

Entanglement and non-locality between disparate solid-state quantum memories mediated by photons

Authors: Marcel·lí Grimau Puigibert, Mohsen Falamarzi Askarani, Jacob H. Davidson, Varun B. Verma, Matthew D. Shaw, Sae Woo Nam, Thomas Lutz, Gustavo C. Amaral, Daniel Oblak, Wolfgang Tittel

Abstract: Entangling quantum systems with different characteristics through the exchange of photons is a prerequisite for building future quantum networks. Proving the presence of entanglement between quantum memories for light working at different wavelengths furthers this goal. Here, we report on a series of experiments with a thulium-doped crystal, serving as a quantum memory for 794 nm photons, an erbium-doped fibre, serving as a quantum memory for telecommunication-wavelength photons at 1535 nm, and a source of photon pairs created via spontaneous parametric down-conversion. Characterizing the photons after re-emission from the two memories, we find non-classical correlations with a cross-correlation coefficient of $g^{(2)}_{12} = 53\pm8$; entanglement preserving storage with input-output fidelity of $\mathcal{F}_{IO}\approx93\pm2\%$; and non-locality featuring a violation of the Clauser-Horne-Shimony-Holt Bell-inequality with $S= 2.6\pm0.2$. Our proof-of-principle experiment shows that entanglement persists while propagating through different solid-state quantum memories operating at different wavelengths.

Submitted 21 May, 2019; v1 submitted 20 May, 2019; originally announced May 2019.

Comments: 3 figures in main text and 5 figures in Supplemental Material

Journal ref: Phys. Rev. Research 2, 013039 (2020)
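The CHSH value quoted above combines four correlators, each estimated from coincidence counts. The sketch below shows that arithmetic. The ideal correlation model $E(x, y) = \cos(x - y)$ for a maximally entangled state, with analysis settings expressed as phases, is an assumption about the encoding made only for illustration; the sigma count simply divides the excess over the local bound by the quoted uncertainty.

```python
import math


def correlator(n_pp, n_pm, n_mp, n_mm):
    """E = (N++ + N-- - N+- - N-+) / total, estimated from coincidence counts."""
    total = n_pp + n_pm + n_mp + n_mm
    return (n_pp + n_mm - n_pm - n_mp) / total


def chsh_s(E, a, a2, b, b2):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return E(a, b) - E(a, b2) + E(a2, b) + E(a2, b2)


def ideal_E(x, y):
    # assumed ideal correlation for a maximally entangled two-qubit state
    return math.cos(x - y)


def violation_sigmas(s, sigma, local_bound=2.0):
    """How many standard deviations S lies above the local-realist bound |S| <= 2."""
    return (s - local_bound) / sigma


s_ideal = chsh_s(ideal_E, 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4)
```

With the optimal settings, `s_ideal` equals $2\sqrt{2} \approx 2.83$ (the Tsirelson bound), and the reported $S = 2.6 \pm 0.2$ corresponds to `violation_sigmas(2.6, 0.2)`, i.e. about 3 standard deviations above the local bound.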

arXiv:1905.04446 [pdf, other]

Play and Prune: Adaptive Filter Pruning for Deep Model Compression

Authors: Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri

Abstract: While convolutional neural networks (CNN) have achieved impressive performance on various classification/recognition tasks, they typically consist of a massive number of parameters. This results in significant memory requirements as well as computational overhead. Consequently, there is a growing need for filter-level pruning approaches for compressing CNN-based models that not only reduce the total number of parameters but also reduce the overall computation. We present a new min-max framework for filter-level pruning of CNNs. Our framework, called Play and Prune (PP), jointly prunes and fine-tunes CNN model parameters, with an adaptive pruning rate, while maintaining the model's predictive performance. Our framework consists of two modules: (1) an adaptive filter pruning (AFP) module, which minimizes the number of filters in the model; and (2) a pruning rate controller (PRC) module, which maximizes the accuracy during pruning. Moreover, unlike most previous approaches, our approach allows directly specifying the desired error tolerance instead of the pruning level. Our compressed models can be deployed at run-time, without requiring any special libraries or hardware. Our approach reduces the number of parameters of VGG-16 by an impressive factor of 17.5X, and the number of FLOPS by 6.43X, with no loss of accuracy, significantly outperforming other state-of-the-art filter pruning methods.

Submitted 11 May, 2019; originally announced May 2019.

Comments: International Joint Conference on Artificial Intelligence (IJCAI-2019)

arXiv:1905.01698 [pdf, other]

doi 10.1103/PhysRevApplied.12.054054

Tunable quantum beat of single photons enabled by nonlinear nanophotonics

Authors: Qing Li, Anshuman Singh, Xiyuan Lu, John Lawall, Varun Verma, Richard Mirin, Sae Woo Nam, Kartik Srinivasan

Abstract: We demonstrate the tunable quantum beat of single photons through the co-development of core nonlinear nanophotonic technologies for frequency-domain manipulation of quantum states in a common physical platform. Spontaneous four-wave mixing in a nonlinear resonator is used to produce non-degenerate, quantum-correlated photon pairs. One photon from each pair is then frequency shifted, without degradation of photon statistics, using four-wave mixing Bragg scattering in a second nonlinear resonator. Fine tuning of the applied frequency shift enables tunable quantum interference of the two photons as they impinge on a beamsplitter, with an oscillating signature that depends on their frequency difference. Our work showcases the potential of nonlinear nanophotonic devices as a valuable resource for photonic quantum information science.

Submitted 5 May, 2019; originally announced May 2019.

Journal ref: Phys. Rev. Applied 12, 054054 (2019)
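The tuning described in this abstract rests on two relations: in four-wave-mixing Bragg scattering the signal photon is shifted by the frequency separation of the two pumps (energy conservation), and the interference signature of two photons offset by $\Delta\nu$ oscillates with period $1/\Delta\nu$. The sketch below encodes both; the sign convention for the shift and all function names are assumptions for illustration.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def freq_from_wavelength(wavelength_m):
    """Optical frequency (Hz) for a vacuum wavelength given in metres."""
    return C / wavelength_m


def bragg_scattering_output(nu_signal, nu_pump1, nu_pump2):
    """Converted-photon frequency under the assumed convention
    nu_out = nu_signal + (nu_pump2 - nu_pump1); the shift equals the pump separation."""
    return nu_signal + (nu_pump2 - nu_pump1)


def beat_period(nu_a, nu_b):
    """Period of the two-photon interference oscillation for offset |nu_a - nu_b|."""
    return 1.0 / abs(nu_a - nu_b)
```

For example, two photons near 1550 nm (about 193.4 THz) that differ by 1 GHz would show an oscillation period of 1 ns, so retuning the pump separation directly retunes the beat.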

arXiv:1905.00465 [pdf, other]

Measurement-device-independent quantum key distribution coexisting with classical communication

Authors: Raju Valivarthi, Prathwiraj Umesh, Caleb John, Kimberley A. Owen, Varun B. Verma, Sae Woo Nam, Daniel Oblak, Qiang Zhou, Wolfgang Tittel

Abstract: The possibility for quantum and classical communication to coexist on the same fibre is important for deployment and widespread adoption of quantum key distribution (QKD) and, more generally, a future quantum internet. While coexistence has been demonstrated for different QKD implementations, a comprehensive investigation for measurement-device independent (MDI) QKD -- a recently proposed QKD protocol that cannot be broken by quantum hacking that targets vulnerabilities of single-photon detectors -- is still missing. Here we experimentally demonstrate that MDI-QKD can operate simultaneously with at least five 10 Gbps bidirectional classical communication channels operating at around 1550 nm wavelength and over 40 km of spooled fibre, and we project communication rates in excess of 10 THz when moving the quantum channel from the third to the second telecommunication window. The similarity of MDI-QKD with quantum repeaters suggests that classical and generalised quantum networks can co-exist on the same fibre infrastructure.

Submitted 1 May, 2019; originally announced May 2019.

Comments: 12 pages, 3 figures

arXiv:1904.12041 [pdf, other]

doi 10.1364/OPTICA.6.000563

Quantum Frequency Conversion of a Quantum Dot Single-Photon Source on a Nanophotonic Chip

Authors: Anshuman Singh, Qing Li, Shunfa Liu, Ying Yu, Xiyuan Lu, Christian Schneider, Sven Höfling, John Lawall, Varun Verma, Richard Mirin, Sae Woo Nam, ** Liu, Kartik Srinivasan

Abstract: Single self-assembled InAs/GaAs quantum dots are promising bright sources of indistinguishable photons for quantum information science. However, their distribution in emission wavelength, due to inhomogeneous broadening inherent to their growth, has limited the ability to create multiple identical sources. Quantum frequency conversion can overcome this issue, particularly if implemented using scalable chip-integrated technologies. Here, we report the first demonstration of quantum frequency conversion of a quantum dot single-photon source on a silicon nanophotonic chip. Single photons from a quantum dot in a micropillar cavity are shifted in wavelength with an on-chip conversion efficiency of approximately 12%, limited by the linewidth of the quantum dot photons. The intensity autocorrelation function $g^{(2)}(\tau)$ for the frequency-converted light is antibunched with $g^{(2)}(0) = 0.290 \pm 0.030$, compared to the before-conversion value $g^{(2)}(0) = 0.080 \pm 0.003$. We demonstrate the suitability of our frequency conversion interface as a resource for quantum dot sources by characterizing its effectiveness across a wide span of input wavelengths (840 nm to 980 nm), and its ability to achieve tunable wavelength shifts difficult to obtain by other approaches.

Submitted 26 April, 2019; originally announced April 2019.

Comments: Main text + supplementary information

Journal ref: Optica, vol.6, no.5, pp. 563-569 (2019)
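The $g^{(2)}(0)$ values above are typically read off a Hanbury Brown-Twiss coincidence histogram. A minimal sketch, assuming pulsed excitation so that $g^{(2)}(0)$ is the zero-delay peak area divided by the mean area of the side peaks; the peak areas and function names here are hypothetical, not data from the paper.

```python
import numpy as np


def g2_zero_pulsed(peak_areas, zero_index):
    """g2(0) = area of the zero-delay peak / mean area of the side peaks."""
    peak_areas = np.asarray(peak_areas, dtype=float)
    side = np.delete(peak_areas, zero_index)
    return peak_areas[zero_index] / side.mean()


def is_single_photon_like(g2_0):
    # conventional criterion: g2(0) < 0.5 indicates dominant single-photon emission
    return g2_0 < 0.5
```

Under this criterion both the pre-conversion value (0.080) and the post-conversion value (0.290) remain in the single-photon regime; the rise after conversion would be consistent with added background from the conversion process, though the abstract does not state the cause.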

arXiv:1904.08596 [pdf, other]

doi 10.1103/PhysRevB.100.054520

Determining the depairing current in superconducting nanowire single-photon detectors

Authors: S. Frasca, B. Korzh, M. Colangelo, D. Zhu, A. E. Lita, J. P. Allmaras, E. E. Wollman, V. B. Verma, A. E. Dane, E. Ramirez, A. D. Beyer, S. W. Nam, A. G. Kozorezov, M. D. Shaw, K. K. Berggren

Abstract: We estimate the depairing current of superconducting nanowire single-photon detectors (SNSPDs) by studying the dependence of the nanowires' kinetic inductance on their bias current. The kinetic inductance is determined by measuring the resonance frequency of resonator-style nanowire coplanar waveguides in both transmission and reflection configurations. Bias-current-dependent shifts in the measured resonant frequency correspond to the change in the kinetic inductance, which can be compared with theoretical predictions. We demonstrate that the fast relaxation model described in the literature accurately matches our experimental data and provides a valuable tool for direct determination of the depairing current. Accurate and direct measurement of the depairing current is critical for nanowire quality analysis, as well as modeling efforts aimed at understanding the detection mechanism in SNSPDs.

Submitted 18 April, 2019; originally announced April 2019.

Journal ref: Phys. Rev. B 100, 054520 (2019)
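The step from a measured resonance shift to a kinetic-inductance change can be sketched with the lumped LC relation $f = 1/(2\pi\sqrt{LC})$. Assuming the capacitance does not change with bias and the inductance is dominated by kinetic inductance, $L(I)/L(0) = (f(0)/f(I))^2$. These simplifying assumptions and the function names are illustrative; the paper's waveguide analysis may be more detailed.

```python
import numpy as np


def inductance_ratio(f_bias, f_zero):
    """L(I)/L(0) from resonance frequencies, assuming fixed capacitance."""
    return (f_zero / np.asarray(f_bias, dtype=float)) ** 2


def inductance_from_frequency(f, capacitance):
    """Invert f = 1 / (2*pi*sqrt(L*C)) for L."""
    return 1.0 / ((2 * np.pi * f) ** 2 * capacitance)
```

For instance, a resonance that drops to 90% of its zero-bias value implies an inductance about 1.23 times larger; sweeping the bias current and fitting these ratios against a model such as the fast relaxation model is what would yield the depairing current.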

arXiv:1904.08542 [pdf, other]

Generative Model for Zero-Shot Sketch-Based Image Retrieval

Authors: Vinay Kumar Verma, Aakansha Mishra, Ashish Mishra, Piyush Rai

Abstract: We present a probabilistic model for Sketch-Based Image Retrieval (SBIR) where, at retrieval time, we are given sketches from novel classes that were not present at training time. Existing SBIR methods, most of which rely on learning class-wise correspondences between sketches and images, typically work well only for previously seen sketch classes and result in poor retrieval performance on novel classes. To address this, we propose a generative model that learns to generate images conditioned on a given novel-class sketch. This enables us to reduce the SBIR problem to a standard image-to-image search problem. Our model is based on an inverse-autoregressive-flow-based variational autoencoder, with a feedback mechanism to ensure robust image generation. We evaluate our model on two very challenging datasets, Sketchy and TU Berlin, with a novel train-test split. The proposed approach significantly outperforms various baselines on both datasets.

Submitted 17 April, 2019; originally announced April 2019.

Comments: Accepted at CVPR-Workshop 2019

arXiv:1903.10461 [pdf, other]

An ultrahigh-impedance superconducting thermal switch for interfacing superconductors to semiconductors and optoelectronics

Authors: A. N. McCaughan, V. B. Verma, S. Buckley, J. P. Allmaras, A. G. Kozorezov, A. N. Tait, S. W. Nam, J. M. Shainline

Abstract: A number of current approaches to quantum and neuromorphic computing use superconductors as the basis of their platform or as a measurement component, and will need to operate at cryogenic temperatures. Semiconductor systems are typically proposed as a top-level control in these architectures, with low-temperature passive components and intermediary superconducting electronics acting as the direct interface to the lowest-temperature stages. The architectures, therefore, require a low-power superconductor-semiconductor interface, which is not currently available. Here we report a superconducting switch that is capable of translating low-voltage superconducting inputs directly into semiconductor-compatible (above 1,000 mV) outputs at kelvin-scale temperatures (1 K or 4 K). To illustrate the capabilities in interfacing superconductors and semiconductors, we use it to drive a light-emitting diode (LED) in a photonic integrated circuit, generating photons at 1 K from a low-voltage input and detecting them with an on-chip superconducting single-photon detector. We also characterize our device's timing response (less than 300 ps turn-on, 15 ns turn-off), output impedance (greater than 1 MΩ), and energy requirements (0.18 fJ/μm², 3.24 mV/nW).

Submitted 30 September, 2019; v1 submitted 25 March, 2019; originally announced March 2019.

arXiv:1903.05101 [pdf, other]

doi 10.1103/PhysRevLett.123.151802

Detecting Dark Matter with Superconducting Nanowires

Authors: Yonit Hochberg, Ilya Charaev, Sae-Woo Nam, Varun Verma, Marco Colangelo, Karl K. Berggren

Abstract: We propose the use of superconducting nanowires as both target and sensor for direct detection of sub-GeV dark matter. With excellent sensitivity to small energy deposits on electrons, and demonstrated low dark counts, such devices could be used to probe electron recoils from dark matter scattering and absorption processes. We demonstrate the feasibility of this idea using measurements of an existing fabricated tungsten-silicide nanowire prototype with 0.8 eV energy threshold and 4.3 nanograms with 10 thousand seconds of exposure, which showed no dark counts. The results from this device already place meaningful bounds on dark matter-electron interactions, including the strongest terrestrial bounds on sub-eV dark photon absorption to date. Future expected fabrication on larger scales and with lower thresholds should enable probing new territory in the direct detection landscape, establishing the complementarity of this approach to other existing proposals.

Submitted 12 March, 2019; originally announced March 2019.

Comments: 7 pages, 4 figures

Journal ref: Phys. Rev. Lett. 123, 151802 (2019)
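To put the prototype's exposure on the kg·yr scale commonly used in direct-detection work, the quoted 4.3 ng target and 10,000 s live time can be converted directly. This is plain unit arithmetic (1 ng = 1e-12 kg, Julian year assumed); the function name is illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def exposure_kg_yr(mass_ng, live_time_s):
    """Convert a target mass (nanograms) and live time (seconds) to kg*yr."""
    mass_kg = mass_ng * 1e-12  # 1 ng = 1e-12 kg
    return mass_kg * live_time_s / SECONDS_PER_YEAR


e = exposure_kg_yr(4.3, 1e4)  # about 1.36e-15 kg*yr
```

The tiny exposure makes the abstract's point that larger-scale fabrication is the main lever for improving sensitivity.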

arXiv:1903.04120 [pdf, other]

HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs

Authors: Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri

Abstract: We present a novel deep learning architecture in which the convolution operation leverages heterogeneous kernels. The proposed HetConv (Heterogeneous Kernel-Based Convolution) reduces the computation (FLOPs) and the number of parameters as compared to the standard convolution operation while still maintaining representational efficiency. To show the effectiveness of our proposed convolution, we present extensive experimental results on standard convolutional neural network (CNN) architectures such as VGG and ResNet. We find that after replacing the standard convolutional filters in these architectures with our proposed HetConv filters, we achieve a 3X to 8X FLOPs-based improvement in speed while still maintaining (and sometimes improving) the accuracy. We also compare our proposed convolutions with group-wise/depth-wise convolutions and show that HetConv achieves more FLOPs reduction with significantly higher accuracy.

Submitted 25 March, 2019; v1 submitted 11 March, 2019; originally announced March 2019.

Comments: Accepted in CVPR 2019
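The FLOPs saving of heterogeneous kernels can be estimated per layer. The sketch assumes a hypothetical parameterization in which each filter applies a K×K kernel to 1/P of its input channels and 1×1 kernels to the remaining channels, giving a cost ratio of $1/P + (1 - 1/P)/K^2$ relative to a standard K×K convolution with the same channels and output size. This per-layer ratio ignores layers that are left unchanged, so it is not expected to match the whole-network speedups reported above.

```python
from fractions import Fraction


def hetconv_flops_ratio(P, K=3):
    """Per-layer FLOPs of a heterogeneous-kernel layer relative to a standard
    KxK convolution, assuming KxK kernels on 1/P of the input channels and
    1x1 kernels on the rest (hypothetical parameterization)."""
    frac_kxk = Fraction(1, P)
    return frac_kxk + (1 - frac_kxk) / (K * K)
```

With K = 3, P = 4 gives a ratio of 1/3 (a 3X reduction) and P = 64 gives 1/8 (8X), while P = 1 recovers the standard convolution; exact fractions keep the ratios readable.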

arXiv:1903.03825 [pdf]

doi 10.1016/j.neunet.2021.10.008

Interpolation Consistency Training for Semi-Supervised Learning

Authors: Vikas Verma, Kenji Kawaguchi, Alex Lamb, Juho Kannala, Arno Solin, Yoshua Bengio, David Lopez-Paz

Abstract: We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets. Our theoretical analysis shows that ICT corresponds to a certain type of data-adaptive regularization with unlabeled points which reduces overfitting to labeled points under high confidence values.

Submitted 19 October, 2022; v1 submitted 9 March, 2019; originally announced March 2019.

Comments: This is the latest version, published in the journal "Neural Networks" in 2022. All previous results are unchanged. Keywords: Deep Learning, Semi-supervised Learning, Mixup

Journal ref: Neural Networks, volume 145, pages 90-106 (2022)
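The consistency idea in the ICT abstract can be written as a loss: the student's prediction at $\mathrm{Mix}_\lambda(u_1, u_2)$ should match $\mathrm{Mix}_\lambda$ of the target predictions at $u_1$ and $u_2$. This sketch assumes a squared-error penalty and targets from a mean-teacher (exponential moving average) copy of the model; those choices and the names `ict_consistency_loss` and `ema_update` are illustrative assumptions.

```python
import numpy as np


def mix(a, b, lam):
    """Convex combination Mix_lam(a, b) = lam * a + (1 - lam) * b."""
    return lam * a + (1.0 - lam) * b


def ict_consistency_loss(student, teacher, u1, u2, lam):
    """Squared-error gap between student(Mix(u1, u2)) and Mix(teacher(u1), teacher(u2))."""
    target = mix(teacher(u1), teacher(u2), lam)  # treated as a fixed target
    pred = student(mix(u1, u2, lam))
    return np.mean((pred - target) ** 2)


def ema_update(teacher_params, student_params, decay=0.999):
    """Mean-teacher style update: teacher <- decay * teacher + (1 - decay) * student."""
    return {k: decay * teacher_params[k] + (1 - decay) * student_params[k]
            for k in teacher_params}
```

A model that is linear in its input incurs zero loss, so the penalty only acts where predictions change non-linearly between unlabeled points; this is one way to see why ICT pushes the decision boundary toward low-density regions. In training, this term would be added to the supervised loss on labeled data with a ramped-up weight.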

arXiv:1903.02709 [pdf, other]

On Adversarial Mixup Resynthesis

Authors: Christopher Beckham, Sina Honari, Vikas Verma, Alex Lamb, Farnoosh Ghadiri, R Devon Hjelm, Yoshua Bengio, Christopher Pal

Abstract: In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders. We explore models that are capable of combining the attributes of multiple inputs such that a resynthesised output is trained to fool an adversarial discriminator for real versus synthesised data. Furthermore, we explore the use of such an architecture in the context of semi-supervised learning, where we learn a mixing function whose objective is to produce interpolations of hidden states, or masked combinations of latent representations that are consistent with a conditioned class label. We show quantitative and qualitative evidence that such a formulation is an interesting avenue of research.

Submitted 23 October, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

Comments: 'Camera-ready draft'

arXiv:1902.04059 [pdf, other]

High-Speed Low-Crosstalk Detection of a $^{171}$Yb$^+$ Qubit using Superconducting Nanowire Single Photon Detectors

Authors: Stephen Crain, Clinton Cahall, Geert Vrijsen, Emma E. Wollman, Matthew D. Shaw, Varun B. Verma, Sae Woo Nam, Jungsang Kim

Abstract: Qubits used in quantum computing tend to suffer from errors, either from the qubit interacting with the environment, or from imperfect control when quantum logic gates are applied. Fault-tolerant construction based on quantum error correcting codes (QECC) can be used to recover from such errors. Effective implementation of QECC requires a high fidelity readout of the ancilla qubits from which the error syndrome can be determined, without affecting the data qubits in which relevant quantum information is stored for processing. Here, we present a detection scheme for $^{171}$Yb$^+$ trapped ion qubits, where we use superconducting nanowire single photon detectors and utilize photon time-of-arrival statistics to improve the fidelity and speed. Qubit shuttling allows for creating a separate detection region where an ancilla qubit can be measured without disrupting a data qubit. We achieve an average qubit state detection time of 11 μs with a fidelity of $99.931(6)\%$. The error due to the detection crosstalk, defined as the probability that the coherence of the data qubit is lost due to the process of detecting an ancilla qubit, is reduced to $\sim2\times10^{-5}$ by creating a separation of 370 μm between them.

Submitted 19 May, 2019; v1 submitted 11 February, 2019; originally announced February 2019.
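For context on the readout problem, the sketch below implements the simple photon-count threshold baseline that time-of-arrival analysis is meant to improve on. It assumes Poisson-distributed counts with mean `mu_bright` for the bright state and `mu_dark` (background) for the dark state, and ignores state changes during detection; it does not reproduce the paper's time-of-arrival scheme.

```python
import math


def poisson_cdf(k, mu):
    """P(N <= k) for N ~ Poisson(mu); returns 0 for k < 0."""
    term, total = math.exp(-mu), 0.0
    for n in range(k + 1):
        total += term
        term *= mu / (n + 1)
    return total


def threshold_fidelity(mu_bright, mu_dark, threshold):
    """Average readout fidelity when 'bright' is declared for counts >= threshold."""
    p_dark_correct = poisson_cdf(threshold - 1, mu_dark)
    p_bright_correct = 1.0 - poisson_cdf(threshold - 1, mu_bright)
    return 0.5 * (p_dark_correct + p_bright_correct)


def best_threshold(mu_bright, mu_dark, max_threshold=50):
    """Count threshold that maximizes the average fidelity under this model."""
    return max(range(1, max_threshold + 1),
               key=lambda t: threshold_fidelity(mu_bright, mu_dark, t))
```

Shortening the detection window lowers `mu_bright` and quickly erodes this baseline's fidelity, which is the speed/fidelity trade-off that using photon arrival times aims to relax.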

arXiv:1812.04251 [pdf, other]

doi 10.1103/PhysRevX.9.041013

Single-shot quantum memory advantage in the simulation of stochastic processes

Authors: Farzad Ghafari, Nora Tischler, Jayne Thompson, Mile Gu, Lynden K. Shalm, Varun B. Verma, Sae Woo Nam, Raj B. Patel, Howard M. Wiseman, Geoff J. Pryde

Abstract: Stochastic processes underlie a vast range of natural and social phenomena. Some processes such as atomic decay feature intrinsic randomness, whereas other complex processes, e.g. traffic congestion, are effectively probabilistic because we cannot track all relevant variables. To simulate a stochastic system's future behaviour, information about its past must be stored and thus memory is a key resource. Quantum information processing promises a memory advantage for stochastic simulation that has been validated in recent proof-of-concept experiments. Yet, in all past works, the memory saving would only become accessible in the limit of a large number of parallel simulations, because the memory registers of individual quantum simulators had the same dimensionality as their classical counterparts. Here, we report the first experimental demonstration that a quantum stochastic simulator can encode the relevant information in fewer dimensions than any classical simulator, thereby achieving a quantum memory advantage even for an individual simulator. Our photonic experiment thus establishes the potential of a new, practical resource saving in the simulation of complex systems.

Submitted 11 December, 2018; originally announced December 2018.

Journal ref: Phys. Rev. X 9, 041013 (2019)

arXiv:1811.10559 [pdf, other]

Leveraging Filter Correlations for Deep Model Compression

Authors: Pravendra Singh, Vinay Kumar Verma, Piyush Rai, Vinay P. Namboodiri

Abstract: We present a filter correlation based model compression approach for deep convolutional neural networks. Our approach iteratively identifies pairs of filters with the largest pairwise correlations and drops one of the filters from each such pair. However, instead of discarding one of the filters from each such pair naïvely, the model is re-optimized to make the filters in these pairs maximally correlated, so that discarding one of the filters from the pair results in minimal information loss. Moreover, after discarding the filters in each round, we further finetune the model to recover from the potential small loss incurred by the compression. We evaluate our proposed approach using a comprehensive set of experiments and ablation studies. Our compression method yields state-of-the-art FLOPs compression rates on various benchmarks, such as LeNet-5, VGG-16, and ResNet-50,56, while still achieving excellent predictive performance for tasks such as object detection on benchmark datasets.

Submitted 15 January, 2020; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: IEEE Winter Conference on Applications of Computer Vision (WACV), 2020
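The pair-selection step described in the filter-correlation abstract above can be sketched in plain Python. This is a minimal illustration under stated assumptions: each filter is flattened to a list of weights, "correlation" is taken to be the signed Pearson coefficient, and pairs are chosen greedily so that no filter appears in two pairs. The names `select_pairs` and `drop_from_pairs` are hypothetical, and the re-optimization and fine-tuning stages the abstract describes around the drop are not modeled.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equally long, non-constant weight vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (std_a * std_b)


def select_pairs(filters, k):
    """Greedily pick up to k disjoint filter pairs with the largest correlation.

    Assumption: pairs are ranked by the signed Pearson coefficient.
    Returns a list of (i, j, r) tuples.
    """
    scored = sorted(
        ((pearson(filters[i], filters[j]), i, j)
         for i, j in combinations(range(len(filters)), 2)),
        reverse=True,
    )
    used, pairs = set(), []
    for r, i, j in scored:
        if i in used or j in used:
            continue  # each filter may belong to at most one pair
        pairs.append((i, j, r))
        used.update((i, j))
        if len(pairs) == k:
            break
    return pairs


def drop_from_pairs(filters, pairs):
    """Keep the first filter of each pair and discard the second."""
    dropped = {j for _, j, _ in pairs}
    return [f for idx, f in enumerate(filters) if idx not in dropped]


filters = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 1.0, 2.0], [0.0, 0.0, 1.0]]
pairs = select_pairs(filters, k=1)
print(pairs)                            # filters 0 and 1 are perfectly correlated
print(drop_from_pairs(filters, pairs))  # filter 1 is removed
```

Greedy disjoint selection is one simple way to honor "drops one of the filters from each such pair" within a single round; other matching strategies would also fit the description.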

arXiv:1811.00036 [pdf, other]

doi 10.1103/PhysRevLett.121.170403

Demonstration of Einstein-Podolsky-Rosen Steering Using Hybrid Continuous- and Discrete-Variable Entanglement of Light

Authors: A. Cavaillès, H. Le Jeannic, J. Raskop, G. Guccione, D. Markham, E. Diamanti, M. D. Shaw, V. B. Verma, S. W. Nam, J. Laurat

Abstract: Einstein-Podolsky-Rosen steering is known to be a key resource for one-sided device-independent quantum information protocols. Here we demonstrate steering using hybrid entanglement between continuous- and discrete-variable optical qubits. To this end, we report on suitable steering inequalities and detail the implementation and requirements for this demonstration. Steering is experimentally certified by observing a violation by more than 5 standard deviations. Our results illustrate the potential of optical hybrid entanglement for applications in heterogeneous quantum networks that would interconnect disparate physical platforms and encodings.

Submitted 31 October, 2018; originally announced November 2018.

Journal ref: Phys. Rev. Lett. 121, 170403 (2018)

arXiv:1806.10279 [pdf, other]

doi 10.1103/PhysRevLett.121.100401

Conclusive experimental demonstration of one-way Einstein-Podolsky-Rosen steering

Authors: Nora Tischler, Farzad Ghafari, Travis J. Baker, Sergei Slussarenko, Raj B. Patel, Morgan M. Weston, Sabine Wollmann, Lynden K. Shalm, Varun B. Verma, Sae Woo Nam, H. Chau Nguyen, Howard M. Wiseman, Geoff J. Pryde

Abstract: Einstein-Podolsky-Rosen steering is a quantum phenomenon wherein one party influences, or steers, the state of a distant party's particle beyond what could be achieved with a separable state, by making measurements on one half of an entangled state. This type of quantum nonlocality stands out through its asymmetric setting, and even allows for cases where one party can steer the other, but where the reverse is not true. A series of experiments have demonstrated one-way steering in the past, but all were based on significant limiting assumptions. These consisted either of restrictions on the type of allowed measurements, or of assumptions about the quantum state at hand, by mapping to a specific family of states and analysing the ideal target state rather than the real experimental state. Here, we present the first experimental demonstration of one-way steering free of such assumptions. We achieve this using a new sufficient condition for non-steerability, and, although not required by our analysis, using a novel source of extremely high-quality photonic Werner states.

Submitted 12 September, 2018; v1 submitted 26 June, 2018; originally announced June 2018.

Comments: Supplemental Material included in the document

Journal ref: Phys. Rev. Lett. 121, 100401 (2018)

arXiv:1806.06765 [pdf, other]

Modularity Matters: Learning Invariant Relational Reasoning Tasks

Authors: Jason Jo, Vikas Verma, Yoshua Bengio

Abstract: We focus on two supervised visual reasoning tasks whose labels encode a semantic relational rule between two or more objects in an image: the MNIST Parity task and the colorized Pentomino task. The objects in the images undergo random translation, scaling, rotation and coloring transformations. Thus these tasks involve invariant relational reasoning. We report uneven performance of various deep CNN models on these two tasks. For the MNIST Parity task, we report that the VGG19 model soundly outperforms a family of ResNet models. Moreover, the family of ResNet models exhibits a general sensitivity to random initialization for the MNIST Parity task. For the colorized Pentomino task, both the VGG19 and ResNet models exhibit sluggish optimization and very poor test generalization, hovering around 30% test error. The CNNs we tested all learn hierarchies of fully distributed features and thus encode the distributed representation prior. We are motivated by a hypothesis from cognitive neuroscience which posits that the human visual cortex is modularized, and this allows the visual cortex to learn higher-order invariances. To this end, we consider a modularized variant of the ResNet model, referred to as a Residual Mixture Network (ResMixNet), which employs a mixture-of-experts architecture to interleave distributed representations with more specialized, modular representations.
We show that very shallow ResMixNets are capable of learning each of the two tasks well, attaining less than 2% and 1% test error on the MNIST Parity and the colorized Pentomino tasks, respectively. Most importantly, the ResMixNet models are extremely parameter efficient, generalizing better than various non-modular CNNs that have over 10x the number of parameters. These experimental results support the hypothesis that modularity is a robust prior for learning invariant relational reasoning.

Submitted 18 June, 2018; originally announced June 2018.

Comments: Modified abstract to fit arXiv character limit
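The ResMixNet abstract describes the model only as a mixture-of-experts variant of ResNet. The sketch below shows a generic mixture-of-experts layer (a softmax gate weighting the outputs of several expert functions) wrapped in a residual connection. The residual wrapper, the gate, and the toy experts are all assumptions for illustration, not the architecture from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(h, experts, gate):
    """Weighted sum of expert outputs; weights come from a softmax gate on h.

    Assumes every expert maps h to a vector of the same length as h.
    """
    weights = softmax(gate(h))
    outputs = [expert(h) for expert in experts]
    return [sum(w * out[d] for w, out in zip(weights, outputs))
            for d in range(len(h))]


def residual_mixture_block(h, experts, gate):
    """Hypothetical residual wrapper: h + MoE(h)."""
    mixed = mixture_of_experts(h, experts, gate)
    return [a + b for a, b in zip(h, mixed)]


def double_expert(h):
    """Toy expert that doubles every feature."""
    return [2.0 * v for v in h]


def zero_expert(h):
    """Toy expert that outputs zeros."""
    return [0.0 for _ in h]


def uniform_gate(h):
    """Toy gate that puts equal weight on both experts."""
    return [0.0, 0.0]


experts = [double_expert, zero_expert]
print(residual_mixture_block([1.0, -2.0], experts, uniform_gate))  # [2.0, -4.0]
```

Because the gate depends on the input, different inputs can route through different experts, which is the sense in which such a layer is more "modular" than a single fully distributed convolution.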

arXiv:1806.05236 [pdf, other]

Manifold Mixup: Better Representations by Interpolating Hidden States

Authors: Vikas Verma, Alex Lamb, Christopher Beckham, Amir Najafi, Ioannis Mitliagkas, Aaron Courville, David Lopez-Paz, Yoshua Bengio

Abstract: Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. Manifold Mixup leverages semantic interpolations as additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. As a result, neural networks trained with Manifold Mixup learn class-representations with fewer directions of variance. We prove theory on why this flattening happens under ideal conditions, validate it on practical situations, and connect it to previous works on information theory and generalization. In spite of incurring no significant computation and being implemented in a few lines of code, Manifold Mixup improves strong baselines in supervised learning, robustness to single-step adversarial attacks, and test log-likelihood.

Submitted 11 May, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

Comments: To appear in ICML 2019
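The core operation of Manifold Mixup, interpolating the hidden states of two examples at a randomly chosen layer and interpolating their one-hot targets with the same coefficient, can be sketched framework-free. Assumptions: the network is a plain list of layer functions, the coefficient is drawn from a Beta(alpha, alpha) distribution as in mixup, and the set of eligible layers and the value of `alpha` are placeholders. The loss and backpropagation are omitted.

```python
import random


def mix(u, v, lam):
    """Convex combination lam * u + (1 - lam) * v, elementwise."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(u, v)]


def manifold_mixup_forward(layers, x_a, x_b, y_a, y_b, mix_layers, alpha, rng):
    """One Manifold Mixup forward pass for a pair of examples.

    layers: list of callables mapping a feature list to a feature list
    mix_layers: indices eligible for mixing (0 means mixing the raw inputs)
    alpha: Beta(alpha, alpha) concentration for the mixing coefficient
    Returns (output on the mixed state, mixed target, lam, chosen layer).
    """
    k = rng.choice(mix_layers)
    h_a, h_b = x_a, x_b
    for layer in layers[:k]:           # run both examples up to layer k
        h_a, h_b = layer(h_a), layer(h_b)
    lam = rng.betavariate(alpha, alpha)
    h = mix(h_a, h_b, lam)             # interpolate the hidden states
    for layer in layers[k:]:           # finish the forward pass on the mix
        h = layer(h)
    return h, mix(y_a, y_b, lam), lam, k


layers = [lambda h: [2.0 * v for v in h], lambda h: [v + 1.0 for v in h]]
rng = random.Random(0)
out, target, lam, k = manifold_mixup_forward(
    layers, [1.0], [3.0], [1.0, 0.0], [0.0, 1.0],
    mix_layers=[0, 1, 2], alpha=2.0, rng=rng)
print(k, round(lam, 3), out, target)
```

Because the toy layers are affine, mixing at any layer yields the same output, which makes the example easy to check; with nonlinear layers the choice of layer changes the result, which is what lets the method shape representations at multiple depths.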

arXiv:1805.04431 [pdf, other]

doi 10.1038/s41586-018-0085-3

Challenging local realism with human choices

Authors: The BIG Bell Test Collaboration, C. Abellán, A. Acín, A. Alarcón, O. Alibart, C. K. Andersen, F. Andreoli, A. Beckert, F. A. Beduini, A. Bendersky, M. Bentivegna, P. Bierhorst, D. Burchardt, A. Cabello, J. Cariñe, S. Carrasco, G. Carvacho, D. Cavalcanti, R. Chaves, J. Cortés-Vega, A. Cuevas, A. Delgado, H. de Riedmatten, C. Eichler, P. Farrera, et al. (83 additional authors not shown)

Abstract: A Bell test is a randomized trial that compares experimental observations against the philosophical worldview of local realism. A Bell test requires spatially distributed entanglement, fast and high-efficiency detection and unpredictable measurement settings. Although technology can satisfy the first two of these requirements, the use of physical devices to choose settings in a Bell test involves making assumptions about the physics that one aims to test. Bell himself noted this weakness in using physical setting choices and argued that human "free will" could be used rigorously to ensure unpredictability in Bell tests. Here we report a set of local-realism tests using human choices, which avoids assumptions about predictability in physics. We recruited about 100,000 human participants to play an online video game that incentivizes fast, sustained input of unpredictable selections and illustrates Bell-test methodology. The participants generated 97,347,490 binary choices, which were directed via a scalable web platform to 12 laboratories on five continents, where 13 experiments tested local realism using photons, single atoms, atomic ensembles, and superconducting devices. Over a 12-hour period on 30 November 2016, participants worldwide provided a sustained data flow of over 1,000 bits per second to the experiments, which used different human-generated data to choose each measurement setting. The observed correlations strongly contradict local realism and other realistic positions in bipartite and tripartite scenarios.
Project outcomes include closing the "freedom-of-choice loophole" (the possibility that the setting choices are influenced by "hidden variables" to correlate with the particle properties), the utilization of video-game methods for rapid collection of human generated randomness, and the use of networking techniques for global participation in experimental science.

Submitted 9 November, 2018; v1 submitted 11 May, 2018; originally announced May 2018.

Comments: This version includes minor changes resulting from reviewer and editorial input. Abstract shortened to fit within arXiv limits

Journal ref: Nature, volume 557, pages 212-216 (2018)
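As an illustration of how a Bell test confronts local realism, the sketch below uses the CHSH form of Bell's inequality. That choice is an assumption made here for concreteness: the abstract mentions both bipartite and tripartite scenarios, so not every BIG Bell Test experiment used this form. The sketch estimates a correlator from coincidence counts, enumerates every local deterministic strategy to confirm the local-realist bound |S| <= 2, and evaluates the textbook quantum prediction E(a, b) = -cos(a - b) for a spin singlet at the standard angles, which gives |S| = 2√2. All function names are hypothetical.

```python
import math
from itertools import product


def correlator_from_counts(counts):
    """E = (N(+,+) + N(-,-) - N(+,-) - N(-,+)) / N for outcomes in {+1, -1}.

    counts maps (a_outcome, b_outcome) pairs to coincidence counts.
    """
    total = sum(counts.values())
    return sum(a * b * n for (a, b), n in counts.items()) / total


def chsh(e00, e01, e10, e11):
    """CHSH combination S = E(a0,b0) - E(a0,b1) + E(a1,b0) + E(a1,b1)."""
    return e00 - e01 + e10 + e11


def max_local_deterministic_chsh():
    """Largest |S| reachable when every outcome is fixed in advance."""
    return max(
        abs(chsh(a0 * b0, a0 * b1, a1 * b0, a1 * b1))
        for a0, a1, b0, b1 in product((-1, 1), repeat=4)
    )


def singlet_correlator(a, b):
    """Quantum prediction for spin measurements on a singlet state."""
    return -math.cos(a - b)


a0, a1, b0, b1 = 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4
s_quantum = chsh(singlet_correlator(a0, b0), singlet_correlator(a0, b1),
                 singlet_correlator(a1, b0), singlet_correlator(a1, b1))
print(max_local_deterministic_chsh())  # 2
print(abs(s_quantum))                  # approximately 2.828
print(correlator_from_counts({(1, 1): 40, (-1, -1): 40, (1, -1): 10, (-1, 1): 10}))  # 0.6
```

Any local hidden-variable model is a probabilistic mixture of the deterministic strategies enumerated above, so it cannot exceed their maximum of 2. The freedom-of-choice issue the abstract addresses is whether the settings a0/a1 and b0/b1 are chosen independently of those hidden variables.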

arXiv:1804.05699 [pdf, other]

doi 10.1103/PhysRevApplied.11.054056

Storage and retrieval of heralded telecommunication-wavelength photons using a solid-state waveguide quantum memory

Authors: Mohsen Falamarzi Askarani, Marcel·lí Grimau Puigibert, Thomas Lutz, Varun B. Verma, Matthew D. Shaw, Sae Woo Nam, Neil Sinclair, Daniel Oblak, Wolfgang Tittel

Abstract: Large-scale quantum networks will employ telecommunication-wavelength photons to exchange quantum information between remote measurement, storage, and processing nodes via fibre-optic channels. Quantum memories compatible with telecommunication-wavelength photons are a key element towards building such a quantum network. Here, we demonstrate the storage and retrieval of heralded 1532 nm-wavelength photons using a solid-state waveguide quantum memory. The heralded photons are derived from a photon-pair source that is based on parametric down-conversion, and our quantum memory is based on a 6 GHz-bandwidth atomic frequency comb prepared using an inhomogeneously broadened absorption line of a cryogenically-cooled erbium-doped lithium niobate waveguide. Using persistent spectral hole burning under varying magnetic fields, we determine that the memory is enabled by population transfer into niobium and lithium nuclear spin levels. Despite limited storage time and efficiency, our demonstration represents an important step towards quantum networks that operate in the telecommunication band and the development of on-chip quantum technology using industry-standard crystals.

Submitted 16 April, 2018; originally announced April 2018.

Comments: 8 pages, 4 figures

Journal ref: Phys. Rev. Applied 11, 054056 (2019)

arXiv:1803.07520 [pdf, other]

doi 10.1103/PhysRevLett.121.183603

Optically addressing single rare-earth ions in a nanophotonic cavity

Authors: Tian Zhong, Jonathan M. Kindem, John G. Bartholomew, Jake Rochman, Ioana Craiciu, Varun Verma, Sae Woo Nam, Francesco Marsili, Matthew D. Shaw, Andrew D. Beyer, Andrei Faraon

Abstract: We demonstrate optical probing of spectrally resolved single Nd rare-earth ions in yttrium orthovanadate. The ions are coupled to a photonic crystal resonator and show strong enhancement of the optical emission rate via the Purcell effect, resulting in near radiatively limited single photon emission. The measured high coupling cooperativity between a single photon and the ion allows for the observation of coherent optical Rabi oscillations. This could enable optically controlled spin qubits, quantum logic gates, and spin-photon interfaces for future quantum networks.

Submitted 12 January, 2019; v1 submitted 20 March, 2018; originally announced March 2018.

Journal ref: Phys. Rev. Lett. 121, 183603 (2018)

arXiv:1802.07426 [pdf, other]

Generalization in Machine Learning via Analytical Learning Theory

Authors: Kenji Kawaguchi, Yoshua Bengio, Vikas Verma, Leslie Pack Kaelbling

Abstract: This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods in CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results on one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides different types of results and insights when compared to statistical learning theory.

Submitted 6 March, 2019; v1 submitted 21 February, 2018; originally announced February 2018.

Report number: Massachusetts Institute of Technology (MIT), MIT-CSAIL-TR-2018-019

arXiv:1801.09086 [pdf, other]

A Generative Approach to Zero-Shot and Few-Shot Action Recognition

Authors: Ashish Mishra, Vinay Kumar Verma, M Shiva Krishna Reddy, Arulkumar S, Piyush Rai, Anurag Mittal

Abstract: We present a generative framework for zero-shot action recognition where some of the possible action classes do not occur in the training data. Our approach is based on modeling each action class using a probability distribution whose parameters are functions of the attribute vector representing that action class. In particular, we assume that the distribution parameters for any action class in the visual space can be expressed as a linear combination of a set of basis vectors where the combination weights are given by the attributes of the action class. These basis vectors can be learned solely using labeled data from the known (i.e., previously seen) action classes, and can then be used to predict the parameters of the probability distributions of unseen action classes. We consider two settings: (1) Inductive setting, where we use only the labeled examples of the seen action classes to predict the unseen action class parameters; and (2) Transductive setting which further leverages unlabeled data from the unseen action classes. Our framework also naturally extends to few-shot action recognition where a few labeled examples from unseen classes are available. Our experiments on benchmark datasets (UCF101, HMDB51 and Olympic) show significant performance improvements as compared to various baselines, in both standard zero-shot (disjoint seen and unseen classes) and generalized zero-shot learning settings.

Submitted 27 January, 2018; originally announced January 2018.

Comments: Accepted in WACV 2018
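The central modeling assumption in the zero-shot action recognition abstract, that a class's distribution parameters are a linear combination of basis vectors weighted by that class's attributes, can be sketched for the simplest case of a Gaussian class mean, mu_c = sum_k a_ck * b_k. Assumptions for illustration: the basis is fitted by ordinary least squares on the seen-class means (the paper's actual estimator may differ), and a test feature is assigned to the nearest predicted mean, which corresponds to a Gaussian classifier with shared isotropic covariance. All function names are hypothetical.

```python
def predict_mean(attributes, basis):
    """mu_c = sum_k a_ck * b_k, where basis[k] is the k-th basis vector."""
    dim = len(basis[0])
    return [sum(a * b[d] for a, b in zip(attributes, basis)) for d in range(dim)]


def solve(A, B):
    """Solve A X = B for square A by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + list(rhs) for row, rhs in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]


def fit_basis(seen_attrs, seen_means):
    """Least-squares basis B = (A^T A)^{-1} A^T M, one row per attribute."""
    K, D = len(seen_attrs[0]), len(seen_means[0])
    AtA = [[sum(a[i] * a[j] for a in seen_attrs) for j in range(K)] for i in range(K)]
    AtM = [[sum(a[i] * m[d] for a, m in zip(seen_attrs, seen_means)) for d in range(D)]
           for i in range(K)]
    return solve(AtA, AtM)


def nearest_class(x, class_means):
    """Assign x to the class whose (predicted) mean is closest in Euclidean distance."""
    return min(class_means,
               key=lambda c: sum((xi - mi) ** 2 for xi, mi in zip(x, class_means[c])))


seen_attrs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seen_means = [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
basis = fit_basis(seen_attrs, seen_means)
unseen_mean = predict_mean([0.5, 0.5], basis)   # close to [0.5, 1.0]
print(nearest_class([0.6, 1.1], {"seen_a": seen_means[0], "unseen": unseen_mean}))  # unseen
```

The key point the sketch shows is that only seen-class data enters `fit_basis`; the unseen class's mean comes entirely from its attribute vector.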

arXiv:1712.05019 [pdf]

doi 10.1103/PhysRevB.97.174502

Superconducting fluctuations and characteristic time scales in amorphous WSi

Authors: Xiaofu Zhang, Adriana E. Lita, Mariia Sidorova, Varun B. Verma, Qiang Wang, Sae Woo Nam, Alexej Semenov, Andreas Schilling

Abstract: We study magnitudes and temperature dependences of the electron-electron and electron-phonon interaction times which play the dominant role in the formation and relaxation of photon-induced hotspots in two-dimensional amorphous WSi films. The time constants are obtained through magnetoconductance measurements in perpendicular magnetic field in the superconducting fluctuation regime and through time-resolved photoresponse to optical pulses. The excess magnetoconductivity is interpreted in terms of the weak-localization effect and superconducting fluctuations. Aslamazov-Larkin and Maki-Thompson superconducting fluctuations alone fail to reproduce the magnetic field dependence in the relatively high magnetic field range when the temperature is rather close to T_c, because the suppression of the electronic density of states due to the formation of short-lifetime Cooper pairs needs to be considered. The time scale τ_i of inelastic scattering is ascribed to a combination of electron-electron (τ_(e-e)) and electron-phonon (τ_(e-ph)) interaction times, and a characteristic electron-fluctuation time (τ_(e-fl)), which makes it possible to extract their magnitudes and temperature dependences from the measured τ_i. The ratio of phonon-electron (τ_(ph-e)) and electron-phonon interaction times is obtained via measurements of the optical photoresponse of WSi microbridges.
Relatively large τ_(e-ph)/τ_(ph-e) and τ_(e-ph)/τ_(e-e) ratios ensure that in WSi the photon energy is more efficiently confined in the electron subsystem than in other materials commonly used in the technology of superconducting nanowire single-photon detectors (SNSPDs). We discuss the impact of interaction times on the hotspot dynamics and compare relevant metrics of SNSPDs from different materials.

Submitted 13 December, 2017; originally announced December 2017.

Journal ref: Phys. Rev. B 97, 174502 (2018)

arXiv:1712.03878 [pdf, other]

Generalized Zero-Shot Learning via Synthesized Examples

Authors: Vinay Kumar Verma, Gundeep Arora, Ashish Mishra, Piyush Rai

Abstract: We present a generative framework for generalized zero-shot learning where the training and test classes are not necessarily disjoint. Built upon a variational autoencoder based architecture, consisting of a probabilistic encoder and a probabilistic conditional decoder, our model can generate novel exemplars from seen/unseen classes, given their respective class attributes. These exemplars can subsequently be used to train any off-the-shelf classification model. One of the key aspects of our encoder-decoder architecture is a feedback-driven mechanism in which a discriminator (a multivariate regressor) learns to map the generated exemplars to the corresponding class attribute vectors, leading to an improved generator. Our model's ability to generate and leverage examples from unseen classes to train the classification model naturally helps to mitigate the bias towards predicting seen classes in generalized zero-shot learning settings. Through a comprehensive set of experiments, we show that our model outperforms several state-of-the-art methods, on several benchmark datasets, for both standard as well as generalized zero-shot learning.

Submitted 11 June, 2018; v1 submitted 11 December, 2017; originally announced December 2017.

Comments: Accepted in CVPR'18

arXiv:1712.02313 [pdf, other]

doi 10.1016/j.image.2018.04.014

DCT-domain Deep Convolutional Neural Networks for Multiple JPEG Compression Classification

Authors: Vinay Verma, Nikita Agarwal, Nitin Khanna

Abstract: With the rapid advancements in digital imaging systems and networking, low-cost hand-held image capture devices equipped with network connectivity are becoming ubiquitous. This ease of digital image capture and sharing is also accompanied by widespread usage of user-friendly image editing software. Thus, we are in an era where digital images can be very easily used for the massive spread of false information, and their integrity needs to be seriously questioned. Application of multiple lossy compressions on images is an essential part of any image editing pipeline involving lossy compressed images. This paper aims to address the problem of classifying images based on the number of JPEG compressions they have undergone, by utilizing deep convolutional neural networks in the DCT domain. The proposed system incorporates a well-designed pre-processing step before feeding the image data to the CNN, to capture essential characteristics of compression artifacts and make the system independent of image content. Detailed experiments are performed to optimize different aspects of the system, such as depth of CNN, number of DCT frequencies, and execution time. Results on the standard UCID dataset demonstrate that the proposed system outperforms existing systems for multiple JPEG compression detection and is capable of classifying more re-compression cycles than existing systems.

Submitted 6 December, 2017; originally announced December 2017.

Comments: 12 pages
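The DCT-domain abstract does not spell out its pre-processing, so the sketch below only illustrates the underlying transform: the orthonormal 8x8 two-dimensional DCT-II that baseline JPEG applies to level-shifted pixel blocks, plus a hypothetical helper, `coefficient_map`, that collects one chosen frequency (u, v) from every block of a grayscale image. Which frequencies to keep, and how they are fed to the CNN, are assumptions left open here.

```python
import math

N = 8  # JPEG block size


def _scale(k):
    """Orthonormal DCT-II scale factor for frequency index k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def dct2_block(block):
    """Orthonormal 2D DCT-II of an 8x8 block given as a list of 8 rows."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = _scale(u) * _scale(v) * s
    return out


def coefficient_map(image, u, v):
    """Collect DCT coefficient (u, v) from every 8x8 block of an 8-bit grayscale image.

    Pixels are level-shifted by -128 as in JPEG; image dimensions are assumed
    to be multiples of 8. Returns a 2D list with one value per block.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for bx in range(0, rows, N):
        row = []
        for by in range(0, cols, N):
            block = [[image[bx + i][by + j] - 128 for j in range(N)] for i in range(N)]
            row.append(dct2_block(block)[u][v])
        result.append(row)
    return result


flat = [[138] * 16 for _ in range(8)]   # two uniform 8x8 blocks side by side
print(coefficient_map(flat, 0, 0))      # DC of each block, close to 8 * (138 - 128) = 80
```

For an already JPEG-compressed file, the quantized coefficients could instead be read directly from the bitstream without recomputing the transform; the direct computation above is only for clarity.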

arXiv:1711.05820 [pdf, other]

Zero-Shot Learning via Class-Conditioned Deep Generative Models

Authors: Wenlin Wang, Yunchen Pu, Vinay Kumar Verma, Kai Fan, Yizhe Zhang, Changyou Chen, Piyush Rai, Lawrence Carin

Abstract: We present a deep generative model for learning to predict classes not seen at training time. Unlike most existing methods for this problem, which represent each class as a point (via a semantic embedding), we represent each seen/unseen class using a class-specific latent-space distribution, conditioned on class attributes. We use these latent-space distributions as a prior for a supervised variational autoencoder (VAE), which also facilitates learning highly discriminative feature representations for the inputs. The entire framework is learned end-to-end using only the seen-class training data. The model infers corresponding attributes of a test image by maximizing the VAE lower bound; the inferred attributes may be linked to labels not seen when training. We further extend our model to (1) a semi-supervised/transductive setting by leveraging unlabeled unseen-class data via an unsupervised learning module, and (2) few-shot learning where we also have a small number of labeled inputs from the unseen classes. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of benchmark data sets.

Submitted 19 November, 2017; v1 submitted 15 November, 2017; originally announced November 2017.

Comments: To appear in AAAI 2018
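One way to read "infers corresponding attributes of a test image by maximizing the VAE lower bound" in the class-conditioned VAE abstract: if the decoder term does not depend on the class, the class-dependent part of the bound is -KL(q(z|x) || p(z|a_c)), so prediction picks the class whose attribute-conditioned prior is closest to the encoder's posterior. The sketch below assumes diagonal Gaussians for both distributions and takes the per-class prior parameters as given (in the paper they come from a learned function of the class attributes); `predict_class` and the toy numbers are hypothetical.

```python
import math


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, diag var_q) || N(mu_p, diag var_p)), summed over dimensions."""
    return 0.5 * sum(
        math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


def predict_class(mu_q, var_q, class_priors):
    """Pick the class whose prior N(mu_c, var_c) has the smallest KL from q(z|x).

    class_priors maps a class name to a (mean, variance) pair of lists.
    """
    return min(class_priors,
               key=lambda c: kl_diag_gaussians(mu_q, var_q, *class_priors[c]))


priors = {"cat": ([1.0, 0.0], [1.0, 1.0]),
          "dog": ([0.0, 1.0], [1.0, 1.0])}
print(predict_class([0.9, 0.1], [0.1, 0.1], priors))  # cat
```

Because an unseen class only needs its attribute vector to produce a prior, the same `predict_class` rule applies unchanged to classes absent from training.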

arXiv:1710.09667 [pdf, other]

doi 10.1103/PhysRevApplied.9.064019

Direct observation of nanofabrication influence on the optical properties of single self-assembled InAs/GaAs quantum dots

Authors: ** Liu, Kumarasiri Konthasinghe, Marcelo Davanco, John Lawall, Vikas Anant, Varun Verma, Richard Mirin, Sae Woo Nam, ** Dong Song, Ben Ma, Ze Sheng Chen, Hai Qiao Ni, Zhi Chuan Niu, Kartik Srinivasan

Abstract: Single self-assembled InAs/GaAs quantum dots are a promising solid-state quantum technology, with which vacuum Rabi splitting, single-photon-level nonlinearities, and bright, pure, and indistinguishable single-photon generation have been demonstrated. For such achievements, nanofabrication is used to create structures in which the quantum dot preferentially interacts with strongly-confined optical modes. An open question is the extent to which such nanofabrication may also have an adverse influence, through the creation of traps and surface states that could induce blinking, spectral diffusion, and dephasing. Here, we use photoluminescence imaging to locate the positions of single InAs/GaAs quantum dots with respect to alignment marks with < 5 nm uncertainty, allowing us to measure their behavior before and after fabrication. We track the quantum dot emission linewidth and photon statistics as a function of distance from an etched surface, and find that the linewidth is significantly broadened (up to several GHz) for etched surfaces within a couple hundred nanometers of the quantum dot. However, we do not observe appreciable reduction of the quantum dot radiative efficiency due to blinking. We also show that atomic layer deposition can stabilize spectral diffusion of the quantum dot emission, and partially recover its linewidth.

Submitted 26 October, 2017; originally announced October 2017.

Comments: 11 pages, 8 figures

Journal ref: Phys. Rev. Applied 9, 064019 (2018)

arXiv:1710.04773 [pdf, other]

Residual Connections Encourage Iterative Inference

Authors: Stanisław Jastrzębski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio

Abstract: Residual networks (Resnets) have become a prominent architecture in deep learning. However, a comprehensive understanding of Resnets is still a topic of ongoing research. A recent view argues that Resnets perform iterative refinement of features. We attempt to further expose properties of this aspect. To this end, we study Resnets both analytically and empirically. We formalize the notion of iterative refinement in Resnets by showing that residual connections naturally encourage features of residual blocks to move along the negative gradient of loss as we go from one block to the next. In addition, our empirical analysis suggests that Resnets are able to perform both representation learning and iterative refinement. In general, a Resnet block tends to concentrate representation learning behavior in the first few layers while higher layers perform iterative refinement of features. Finally, we observe that sharing residual layers naively leads to representation explosion and, counterintuitively, overfitting, and we show that simple existing strategies can help alleviate this problem.

Submitted 8 March, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

Comments: First two authors contributed equally. Published in ICLR 2018
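The formal claim in the residual-connections abstract, that residual connections encourage each block to move features along the negative gradient of the loss, can be illustrated with a toy in which the residual branch is chosen to be exactly a gradient step, F(h) = -eta * dL/dh, for a quadratic loss L(h) = 0.5 * ||h - t||^2. This is an idealized assumption for illustration: the abstract reports a tendency toward this behavior in trained Resnets, not an exact gradient step. The helpers and the target `t` are hypothetical.

```python
import math


def loss(h, t):
    """Quadratic loss L(h) = 0.5 * ||h - t||^2."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(h, t))


def grad(h, t):
    """Gradient of the quadratic loss with respect to h."""
    return [a - b for a, b in zip(h, t)]


def residual_fn(h, t, eta):
    """Toy residual branch F(h) = -eta * dL/dh (an idealized assumption)."""
    return [-eta * g for g in grad(h, t)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def unroll(h0, t, eta, depth):
    """Stack `depth` residual blocks h_{i+1} = h_i + F(h_i); return all hidden states."""
    hs = [h0]
    for _ in range(depth):
        h = hs[-1]
        hs.append([a + f for a, f in zip(h, residual_fn(h, t, eta))])
    return hs


t = [0.0, 0.0]
states = unroll([4.0, -2.0], t, eta=0.5, depth=5)
print([round(loss(h, t), 4) for h in states])   # loss shrinks block by block
h0 = states[0]
print(cosine(residual_fn(h0, t, 0.5), [-g for g in grad(h0, t)]))  # 1.0 up to rounding
```

The cosine between the residual branch's output and the negative gradient is the quantity one would measure in a real network to test for this alignment; here it is 1 by construction.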
