-
PatchUp: A Feature-Space Block-Level Regularization Technique for Convolutional Neural Networks
Authors:
Mojtaba Faramarzi,
Mohammad Amini,
Akilesh Badrinaaraayanan,
Vikas Verma,
Sarath Chandar
Abstract:
Large capacity deep learning models are often prone to a high generalization gap when trained with a limited amount of labeled training data. A recent class of methods to address this problem uses various ways to construct a new training sample by mixing a pair (or more) of training samples. We propose PatchUp, a hidden state block-level regularization technique for Convolutional Neural Networks (CNNs), that is applied on selected contiguous blocks of feature maps from a random pair of samples. Our approach improves the robustness of CNN models against the manifold intrusion problem that may occur in other state-of-the-art mixing approaches. Moreover, since we are mixing the contiguous block of features in the hidden space, which has more dimensions than the input space, we obtain more diverse samples for training towards different dimensions. Our experiments on CIFAR10/100, SVHN, Tiny-ImageNet, and ImageNet using ResNet architectures including PreActResnet18/34, WRN-28-10, ResNet101/152 models show that PatchUp improves upon, or equals, the performance of current state-of-the-art regularizers for CNNs. We also show that PatchUp can provide a better generalization to deformed samples and is more robust against adversarial attacks.
Submitted 7 January, 2023; v1 submitted 14 June, 2020;
originally announced June 2020.
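The block-level mixing described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it swaps a single square block across all channels, and the function name `swap_feature_block`, the `block_size` parameter, and the returned `keep_ratio` are hypothetical choices made for illustration.

```python
import numpy as np

def swap_feature_block(feat_a, feat_b, block_size, rng):
    """Replace one contiguous spatial block of feat_a with the same block of feat_b.

    feat_a, feat_b: arrays of shape (channels, height, width) taken from a
    hidden layer for two different samples (assumed layout).
    Returns the mixed feature map and the fraction of feat_a that was kept.
    """
    _, height, width = feat_a.shape
    # Pick the top-left corner so that the block fits inside the feature map.
    top = rng.integers(0, height - block_size + 1)
    left = rng.integers(0, width - block_size + 1)

    mixed = feat_a.copy()
    mixed[:, top:top + block_size, left:left + block_size] = \
        feat_b[:, top:top + block_size, left:left + block_size]

    # Fraction of spatial positions still coming from feat_a.
    keep_ratio = 1.0 - (block_size * block_size) / (height * width)
    return mixed, keep_ratio

rng = np.random.default_rng(0)
a = np.zeros((2, 4, 4))
b = np.ones((2, 4, 4))
mixed, keep_ratio = swap_feature_block(a, b, block_size=2, rng=rng)
```

One possible use of `keep_ratio` is to weight the two samples' labels in the loss, as area-based mixing methods do; whether PatchUp weights labels this way is not stated in the abstract, so treat that as an assumption.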
-
Interpolation-based semi-supervised learning for object detection
Authors:
Jisoo Jeong,
Vikas Verma,
Minsung Hyun,
Juho Kannala,
Nojun Kwak
Abstract:
Despite the data labeling cost for the object detection tasks being substantially more than that of the classification tasks, semi-supervised learning methods for object detection have not been studied much. In this paper, we propose an Interpolation-based Semi-supervised learning method for object Detection (ISD), which considers and solves the problems caused by applying conventional Interpolation Regularization (IR) directly to object detection. We divide the output of the model into two types according to the objectness scores of both original patches that are mixed in IR. Then, we apply a separate loss suitable for each type in an unsupervised manner. The proposed losses dramatically improve the performance of semi-supervised learning as well as supervised learning. In the supervised learning setting, our method improves the baseline methods by a significant margin. In the semi-supervised learning setting, our algorithm improves the performance on benchmark datasets (PASCAL VOC and MSCOCO) with a benchmark architecture (SSD).
Submitted 29 December, 2020; v1 submitted 3 June, 2020;
originally announced June 2020.
-
Topology optimization of nonlinear periodically microstructured materials for tailored homogenized constitutive properties
Authors:
Reza Behrou,
Maroun Abi Ghanem,
Brianna C. Macnider,
Vimarsh Verma,
Ryan Alvey,
**ho Hong,
Ashley F. Emery,
Hyunsun Alicia Kim,
Nicholas Boechler
Abstract:
A topology optimization method is presented for the design of periodic microstructured materials with prescribed homogenized nonlinear constitutive properties over finite strain ranges. The mechanical model assumes linear elastic isotropic materials, geometric nonlinearity at finite strain, and a quasi-static response. The optimization problem is solved by a nonlinear programming method and the sensitivities computed via the adjoint method. Two-dimensional structures identified using this optimization method are additively manufactured and their uniaxial tensile strain response compared with the numerically predicted behavior. The optimization approach herein enables the design and development of lattice-like materials with prescribed nonlinear effective properties, for use in myriad potential applications, ranging from stress wave and vibration mitigation to soft robotics.
Submitted 18 May, 2020;
originally announced May 2020.
-
Continual Learning using a Bayesian Nonparametric Dictionary of Weight Factors
Authors:
Nikhil Mehta,
Kevin J Liang,
Vinay K Verma,
Lawrence Carin
Abstract:
Naively trained neural networks tend to experience catastrophic forgetting in sequential task settings, where data from previous tasks are unavailable. A number of methods, using various model expansion strategies, have been proposed recently as possible solutions. However, determining how much to expand the model is left to the practitioner, and often a constant schedule is chosen for simplicity, regardless of how complex the incoming task is. Instead, we propose a principled Bayesian nonparametric approach based on the Indian Buffet Process (IBP) prior, letting the data determine how much to expand the model complexity. We pair this with a factorization of the neural network's weight matrices. Such an approach allows the number of factors of each weight matrix to scale with the complexity of the task, while the IBP prior encourages sparse weight factor selection and factor reuse, promoting positive knowledge transfer between tasks. We demonstrate the effectiveness of our method on a number of continual learning benchmarks and analyze how weight factors are allocated and reused throughout the training.
Submitted 27 April, 2021; v1 submitted 21 April, 2020;
originally announced April 2020.
-
Connecting heterogeneous quantum networks by hybrid entanglement swapping
Authors:
G. Guccione,
T. Darras,
H. Le Jeannic,
V. B. Verma,
S. W. Nam,
A. Cavaillès,
J. Laurat
Abstract:
Recent advances in quantum technologies are rapidly stimulating the building of quantum networks. With the parallel development of multiple physical platforms and different types of encodings, a challenge for present and future networks is to uphold a heterogeneous structure for full functionality and therefore support modular systems that are not necessarily compatible with one another. Central to this endeavor is the capability to distribute and interconnect optical entangled states relying on different discrete and continuous quantum variables. Here we report an entanglement swapping protocol connecting such entangled states. We generate single-photon entanglement and hybrid entanglement between particle-like and wave-like optical qubits, and then demonstrate the heralded creation of hybrid entanglement at a distance by using a specific Bell-state measurement. This ability opens up the prospect of connecting heterogeneous nodes of a network, with the promise of increased integration and novel functionalities.
Submitted 13 April, 2021; v1 submitted 24 March, 2020;
originally announced March 2020.
-
Block-level Double JPEG Compression Detection for Image Forgery Localization
Authors:
Vinay Verma,
Deepak Singh,
Nitin Khanna
Abstract:
Forged images have a ubiquitous presence in today's world due to ease of availability of image manipulation tools. In this letter, we propose a deep learning-based novel approach which utilizes the inherent relationship between DCT coefficient histograms and corresponding quantization step sizes to distinguish between original and forged regions in a JPEG image, based on the detection of single and double compressed blocks, without fully decompressing the JPEG image. We consider a diverse set of 1,120 quantization matrices collected in a recent study as compared to standard 100 quantization matrices for training, testing, and creating realistic forgeries. In particular, we carefully design the input to DenseNet with a specific combination of quantization step sizes and the respective histograms for a JPEG block. Using this input to learn the compression artifacts produces state-of-the-art results for the detection of single and double compressed blocks of sizes $256 \times 256$ and gives better results for smaller blocks of sizes $128 \times 128$ and $64 \times 64$. Consequently, improved forgery localization performances are obtained on realistic forged images. Also, in the case of test blocks compressed with completely different quantization matrices as compared to matrices used in training, the proposed method outperforms the current state-of-the-art.
Submitted 20 March, 2020;
originally announced March 2020.
-
Superconducting microwire detectors with single-photon sensitivity in the near-infrared
Authors:
Jeff Chiles,
Sonia M. Buckley,
Adriana Lita,
Varun B. Verma,
Jason Allmaras,
Boris Korzh,
Matthew D. Shaw,
Jeffrey M. Shainline,
Richard P. Mirin,
Sae Woo Nam
Abstract:
We report on the fabrication and characterization of single-photon-sensitive WSi superconducting detectors with wire widths from 1 μm to 3 μm. The devices achieve saturated internal detection efficiency at 1.55 μm wavelength and exhibit maximum count rates in excess of 10^5 s^-1. We also investigate the material properties of the silicon-rich WSi films used for these devices. We find that many devices with active lengths of several hundred microns exhibit critical currents in excess of 50% of the depairing current. A meandered detector with 2.0 μm wire width is demonstrated over a surface area of 362x362 μm^2, showcasing the material and device quality achieved.
Submitted 28 February, 2020;
originally announced February 2020.
-
Stacked Adversarial Network for Zero-Shot Sketch based Image Retrieval
Authors:
Anubha Pandey,
Ashish Mishra,
Vinay Kumar Verma,
Anurag Mittal,
Hema A. Murthy
Abstract:
Conventional approaches to Sketch-Based Image Retrieval (SBIR) assume that the data of all the classes are available during training. The assumption may not always be practical since the data of a few classes may be unavailable, or the classes may not appear at the time of training. Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) relaxes this constraint and allows the algorithm to handle previously unseen classes at test time. This paper proposes a generative approach for ZS-SBIR based on the Stacked Adversarial Network (SAN) combined with the advantages of a Siamese Network (SN). While SAN generates a high-quality sample, SN learns a better distance metric compared to that of the nearest neighbor search. The capability of the generative model to synthesize image features based on the sketch reduces the SBIR problem to that of an image-to-image retrieval problem. We evaluate the efficacy of our proposed approach on the TU-Berlin and Sketchy databases in both the standard ZSL and generalized ZSL settings. The proposed method yields a significant improvement in standard ZSL as well as in the more challenging generalized ZSL setting (GZSL) for SBIR.
Submitted 18 January, 2020;
originally announced January 2020.
-
A "Network Pruning Network" Approach to Deep Model Compression
Authors:
Vinay Kumar Verma,
Pravendra Singh,
Vinay P. Namboodiri,
Piyush Rai
Abstract:
We present a filter pruning approach for deep model compression, using a multitask network. Our approach is based on learning a pruner network to prune a pre-trained target network. The pruner is essentially a multitask deep neural network with binary outputs that help identify the filters from each layer of the original network that do not have any significant contribution to the model and can therefore be pruned. The pruner network has the same architecture as the original network except that it has a multitask/multi-output last layer containing binary-valued outputs (one per filter), which indicate which filters have to be pruned. The pruner's goal is to minimize the number of filters from the original network by assigning zero weights to the corresponding output feature-maps. In contrast to most of the existing methods, instead of relying on iterative pruning, our approach can prune the network (original network) in one go and, moreover, does not require specifying the degree of pruning for each layer (and can learn it instead). The compressed model produced by our approach is generic and does not need any special hardware/software support. Moreover, augmenting with other methods such as knowledge distillation, quantization, and connection pruning can increase the degree of compression for the proposed approach. We show the efficacy of our proposed approach for classification and object detection tasks.
Submitted 15 January, 2020;
originally announced January 2020.
-
Methodical engineering of defects in Mn$_x$Zn$_{1-x}$O ($x$ = 0.03 and 0.05) nanostructures by electron beam for nonlinear optical applications: A new insight
Authors:
Albin Antony,
P. Poornesh,
I. V. Kityk,
K. Ozga,
J. Jedryka,
Reji Philip,
Ganesh Sanjeev,
Vikash Chandra Petwal,
Vijay Pal Verma,
Jishnu Dwivedi
Abstract:
A series of Mn$_x$Zn$_{1-x}$O ($x$ = 0.03, 0.05) nanostructures have been grown via the solution-based chemical spray pyrolysis technique. Electron beam induced modifications on structural, linear and nonlinear optical and surface morphological properties have been studied and elaborated. GXRD (glancing angle X-ray diffraction) patterns show sharp diffraction peaks matching with the hexagonal wurtzite structure of ZnO thin films. The upsurge in e-beam dosage resulted in the shifting of XRD peaks (101) and (002) towards the lower angle side, and an increase in FWHM value. Gaussian deconvolution on PL spectra reveals the quenching of defect centers, implying the role of electron beam irradiation in regulating luminescence and defect centers in the nanostructures. Irradiation induced spatial confinement and phonon localization effects have been observed in the films via micro Raman studies. The latter are evident from spectral peak shifts and broadening. Detailed investigations on the effect of electron beam irradiation on third order nonlinear optical properties under continuous and pulsed modes of laser operation are discussed. Third order absorptive nonlinearity of the nanostructures evaluated using the open aperture Z-scan technique in both continuous and pulsed laser regimes shows a strong nonlinear absorption coefficient $\beta_{\mathrm{eff}}$ of the order of $10^{-4}$ cm/W, confirming their suitability for passive optical limiting applications under intense radiation environments. Laser induced third harmonic generation (LITHG) experiment results support the significant variation in nonlinearities upon electron beam irradiation, and the effect can be utilized for frequency conversion mechanisms in high power laser sources and UV light emitters.
Submitted 8 January, 2020;
originally announced January 2020.
-
SketchTransfer: A Challenging New Task for Exploring Detail-Invariance and the Abstractions Learned by Deep Networks
Authors:
Alex Lamb,
Sherjil Ozair,
Vikas Verma,
David Ha
Abstract:
Deep networks have achieved excellent results in perceptual tasks, yet their ability to generalize to variations not seen during training has come under increasing scrutiny. In this work we focus on their ability to have invariance towards the presence or absence of details. For example, humans are able to watch cartoons, which are missing many visual details, without being explicitly trained to do so. As another example, 3D rendering software is a relatively recent development, yet people are able to understand such rendered scenes even though they are missing details (consider a film like Toy Story). The failure of machine learning algorithms to do this indicates a significant gap in generalization between human abilities and the abilities of deep networks. We propose a dataset that will make it easier to study the detail-invariance problem concretely. We produce a concrete task for this: SketchTransfer, and we show that state-of-the-art domain transfer algorithms still struggle with this task. The state-of-the-art technique which achieves over 95\% on MNIST $\xrightarrow{}$ SVHN transfer only achieves 59\% accuracy on the SketchTransfer task, which is much better than random (11\% accuracy) but falls short of the 87\% accuracy of a classifier trained directly on labeled sketches. This indicates that this task is approachable with today's best methods but has substantial room for improvement.
Submitted 24 December, 2019;
originally announced December 2019.
-
The submesoscale, the finescale and their interaction at a mixed layer front
Authors:
Vicky Verma,
Hieu T. Pham,
Sutanu Sarkar
Abstract:
The spindown of a geostrophically balanced density front in an upper-ocean mixed layer is simulated with a large eddy simulation (LES) model that resolves O(1000) m down to O(1) m scale. Our goal is to examine the interaction between the submesoscale and the turbulent finescale, and better characterize vertical transport, frontogenesis and dissipative processes. The flow passes through symmetric and baroclinic instabilities, spawns vortex filaments of O(100) m thickness as well as larger eddies, and develops turbulence that is spatially localized and organized. An O(100) m physical-space filter is applied to the simulated flow to separate the coherent submesoscale from the finescale in a decomposition that preserves their spatial organization, unlike the typical practice of along-front averaging. Analysis of the submesoscale vertical velocity (as large as 5 mm/s) reveals that downwelling is limited to the thin vortex filaments while upwelling occurs over spatially extensive regions in the eddies, resulting in an overall buoyancy flux that is restratifying. The kinetic energy (KE) transport equations are evaluated separately at both scales to understand energy pathways in this problem. The buoyancy flux associated with coherent motions acts as the primary source of submesoscale KE which is then transported across the front with a fraction transferred to the finescale. The transfer, limited to thin regions of O(100) m horizontal width, is accomplished by primarily horizontal strain in the upper 10 m and by vertical shear in the rest of the 50 m deep mixed layer. Frontogenetic mechanisms are diagnosed through analysis of the transport equation for squared buoyancy gradient. Horizontal strain is the primary frontogenetic term and is counteracted primarily by horizontal diffusion in the top 10 m and by the horizontal gradient of vertical velocity further below.
Submitted 16 October, 2019;
originally announced October 2019.
-
GraphMix: Improved Training of GNNs for Semi-Supervised Learning
Authors:
Vikas Verma,
Meng Qu,
Kenji Kawaguchi,
Alex Lamb,
Yoshua Bengio,
Juho Kannala,
Jian Tang
Abstract:
We present GraphMix, a regularization method for Graph Neural Network based semi-supervised object classification, whereby we propose to train a fully-connected network jointly with the graph neural network via parameter sharing and interpolation-based regularization. Further, we provide a theoretical analysis of how GraphMix improves the generalization bounds of the underlying graph neural network, without making any assumptions about the "aggregation" layer or the depth of the graph neural networks. We experimentally validate this analysis by applying GraphMix to various architectures such as Graph Convolutional Networks, Graph Attention Networks and Graph-U-Net. Despite its simplicity, we demonstrate that GraphMix can consistently improve or closely match state-of-the-art performance using even simpler architectures such as Graph Convolutional Networks, across three established graph benchmarks: Cora, Citeseer and Pubmed citation network datasets, as well as three newly proposed datasets: Cora-Full, Co-author-CS and Co-author-Physics.
Submitted 8 October, 2020; v1 submitted 25 September, 2019;
originally announced September 2019.
-
A Meta-Learning Framework for Generalized Zero-Shot Learning
Authors:
Vinay Kumar Verma,
Dhanajit Brahma,
Piyush Rai
Abstract:
Learning to classify unseen class samples at test time is popularly referred to as zero-shot learning (ZSL). If test samples can be from training (seen) as well as unseen classes, it is a more challenging problem due to the existence of strong bias towards seen classes. This problem is generally known as \emph{generalized} zero-shot learning (GZSL). Thanks to the recent advances in generative models such as VAEs and GANs, sample synthesis based approaches have gained considerable attention for solving this problem. These approaches are able to handle the problem of class bias by synthesizing unseen class samples. However, these ZSL/GZSL models suffer from the following key limitations: $(i)$ Their training stage learns a class-conditioned generator using only \emph{seen} class data and the training stage does not \emph{explicitly} learn to generate the unseen class samples; $(ii)$ They do not learn a generic optimal parameter which can easily generalize for both seen and unseen class generation; and $(iii)$ If we only have access to very few samples per seen class, these models tend to perform poorly. In this paper, we propose a meta-learning based generative model that naturally handles these limitations. The proposed model is based on integrating model-agnostic meta learning with a Wasserstein GAN (WGAN) to handle $(i)$ and $(iii)$, and uses a novel task distribution to handle $(ii)$. Our proposed model yields significant improvements on the standard ZSL as well as the more challenging GZSL setting. In the ZSL setting, our model yields 4.5\%, 6.0\%, 9.8\%, and 27.9\% relative improvements over the current state-of-the-art on CUB, AWA1, AWA2, and aPY datasets, respectively.
Submitted 10 September, 2019;
originally announced September 2019.
-
Strong suppression of the resistivity near the transition to superconductivity in narrow micro-bridges in external magnetic fields
Authors:
Xiaofu Zhang,
Adriana E. Lita,
Konstantin Smirnov,
HuanLong Liu,
Dong Zhu,
Varun B. Verma,
Sae Woo Nam,
Andreas Schilling
Abstract:
We have investigated a series of superconducting bridges based on homogeneous amorphous WSi and MoSi films, with bridge widths w ranging from 2 μm to 1000 μm and film thicknesses d ~ 4-6 nm and 100 nm. Upon decreasing the bridge widths below the respective Pearl lengths, we observe in all cases distinct changes in the characteristics of the resistive transitions to superconductivity. For each of the films, the resistivity curves R(B,T) separate at a well-defined and field-dependent temperature T*(B) with decreasing temperature, resulting in a dramatic suppression of the resistivity and a sharpening of the transitions with decreasing bridge width w. The associated excess conductivity in all the bridges scales as 1/w, which may suggest the presence of a highly conducting region that is dominating the electric transport in narrow bridges. We argue that this effect can only be observed in materials with sufficiently weak vortex pinning.
Submitted 21 February, 2020; v1 submitted 6 September, 2019;
originally announced September 2019.
-
A kilopixel array of superconducting nanowire single-photon detectors
Authors:
Emma E. Wollman,
Varun B. Verma,
Adriana E. Lita,
William H. Farr,
Matthew D. Shaw,
Richard P. Mirin,
Sae Woo Nam
Abstract:
We present a 1024-element imaging array of superconducting nanowire single photon detectors (SNSPDs) using a 32x32 row-column multiplexing architecture. Large arrays are desirable for applications such as imaging, spectroscopy, or particle detection.
Submitted 27 August, 2019;
originally announced August 2019.
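The arithmetic behind a 32x32 row-column layout can be sketched as follows. This assumes the generic row-column idea, in which each pixel is identified by the coincidence of one row line and one column line; the abstract does not describe the actual readout electronics, and the names `pixel_to_lines` and `lines_to_pixel` are hypothetical.

```python
ROWS, COLS = 32, 32

def pixel_to_lines(pixel):
    """Map a flat pixel index (0..ROWS*COLS-1) to its (row, column) line pair."""
    if not 0 <= pixel < ROWS * COLS:
        raise ValueError("pixel index out of range")
    return divmod(pixel, COLS)

def lines_to_pixel(row, col):
    """Inverse mapping: a (row, column) coincidence identifies one pixel."""
    return row * COLS + col

num_pixels = ROWS * COLS        # pixels addressed by the array
readout_lines = ROWS + COLS     # lines needed under the row-column assumption
```

Under this assumption the number of lines grows as the sum of rows and columns rather than their product, which is the usual motivation for row-column addressing in large arrays.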
-
InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization
Authors:
Fan-Yun Sun,
Jordan Hoffmann,
Vikas Verma,
Jian Tang
Abstract:
This paper studies learning the representations of whole graphs in both unsupervised and semi-supervised scenarios. Graph-level representations are critical in a variety of real-world applications such as predicting the properties of molecules and community analysis in social networks. Traditional graph kernel based methods are simple yet effective for obtaining fixed-length representations for graphs, but they suffer from poor generalization due to hand-crafted designs. There are also some recent methods based on language models (e.g. graph2vec) but they tend to only consider certain substructures (e.g. subtrees) as graph representatives. Inspired by recent progress in unsupervised representation learning, in this paper we propose a novel method called InfoGraph for learning graph-level representations. We maximize the mutual information between the graph-level representation and the representations of substructures of different scales (e.g., nodes, edges, triangles). By doing so, the graph-level representations encode aspects of the data that are shared across different scales of substructures. Furthermore, we propose InfoGraph*, an extension of InfoGraph for semi-supervised scenarios. InfoGraph* maximizes the mutual information between unsupervised graph representations learned by InfoGraph and the representations learned by existing supervised methods. As a result, the supervised encoder learns from unlabeled data while preserving the latent semantic space favored by the current supervised task. Experimental results on the tasks of graph classification and molecular property prediction show that InfoGraph is superior to state-of-the-art baselines and InfoGraph* can achieve performance competitive with state-of-the-art semi-supervised models.
Submitted 17 January, 2020; v1 submitted 31 July, 2019;
originally announced August 2019.
-
Towards Understanding Generalization in Gradient-Based Meta-Learning
Authors:
Simon Guiroy,
Vikas Verma,
Christopher Pal
Abstract:
In this work we study generalization of neural networks in gradient-based meta-learning by analyzing various properties of the objective landscapes. We experimentally demonstrate that as meta-training progresses, the meta-test solutions, obtained after adapting the meta-train solution of the model to new tasks via a few steps of gradient-based fine-tuning, become flatter, lower in loss, and further away from the meta-train solution. We also show that those meta-test solutions become flatter even as generalization starts to degrade, thus providing experimental evidence against the correlation between generalization and flat minima in the paradigm of gradient-based meta-learning. Furthermore, we provide empirical evidence that generalization to new tasks is correlated with the coherence between their adaptation trajectories in parameter space, measured by the average cosine similarity between task-specific trajectory directions, starting from the same meta-train solution. We also show that coherence of meta-test gradients, measured by the average inner product between the task-specific gradient vectors evaluated at the meta-train solution, is also correlated with generalization. Based on these observations, we propose a novel regularizer for MAML and provide experimental evidence for its effectiveness.
Submitted 16 July, 2019;
originally announced July 2019.
-
Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing Too Much Accuracy
Authors:
Alex Lamb,
Vikas Verma,
Kenji Kawaguchi,
Alexander Matyasko,
Savya Khosla,
Juho Kannala,
Yoshua Bengio
Abstract:
Adversarial robustness has become a central goal in deep learning, both in the theory and the practice. However, successful methods to improve the adversarial robustness (such as adversarial training) greatly hurt generalization performance on the unperturbed data. This could have a major impact on how the adversarial robustness affects real world systems (i.e. many may opt to forego robustness if it can improve accuracy on the unperturbed data). We propose Interpolated Adversarial Training, which employs recently proposed interpolation based training methods in the framework of adversarial training. On CIFAR-10, adversarial training increases the standard test error (when there is no adversary) from 4.43% to 12.32%, whereas with our Interpolated adversarial training we retain the adversarial robustness while achieving a standard test error of only 6.45%. With our technique, the relative increase in the standard error for the robust model is reduced from 178.1% to just 45.5%. Moreover, we provide mathematical analysis of Interpolated Adversarial Training to confirm its efficiencies and demonstrate its advantages in terms of robustness and generalization.
Submitted 19 October, 2022; v1 submitted 16 June, 2019;
originally announced June 2019.
-
A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation
Authors:
Varun Khare,
Divyat Mahajan,
Homanga Bharadhwaj,
Vinay Verma,
Piyush Rai
Abstract:
We present a domain adaptation based generative framework for zero-shot learning. Our framework addresses the problem of domain shift between the seen and unseen class distributions in zero-shot learning and minimizes the shift by developing a generative model trained via adversarial domain adaptation. Our approach is based on end-to-end learning of the class distributions of seen classes and unseen classes. To enable the model to learn the class distributions of unseen classes, we parameterize these class distributions in terms of the class attribute information (which is available for both seen and unseen classes). This provides a very simple way to learn the class distribution of any unseen class, given only its class attribute information, and no labeled training data. Training this model with adversarial domain adaptation further provides robustness against the distribution mismatch between the data from seen and unseen classes. Our approach also provides a novel way for training neural net based classifiers to overcome the hubness problem in zero-shot learning. Through a comprehensive set of experiments, we show that our model yields superior accuracies as compared to various state-of-the-art zero shot learning models, on a variety of benchmark datasets. Code for the experiments is available at github.com/vkkhare/ZSL-ADA
Submitted 22 February, 2020; v1 submitted 7 June, 2019;
originally announced June 2019.
-
Entanglement and non-locality between disparate solid-state quantum memories mediated by photons
Authors:
Marcel. li Grimau Puigibert,
Mohsen Falamarzi Askarani,
Jacob H. Davidson,
Varun B. Verma,
Matthew D. Shaw,
Sae Woo Nam,
Thomas Lutz,
Gustavo C. Amaral,
Daniel Oblak,
Wolfgang Tittel
Abstract:
Entangling quantum systems with different characteristics through the exchange of photons is a prerequisite for building future quantum networks. Proving the presence of entanglement between quantum memories for light working at different wavelengths furthers this goal. Here, we report on a series of experiments with a thulium-doped crystal, serving as a quantum memory for 794 nm photons, an erbium-doped fibre, serving as a quantum memory for telecommunication-wavelength photons at 1535 nm, and a source of photon pairs created via spontaneous parametric down-conversion. Characterizing the photons after re-emission from the two memories, we find non-classical correlations with a cross-correlation coefficient of $g^{(2)}_{12} = 53\pm8$; entanglement preserving storage with input-output fidelity of $\mathcal{F}_{IO}\approx93\pm2\%$; and non-locality featuring a violation of the Clauser-Horne-Shimony-Holt Bell-inequality with $S= 2.6\pm0.2$. Our proof-of-principle experiment shows that entanglement persists while propagating through different solid-state quantum memories operating at different wavelengths.
Submitted 21 May, 2019; v1 submitted 20 May, 2019;
originally announced May 2019.
-
Play and Prune: Adaptive Filter Pruning for Deep Model Compression
Authors:
Pravendra Singh,
Vinay Kumar Verma,
Piyush Rai,
Vinay P. Namboodiri
Abstract:
While convolutional neural networks (CNN) have achieved impressive performance on various classification/recognition tasks, they typically consist of a massive number of parameters. This results in significant memory requirement as well as computational overheads. Consequently, there is a growing need for filter-level pruning approaches for compressing CNN based models that not only reduce the total number of parameters but reduce the overall computation as well. We present a new min-max framework for filter-level pruning of CNNs. Our framework, called Play and Prune (PP), jointly prunes and fine-tunes CNN model parameters, with an adaptive pruning rate, while maintaining the model's predictive performance. Our framework consists of two modules: (1) An adaptive filter pruning (AFP) module, which minimizes the number of filters in the model; and (2) A pruning rate controller (PRC) module, which maximizes the accuracy during pruning. Moreover, unlike most previous approaches, our approach allows directly specifying the desired error tolerance instead of pruning level. Our compressed models can be deployed at run-time, without requiring any special libraries or hardware. Our approach reduces the number of parameters of VGG-16 by an impressive factor of 17.5X, and number of FLOPS by 6.43X, with no loss of accuracy, significantly outperforming other state-of-the-art filter pruning methods.
Submitted 11 May, 2019;
originally announced May 2019.
-
Tunable quantum beat of single photons enabled by nonlinear nanophotonics
Authors:
Qing Li,
Anshuman Singh,
Xiyuan Lu,
John Lawall,
Varun Verma,
Richard Mirin,
Sae Woo Nam,
Kartik Srinivasan
Abstract:
We demonstrate the tunable quantum beat of single photons through the co-development of core nonlinear nanophotonic technologies for frequency-domain manipulation of quantum states in a common physical platform. Spontaneous four-wave mixing in a nonlinear resonator is used to produce non-degenerate, quantum-correlated photon pairs. One photon from each pair is then frequency shifted, without degradation of photon statistics, using four-wave mixing Bragg scattering in a second nonlinear resonator. Fine tuning of the applied frequency shift enables tunable quantum interference of the two photons as they are impinged on a beamsplitter, with an oscillating signature that depends on their frequency difference. Our work showcases the potential of nonlinear nanophotonic devices as a valuable resource for photonic quantum information science.
Submitted 5 May, 2019;
originally announced May 2019.
-
Measurement-device-independent quantum key distribution coexisting with classical communication
Authors:
Raju Valivarthi,
Prathwiraj Umesh,
Caleb John,
Kimberley A. Owen,
Varun B. Verma,
Sae Woo Nam,
Daniel Oblak,
Qiang Zhou,
Wolfgang Tittel
Abstract:
The possibility for quantum and classical communication to coexist on the same fibre is important for deployment and widespread adoption of quantum key distribution (QKD) and, more generally, a future quantum internet. While coexistence has been demonstrated for different QKD implementations, a comprehensive investigation for measurement-device independent (MDI) QKD -- a recently proposed QKD protocol that cannot be broken by quantum hacking that targets vulnerabilities of single-photon detectors -- is still missing. Here we experimentally demonstrate that MDI-QKD can operate simultaneously with at least five 10 Gbps bidirectional classical communication channels operating at around 1550 nm wavelength and over 40 km of spooled fibre, and we project communication rates in excess of 10 THz when moving the quantum channel from the third to the second telecommunication window. The similarity of MDI-QKD with quantum repeaters suggests that classical and generalised quantum networks can co-exist on the same fibre infrastructure.
Submitted 1 May, 2019;
originally announced May 2019.
-
Quantum Frequency Conversion of a Quantum Dot Single-Photon Source on a Nanophotonic Chip
Authors:
Anshuman Singh,
Qing Li,
Shunfa Liu,
Ying Yu,
Xiyuan Lu,
Christian Schneider,
Sven Höfling,
John Lawall,
Varun Verma,
Richard Mirin,
Sae Woo Nam,
Jin Liu,
Kartik Srinivasan
Abstract:
Single self-assembled InAs/GaAs quantum dots are promising bright sources of indistinguishable photons for quantum information science. However, their distribution in emission wavelength, due to inhomogeneous broadening inherent to their growth, has limited the ability to create multiple identical sources. Quantum frequency conversion can overcome this issue, particularly if implemented using scalable chip-integrated technologies. Here, we report the first demonstration of quantum frequency conversion of a quantum dot single-photon source on a silicon nanophotonic chip. Single photons from a quantum dot in a micropillar cavity are shifted in wavelength with an on-chip conversion efficiency of approximately 12%, limited by the linewidth of the quantum dot photons. The intensity autocorrelation function $g^{(2)}(\tau)$ for the frequency-converted light is antibunched with $g^{(2)}(0) = 0.290 \pm 0.030$, compared to the before-conversion value $g^{(2)}(0) = 0.080 \pm 0.003$. We demonstrate the suitability of our frequency conversion interface as a resource for quantum dot sources by characterizing its effectiveness across a wide span of input wavelengths (840 nm to 980 nm), and its ability to achieve tunable wavelength shifts difficult to obtain by other approaches.
Submitted 26 April, 2019;
originally announced April 2019.
-
Determining the depairing current in superconducting nanowire single-photon detectors
Authors:
S. Frasca,
B. Korzh,
M. Colangelo,
D. Zhu,
A. E. Lita,
J. P. Allmaras,
E. E. Wollman,
V. B. Verma,
A. E. Dane,
E. Ramirez,
A. D. Beyer,
S. W. Nam,
A. G. Kozorezov,
M. D. Shaw,
K. K. Berggren
Abstract:
We estimate the depairing current of superconducting nanowire single photon detectors (SNSPDs) by studying the dependence of the nanowires kinetic inductance on their bias current. The kinetic inductance is determined by measuring the resonance frequency of resonator style nanowire coplanar waveguides both in transmission and reflection configurations. Bias current dependent shifts in the measured resonant frequency correspond to the change in the kinetic inductance, which can be compared with theoretical predictions. We demonstrate that the fast relaxation model described in the literature accurately matches our experimental data and provides a valuable tool for direct determination of the depairing current. Accurate and direct measurement of the depairing current is critical for nanowire quality analysis, as well as modeling efforts aimed at understanding the detection mechanism in SNSPDs.
Submitted 18 April, 2019;
originally announced April 2019.
-
Generative Model for Zero-Shot Sketch-Based Image Retrieval
Authors:
Vinay Kumar Verma,
Aakansha Mishra,
Ashish Mishra,
Piyush Rai
Abstract:
We present a probabilistic model for Sketch-Based Image Retrieval (SBIR) where, at retrieval time, we are given sketches from novel classes, that were not present at training time. Existing SBIR methods, most of which rely on learning class-wise correspondences between sketches and images, typically work well only for previously seen sketch classes, and result in poor retrieval performance on novel classes. To address this, we propose a generative model that learns to generate images, conditioned on a given novel class sketch. This enables us to reduce the SBIR problem to a standard image-to-image search problem. Our model is based on an inverse auto-regressive flow based variational autoencoder, with a feedback mechanism to ensure robust image generation. We evaluate our model on two very challenging datasets, Sketchy, and TU Berlin, with novel train-test split. The proposed approach significantly outperforms various baselines on both the datasets.
Submitted 17 April, 2019;
originally announced April 2019.
-
An ultrahigh-impedance superconducting thermal switch for interfacing superconductors to semiconductors and optoelectronics
Authors:
A. N. McCaughan,
V. B. Verma,
S. Buckley,
J. P. Allmaras,
A. G. Kozorezov,
A. N. Tait,
S. W. Nam,
J. M. Shainline
Abstract:
A number of current approaches to quantum and neuromorphic computing use superconductors as the basis of their platform or as a measurement component, and will need to operate at cryogenic temperatures. Semiconductor systems are typically proposed as a top-level control in these architectures, with low-temperature passive components and intermediary superconducting electronics acting as the direct interface to the lowest-temperature stages. The architectures, therefore, require a low-power superconductor-semiconductor interface, which is not currently available. Here we report a superconducting switch that is capable of translating low-voltage superconducting inputs directly into semiconductor-compatible (above 1,000 mV) outputs at kelvin-scale temperatures (1 K or 4 K). To illustrate the capabilities in interfacing superconductors and semiconductors, we use it to drive a light-emitting diode (LED) in a photonic integrated circuit, generating photons at 1 K from a low-voltage input and detecting them with an on-chip superconducting single-photon detector. We also characterize our device's timing response (less than 300 ps turn-on, 15 ns turn-off), output impedance (greater than 1 MΩ), and energy requirements (0.18 fJ/um^2, 3.24 mV/nW).
Submitted 30 September, 2019; v1 submitted 25 March, 2019;
originally announced March 2019.
-
Detecting Dark Matter with Superconducting Nanowires
Authors:
Yonit Hochberg,
Ilya Charaev,
Sae-Woo Nam,
Varun Verma,
Marco Colangelo,
Karl K. Berggren
Abstract:
We propose the use of superconducting nanowires as both target and sensor for direct detection of sub-GeV dark matter. With excellent sensitivity to small energy deposits on electrons, and demonstrated low dark counts, such devices could be used to probe electron recoils from dark matter scattering and absorption processes. We demonstrate the feasibility of this idea using measurements of an existing fabricated tungsten-silicide nanowire prototype with 0.8 eV energy threshold and 4.3 nanograms with 10 thousand seconds of exposure, which showed no dark counts. The results from this device already place meaningful bounds on dark matter-electron interactions, including the strongest terrestrial bounds on sub-eV dark photon absorption to date. Future expected fabrication on larger scales and with lower thresholds should enable probing new territory in the direct detection landscape, establishing the complementarity of this approach to other existing proposals.
Submitted 12 March, 2019;
originally announced March 2019.
-
HetConv: Heterogeneous Kernel-Based Convolutions for Deep CNNs
Authors:
Pravendra Singh,
Vinay Kumar Verma,
Piyush Rai,
Vinay P. Namboodiri
Abstract:
We present a novel deep learning architecture in which the convolution operation leverages heterogeneous kernels. The proposed HetConv (Heterogeneous Kernel-Based Convolution) reduces the computation (FLOPs) and the number of parameters as compared to standard convolution operation while still maintaining representational efficiency. To show the effectiveness of our proposed convolution, we present extensive experimental results on the standard convolutional neural network (CNN) architectures such as VGG and ResNet. We find that after replacing the standard convolutional filters in these architectures with our proposed HetConv filters, we achieve 3X to 8X FLOPs based improvement in speed while still maintaining (and sometimes improving) the accuracy. We also compare our proposed convolutions with group/depth wise convolutions and show that it achieves more FLOPs reduction with significantly higher accuracy.
Submitted 25 March, 2019; v1 submitted 11 March, 2019;
originally announced March 2019.
-
Interpolation Consistency Training for Semi-Supervised Learning
Authors:
Vikas Verma,
Kenji Kawaguchi,
Alex Lamb,
Juho Kannala,
Arno Solin,
Yoshua Bengio,
David Lopez-Paz
Abstract:
We introduce Interpolation Consistency Training (ICT), a simple and computation efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets. Our theoretical analysis shows that ICT corresponds to a certain type of data-adaptive regularization with unlabeled points which reduces overfitting to labeled points under high confidence values.
Submitted 19 October, 2022; v1 submitted 9 March, 2019;
originally announced March 2019.
-
On Adversarial Mixup Resynthesis
Authors:
Christopher Beckham,
Sina Honari,
Vikas Verma,
Alex Lamb,
Farnoosh Ghadiri,
R Devon Hjelm,
Yoshua Bengio,
Christopher Pal
Abstract:
In this paper, we explore new approaches to combining information encoded within the learned representations of auto-encoders. We explore models that are capable of combining the attributes of multiple inputs such that a resynthesised output is trained to fool an adversarial discriminator for real versus synthesised data. Furthermore, we explore the use of such an architecture in the context of semi-supervised learning, where we learn a mixing function whose objective is to produce interpolations of hidden states, or masked combinations of latent representations that are consistent with a conditioned class label. We show quantitative and qualitative evidence that such a formulation is an interesting avenue of research.
Submitted 23 October, 2019; v1 submitted 6 March, 2019;
originally announced March 2019.
-
High-Speed Low-Crosstalk Detection of a $^{171}$Yb$^+$ Qubit using Superconducting Nanowire Single Photon Detectors
Authors:
Stephen Crain,
Clinton Cahall,
Geert Vrijsen,
Emma E. Wollman,
Matthew D. Shaw,
Varun B. Verma,
Sae Woo Nam,
Jungsang Kim
Abstract:
Qubits used in quantum computing tend to suffer from errors, either from the qubit interacting with the environment, or from imperfect control when quantum logic gates are applied. Fault-tolerant construction based on quantum error correcting codes (QECC) can be used to recover from such errors. Effective implementation of QECC requires a high fidelity readout of the ancilla qubits from which the error syndrome can be determined, without affecting the data qubits in which relevant quantum information is stored for processing. Here, we present a detection scheme for $^{171}$Yb$^+$ trapped ion qubits, where we use superconducting nanowire single photon detectors and utilize photon time-of-arrival statistics to improve the fidelity and speed. Qubit shuttling allows for creating a separate detection region where an ancilla qubit can be measured without disrupting a data qubit. We achieve an average qubit state detection time of 11 $\mu$s with a fidelity of $99.931(6)\%$. The error due to the detection crosstalk, defined as the probability that the coherence of the data qubit is lost due to the process of detecting an ancilla qubit, is reduced to $\sim2\times10^{-5}$ by creating a separation of 370 $\mu$m between them.
Submitted 19 May, 2019; v1 submitted 11 February, 2019;
originally announced February 2019.
-
Single-shot quantum memory advantage in the simulation of stochastic processes
Authors:
Farzad Ghafari,
Nora Tischler,
Jayne Thompson,
Mile Gu,
Lynden K. Shalm,
Varun B. Verma,
Sae Woo Nam,
Raj B. Patel,
Howard M. Wiseman,
Geoff J. Pryde
Abstract:
Stochastic processes underlie a vast range of natural and social phenomena. Some processes such as atomic decay feature intrinsic randomness, whereas other complex processes, e.g. traffic congestion, are effectively probabilistic because we cannot track all relevant variables. To simulate a stochastic system's future behaviour, information about its past must be stored and thus memory is a key resource. Quantum information processing promises a memory advantage for stochastic simulation that has been validated in recent proof-of-concept experiments. Yet, in all past works, the memory saving would only become accessible in the limit of a large number of parallel simulations, because the memory registers of individual quantum simulators had the same dimensionality as their classical counterparts. Here, we report the first experimental demonstration that a quantum stochastic simulator can encode the relevant information in fewer dimensions than any classical simulator, thereby achieving a quantum memory advantage even for an individual simulator. Our photonic experiment thus establishes the potential of a new, practical resource saving in the simulation of complex systems.
Submitted 11 December, 2018;
originally announced December 2018.
-
Leveraging Filter Correlations for Deep Model Compression
Authors:
Pravendra Singh,
Vinay Kumar Verma,
Piyush Rai,
Vinay P. Namboodiri
Abstract:
We present a filter correlation based model compression approach for deep convolutional neural networks. Our approach iteratively identifies pairs of filters with the largest pairwise correlations and drops one of the filters from each such pair. However, instead of discarding one of the filters from each such pair naïvely, the model is re-optimized to make the filters in these pairs maximally correlated, so that discarding one of the filters from the pair results in minimal information loss. Moreover, after discarding the filters in each round, we further finetune the model to recover from the potential small loss incurred by the compression. We evaluate our proposed approach using a comprehensive set of experiments and ablation studies. Our compression method yields state-of-the-art FLOPs compression rates on various benchmarks, such as LeNet-5, VGG-16, and ResNet-50,56, while still achieving excellent predictive performance for tasks such as object detection on benchmark datasets.
Submitted 15 January, 2020; v1 submitted 26 November, 2018;
originally announced November 2018.
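The pair-selection step in the abstract above (finding the two filters with the largest pairwise correlation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `pearson` and `most_correlated_pair`, the use of absolute Pearson correlation on flattened filters, and the toy filter values are all assumptions, and the re-optimization and fine-tuning stages the abstract mentions are omitted.

```python
import math

def pearson(u, v):
    """Pearson correlation between two equal-length flattened filters."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0  # constant filter: correlation undefined, treated as 0 here
    return cov / (su * sv)

def most_correlated_pair(filters):
    """Return (i, j, |corr|) for the pair with the largest absolute correlation."""
    best = (None, None, -1.0)
    for i in range(len(filters)):
        for j in range(i + 1, len(filters)):
            c = abs(pearson(filters[i], filters[j]))
            if c > best[2]:
                best = (i, j, c)
    return best

# Toy layer with four flattened filters (hypothetical values).
filters = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, -1.0, 2.0, 0.0],
    [2.1, 3.9, 6.2, 7.8],  # nearly a scaled copy of filter 0
    [4.0, 1.0, -2.0, 3.0],
]
i, j, c = most_correlated_pair(filters)
print(i, j, round(c, 3))
```

In this toy layer the selected pair is filters 0 and 2, so one of them would be the candidate to drop.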
-
Demonstration of Einstein-Podolsky-Rosen Steering Using Hybrid Continuous- and Discrete-Variable Entanglement of Light
Authors:
A. Cavaillès,
H. Le Jeannic,
J. Raskop,
G. Guccione,
D. Markham,
E. Diamanti,
M. D. Shaw,
V. B. Verma,
S. W. Nam,
J. Laurat
Abstract:
Einstein-Podolsky-Rosen steering is known to be a key resource for one-sided device-independent quantum information protocols. Here we demonstrate steering using hybrid entanglement between continuous- and discrete-variable optical qubits. To this end, we report on suitable steering inequalities and detail the implementation and requirements for this demonstration. Steering is experimentally certified by observing a violation by more than 5 standard deviations. Our results illustrate the potential of optical hybrid entanglement for applications in heterogeneous quantum networks that would interconnect disparate physical platforms and encodings.
Submitted 31 October, 2018;
originally announced November 2018.
-
Conclusive experimental demonstration of one-way Einstein-Podolsky-Rosen steering
Authors:
Nora Tischler,
Farzad Ghafari,
Travis J. Baker,
Sergei Slussarenko,
Raj B. Patel,
Morgan M. Weston,
Sabine Wollmann,
Lynden K. Shalm,
Varun B. Verma,
Sae Woo Nam,
H. Chau Nguyen,
Howard M. Wiseman,
Geoff J. Pryde
Abstract:
Einstein-Podolsky-Rosen steering is a quantum phenomenon wherein one party influences, or steers, the state of a distant party's particle beyond what could be achieved with a separable state, by making measurements on one half of an entangled state. This type of quantum nonlocality stands out through its asymmetric setting, and even allows for cases where one party can steer the other, but where the reverse is not true. A series of experiments have demonstrated one-way steering in the past, but all were based on significant limiting assumptions. These consisted either of restrictions on the type of allowed measurements, or of assumptions about the quantum state at hand, by mapping to a specific family of states and analysing the ideal target state rather than the real experimental state. Here, we present the first experimental demonstration of one-way steering free of such assumptions. We achieve this using a new sufficient condition for non-steerability, and, although not required by our analysis, using a novel source of extremely high-quality photonic Werner states.
Submitted 12 September, 2018; v1 submitted 26 June, 2018;
originally announced June 2018.
-
Modularity Matters: Learning Invariant Relational Reasoning Tasks
Authors:
Jason Jo,
Vikas Verma,
Yoshua Bengio
Abstract:
We focus on two supervised visual reasoning tasks whose labels encode a semantic relational rule between two or more objects in an image: the MNIST Parity task and the colorized Pentomino task. The objects in the images undergo random translation, scaling, rotation and coloring transformations. Thus these tasks involve invariant relational reasoning. We report uneven performance of various deep CNN models on these two tasks. For the MNIST Parity task, we report that the VGG19 model soundly outperforms a family of ResNet models. Moreover, the family of ResNet models exhibits a general sensitivity to random initialization for the MNIST Parity task. For the colorized Pentomino task, both the VGG19 and ResNet models exhibit sluggish optimization and very poor test generalization, hovering around 30% test error. The CNNs we tested all learn hierarchies of fully distributed features and thus encode the distributed representation prior. We are motivated by a hypothesis from cognitive neuroscience which posits that the human visual cortex is modularized, and this allows the visual cortex to learn higher order invariances. To this end, we consider a modularized variant of the ResNet model, referred to as a Residual Mixture Network (ResMixNet), which employs a mixture-of-experts architecture to interleave distributed representations with more specialized, modular representations. We show that very shallow ResMixNets are capable of learning each of the two tasks well, attaining less than 2% and 1% test error on the MNIST Parity and the colorized Pentomino tasks respectively. Most importantly, the ResMixNet models are extremely parameter efficient: generalizing better than various non-modular CNNs that have over 10x the number of parameters. These experimental results support the hypothesis that modularity is a robust prior for learning invariant relational reasoning.
Submitted 18 June, 2018;
originally announced June 2018.
-
Manifold Mixup: Better Representations by Interpolating Hidden States
Authors:
Vikas Verma,
Alex Lamb,
Christopher Beckham,
Amir Najafi,
Ioannis Mitliagkas,
Aaron Courville,
David Lopez-Paz,
Yoshua Bengio
Abstract:
Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly different test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. Manifold Mixup leverages semantic interpolations as additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. As a result, neural networks trained with Manifold Mixup learn class-representations with fewer directions of variance. We prove theory on why this flattening happens under ideal conditions, validate it on practical situations, and connect it to previous works on information theory and generalization. In spite of incurring no significant computation and being implemented in a few lines of code, Manifold Mixup improves strong baselines in supervised learning, robustness to single-step adversarial attacks, and test log-likelihood.
Submitted 11 May, 2019; v1 submitted 13 June, 2018;
originally announced June 2018.
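The core operation named in the Manifold Mixup abstract above, interpolating hidden representations, can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `mix` and `manifold_mixup_pair` are hypothetical, drawing the mixing coefficient from a Beta(alpha, alpha) distribution and mixing the one-hot labels with the same coefficient are assumptions not stated in the abstract, and the choice of layer and the rest of the forward pass are omitted.

```python
import random

def mix(a, b, lam):
    """Elementwise convex combination lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

def manifold_mixup_pair(h_a, y_a, h_b, y_b, alpha=2.0, rng=random):
    """Mix two hidden states and their one-hot labels with one shared coefficient.

    Assumption: lam ~ Beta(alpha, alpha); alpha is a hypothetical hyperparameter.
    """
    lam = rng.betavariate(alpha, alpha)
    return mix(h_a, h_b, lam), mix(y_a, y_b, lam), lam

# Hidden states of two samples at some intermediate layer (toy values).
h_mix, y_mix, lam = manifold_mixup_pair(
    [1.0, 3.0], [1.0, 0.0],   # sample A: hidden state, one-hot label
    [5.0, 7.0], [0.0, 1.0],   # sample B: hidden state, one-hot label
    rng=random.Random(0),
)
print(lam, h_mix, y_mix)
```

The mixed label remains a valid probability vector because both inputs are one-hot and the weights sum to one.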
-
Challenging local realism with human choices
Authors:
The BIG Bell Test Collaboration,
C. Abellán,
A. Acín,
A. Alarcón,
O. Alibart,
C. K. Andersen,
F. Andreoli,
A. Beckert,
F. A. Beduini,
A. Bendersky,
M. Bentivegna,
P. Bierhorst,
D. Burchardt,
A. Cabello,
J. Cariñe,
S. Carrasco,
G. Carvacho,
D. Cavalcanti,
R. Chaves,
J. Cortés-Vega,
A. Cuevas,
A. Delgado,
H. de Riedmatten,
C. Eichler,
P. Farrera
, et al. (83 additional authors not shown)
Abstract:
A Bell test is a randomized trial that compares experimental observations against the philosophical worldview of local realism. A Bell test requires spatially distributed entanglement, fast and high-efficiency detection and unpredictable measurement settings. Although technology can satisfy the first two of these requirements, the use of physical devices to choose settings in a Bell test involves making assumptions about the physics that one aims to test. Bell himself noted this weakness in using physical setting choices and argued that human 'free will' could be used rigorously to ensure unpredictability in Bell tests. Here we report a set of local-realism tests using human choices, which avoids assumptions about predictability in physics. We recruited about 100,000 human participants to play an online video game that incentivizes fast, sustained input of unpredictable selections and illustrates Bell-test methodology. The participants generated 97,347,490 binary choices, which were directed via a scalable web platform to 12 laboratories on five continents, where 13 experiments tested local realism using photons, single atoms, atomic ensembles, and superconducting devices. Over a 12-hour period on 30 November 2016, participants worldwide provided a sustained data flow of over 1,000 bits per second to the experiments, which used different human-generated data to choose each measurement setting. The observed correlations strongly contradict local realism and other realistic positions in bipartite and tripartite scenarios. Project outcomes include closing the 'freedom-of-choice loophole' (the possibility that the setting choices are influenced by 'hidden variables' to correlate with the particle properties), the utilization of video-game methods for rapid collection of human-generated randomness, and the use of networking techniques for global participation in experimental science.
Submitted 9 November, 2018; v1 submitted 11 May, 2018;
originally announced May 2018.
-
Storage and retrieval of heralded telecommunication-wavelength photons using a solid-state waveguide quantum memory
Authors:
Mohsen Falamarzi Askarani,
Marcel. li Grimau Pugibert,
Thomas Lutz,
Varun B. Verma,
Matthew D. Shaw,
Sae Woo Nam,
Neil Sinclair,
Daniel Oblak,
Wolfgang Tittel
Abstract:
Large-scale quantum networks will employ telecommunication-wavelength photons to exchange quantum information between remote measurement, storage, and processing nodes via fibre-optic channels. Quantum memories compatible with telecommunication-wavelength photons are a key element towards building such a quantum network. Here, we demonstrate the storage and retrieval of heralded 1532 nm-wavelength photons using a solid-state waveguide quantum memory. The heralded photons are derived from a photon-pair source that is based on parametric down-conversion, and our quantum memory is based on a 6 GHz-bandwidth atomic frequency comb prepared using an inhomogeneously broadened absorption line of a cryogenically-cooled erbium-doped lithium niobate waveguide. Using persistent spectral hole burning under varying magnetic fields, we determine that the memory is enabled by population transfer into niobium and lithium nuclear spin levels. Despite limited storage time and efficiency, our demonstration represents an important step towards quantum networks that operate in the telecommunication band and the development of on-chip quantum technology using industry-standard crystals.
Submitted 16 April, 2018;
originally announced April 2018.
-
Optically addressing single rare-earth ions in a nanophotonic cavity
Authors:
Tian Zhong,
Jonathan M. Kindem,
John G. Bartholomew,
Jake Rochman,
Ioana Craiciu,
Varun Verma,
Sae Woo Nam,
Francesco Marsili,
Matthew D. Shaw,
Andrew D. Beyer,
Andrei Faraon
Abstract:
We demonstrate optical probing of spectrally resolved single Nd rare-earth ions in yttrium orthovanadate. The ions are coupled to a photonic crystal resonator and show strong enhancement of the optical emission rate via the Purcell effect, resulting in near radiatively limited single photon emission. The measured high coupling cooperativity between a single photon and the ion allows for the observation of coherent optical Rabi oscillations. This could enable optically controlled spin qubits, quantum logic gates, and spin-photon interfaces for future quantum networks.
Submitted 12 January, 2019; v1 submitted 20 March, 2018;
originally announced March 2018.
-
Generalization in Machine Learning via Analytical Learning Theory
Authors:
Kenji Kawaguchi,
Yoshua Bengio,
Vikas Verma,
Leslie Pack Kaelbling
Abstract:
This paper introduces a novel measure-theoretic theory for machine learning that does not require statistical assumptions. Based on this theory, a new regularization method in deep learning is derived and shown to outperform previous methods in CIFAR-10, CIFAR-100, and SVHN. Moreover, the proposed theory provides a theoretical basis for a family of practically successful regularization methods in deep learning. We discuss several consequences of our results on one-shot learning, representation learning, deep learning, and curriculum learning. Unlike statistical learning theory, the proposed learning theory analyzes each problem instance individually via measure theory, rather than a set of problem instances via statistics. As a result, it provides different types of results and insights when compared to statistical learning theory.
Submitted 6 March, 2019; v1 submitted 21 February, 2018;
originally announced February 2018.
-
A Generative Approach to Zero-Shot and Few-Shot Action Recognition
Authors:
Ashish Mishra,
Vinay Kumar Verma,
M Shiva Krishna Reddy,
Arulkumar S,
Piyush Rai,
Anurag Mittal
Abstract:
We present a generative framework for zero-shot action recognition where some of the possible action classes do not occur in the training data. Our approach is based on modeling each action class using a probability distribution whose parameters are functions of the attribute vector representing that action class. In particular, we assume that the distribution parameters for any action class in the visual space can be expressed as a linear combination of a set of basis vectors where the combination weights are given by the attributes of the action class. These basis vectors can be learned solely using labeled data from the known (i.e., previously seen) action classes, and can then be used to predict the parameters of the probability distributions of unseen action classes. We consider two settings: (1) Inductive setting, where we use only the labeled examples of the seen action classes to predict the unseen action class parameters; and (2) Transductive setting which further leverages unlabeled data from the unseen action classes. Our framework also naturally extends to few-shot action recognition where a few labeled examples from unseen classes are available. Our experiments on benchmark datasets (UCF101, HMDB51 and Olympic) show significant performance improvements as compared to various baselines, in both standard zero-shot (disjoint seen and unseen classes) and generalized zero-shot learning settings.
Submitted 27 January, 2018;
originally announced January 2018.
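The idea in the zero-shot action recognition abstract above, that a class's distribution parameters are a linear combination of basis vectors weighted by the class attributes, can be illustrated for the mean vector alone. This is a rough sketch, not the authors' model: the names `class_mean` and `nearest_class`, the basis and attribute values, the class names, and the nearest-mean decision rule are all hypothetical, and learning the basis from seen classes is omitted.

```python
def class_mean(basis, attributes):
    """Mean vector as sum_k attributes[k] * basis[k]."""
    dim = len(basis[0])
    return [sum(a * b[d] for a, b in zip(attributes, basis)) for d in range(dim)]

def nearest_class(x, means):
    """Assign x to the class whose predicted mean is closest (squared Euclidean)."""
    def sqdist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))
    return min(means, key=lambda name: sqdist(x, means[name]))

# Basis vectors assumed to have been learned from seen classes (toy values).
basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Attribute vectors of two unseen classes (hypothetical).
unseen_attributes = {"jump": [1.0, 0.0, 1.0], "run": [0.0, 1.0, 0.5]}

# Predict a mean for each unseen class without any training example of it.
means = {name: class_mean(basis, a) for name, a in unseen_attributes.items()}
print(means)
print(nearest_class([1.9, 1.1], means))
```

A test feature close to the predicted mean of an unseen class is assigned to that class, even though no labeled example of it was used.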
-
Superconducting fluctuations and characteristic time scales in amorphous WSi
Authors:
Xiaofu Zhang,
Adriana E. Lita,
Mariia Sidorova,
Varun B. Verma,
Qiang Wang,
Sae Woo Nam,
Alexej Semenov,
Andreas Schilling
Abstract:
We study magnitudes and temperature dependences of the electron-electron and electron-phonon interaction times which play the dominant role in the formation and relaxation of photon-induced hotspots in two-dimensional amorphous WSi films. The time constants are obtained through magnetoconductance measurements in perpendicular magnetic field in the superconducting fluctuation regime and through time-resolved photoresponse to optical pulses. The excess magnetoconductivity is interpreted in terms of the weak-localization effect and superconducting fluctuations. Aslamazov-Larkin and Maki-Thompson superconducting fluctuations alone fail to reproduce the magnetic field dependence in the relatively high magnetic field range when the temperature is rather close to Tc, because the suppression of the electronic density of states due to the formation of short-lifetime Cooper pairs needs to be considered. The time scale τ_i of inelastic scattering is ascribed to a combination of electron-electron (τ_(e-e)) and electron-phonon (τ_(e-ph)) interaction times, and a characteristic electron-fluctuation time (τ_(e-fl)), which makes it possible to extract their magnitudes and temperature dependences from the measured τ_i. The ratio of phonon-electron (τ_(ph-e)) and electron-phonon interaction times is obtained via measurements of the optical photoresponse of WSi microbridges. Relatively large τ_(e-ph)/τ_(ph-e) and τ_(e-ph)/τ_(e-e) ratios ensure that in WSi the photon energy is more efficiently confined in the electron subsystem than in other materials commonly used in the technology of superconducting nanowire single-photon detectors (SNSPDs). We discuss the impact of interaction times on the hotspot dynamics and compare relevant metrics of SNSPDs from different materials.
Submitted 13 December, 2017;
originally announced December 2017.
-
Generalized Zero-Shot Learning via Synthesized Examples
Authors:
Vinay Kumar Verma,
Gundeep Arora,
Ashish Mishra,
Piyush Rai
Abstract:
We present a generative framework for generalized zero-shot learning where the training and test classes are not necessarily disjoint. Built upon a variational autoencoder based architecture, consisting of a probabilistic encoder and a probabilistic conditional decoder, our model can generate novel exemplars from seen/unseen classes, given their respective class attributes. These exemplars can subsequently be used to train any off-the-shelf classification model. One of the key aspects of our encoder-decoder architecture is a feedback-driven mechanism in which a discriminator (a multivariate regressor) learns to map the generated exemplars to the corresponding class attribute vectors, leading to an improved generator. Our model's ability to generate and leverage examples from unseen classes to train the classification model naturally helps to mitigate the bias towards predicting seen classes in generalized zero-shot learning settings. Through a comprehensive set of experiments, we show that our model outperforms several state-of-the-art methods, on several benchmark datasets, for both standard as well as generalized zero-shot learning.
Submitted 11 June, 2018; v1 submitted 11 December, 2017;
originally announced December 2017.
-
DCT-domain Deep Convolutional Neural Networks for Multiple JPEG Compression Classification
Authors:
Vinay Verma,
Nikita Agarwal,
Nitin Khanna
Abstract:
With the rapid advancements in digital imaging systems and networking, low-cost hand-held image capture devices equipped with network connectivity are becoming ubiquitous. This ease of digital image capture and sharing is also accompanied by widespread usage of user-friendly image editing software. Thus, we are in an era where digital images can be very easily used for the massive spread of false information and their integrity needs to be seriously questioned. Application of multiple lossy compressions on images is an essential part of any image editing pipeline involving lossy compressed images. This paper aims to address the problem of classifying images based on the number of JPEG compressions they have undergone, by utilizing deep convolutional neural networks in the DCT domain. The proposed system incorporates a well-designed pre-processing step before feeding the image data to the CNN to capture essential characteristics of compression artifacts and make the system independent of image content. Detailed experiments are performed to optimize different aspects of the system, such as depth of the CNN, number of DCT frequencies, and execution time. Results on the standard UCID dataset demonstrate that the proposed system outperforms existing systems for multiple JPEG compression detection and is capable of classifying a greater number of re-compression cycles than existing systems.
Submitted 6 December, 2017;
originally announced December 2017.
-
Zero-Shot Learning via Class-Conditioned Deep Generative Models
Authors:
Wenlin Wang,
Yunchen Pu,
Vinay Kumar Verma,
Kai Fan,
Yizhe Zhang,
Changyou Chen,
Piyush Rai,
Lawrence Carin
Abstract:
We present a deep generative model for learning to predict classes not seen at training time. Unlike most existing methods for this problem, that represent each class as a point (via a semantic embedding), we represent each seen/unseen class using a class-specific latent-space distribution, conditioned on class attributes. We use these latent-space distributions as a prior for a supervised variational autoencoder (VAE), which also facilitates learning highly discriminative feature representations for the inputs. The entire framework is learned end-to-end using only the seen-class training data. The model infers corresponding attributes of a test image by maximizing the VAE lower bound; the inferred attributes may be linked to labels not seen when training. We further extend our model to a (1) semi-supervised/transductive setting by leveraging unlabeled unseen-class data via an unsupervised learning module, and (2) few-shot learning where we also have a small number of labeled inputs from the unseen classes. We compare our model with several state-of-the-art methods through a comprehensive set of experiments on a variety of benchmark data sets.
Submitted 19 November, 2017; v1 submitted 15 November, 2017;
originally announced November 2017.
-
Direct observation of nanofabrication influence on the optical properties of single self-assembled InAs/GaAs quantum dots
Authors:
** Liu,
Kumarasiri Konthasinghe,
Marcelo Davanco,
John Lawall,
Vikas Anant,
Varun Verma,
Richard Mirin,
Sae Woo Nam,
** Dong Song,
Ben Ma,
Ze Sheng Chen,
Hai Qiao Ni,
Zhi Chuan Niu,
Kartik Srinivasan
Abstract:
Single self-assembled InAs/GaAs quantum dots are a promising solid-state quantum technology, with which vacuum Rabi splitting, single-photon-level nonlinearities, and bright, pure, and indistinguishable single-photon generation have been demonstrated. For such achievements, nanofabrication is used to create structures in which the quantum dot preferentially interacts with strongly-confined optical modes. An open question is the extent to which such nanofabrication may also have an adverse influence, through the creation of traps and surface states that could induce blinking, spectral diffusion, and dephasing. Here, we use photoluminescence imaging to locate the positions of single InAs/GaAs quantum dots with respect to alignment marks with < 5 nm uncertainty, allowing us to measure their behavior before and after fabrication. We track the quantum dot emission linewidth and photon statistics as a function of distance from an etched surface, and find that the linewidth is significantly broadened (up to several GHz) for etched surfaces within a couple hundred nanometers of the quantum dot. However, we do not observe appreciable reduction of the quantum dot radiative efficiency due to blinking. We also show that atomic layer deposition can stabilize spectral diffusion of the quantum dot emission, and partially recover its linewidth.
Submitted 26 October, 2017;
originally announced October 2017.
-
Residual Connections Encourage Iterative Inference
Authors:
Stanisław Jastrzębski,
Devansh Arpit,
Nicolas Ballas,
Vikas Verma,
Tong Che,
Yoshua Bengio
Abstract:
Residual networks (Resnets) have become a prominent architecture in deep learning. However, a comprehensive understanding of Resnets is still a topic of ongoing research. A recent view argues that Resnets perform iterative refinement of features. We attempt to further expose properties of this aspect. To this end, we study Resnets both analytically and empirically. We formalize the notion of iterative refinement in Resnets by showing that residual connections naturally encourage features of residual blocks to move along the negative gradient of loss as we go from one block to the next. In addition, our empirical analysis suggests that Resnets are able to perform both representation learning and iterative refinement. In general, a Resnet block tends to concentrate representation learning behavior in the first few layers while higher layers perform iterative refinement of features. Finally, we observe that sharing residual layers naively leads to representation explosion and, counterintuitively, overfitting, and we show that simple existing strategies can help alleviate this problem.
Submitted 8 March, 2018; v1 submitted 12 October, 2017;
originally announced October 2017.