Search | arXiv e-print repository

Scalable Optimization in the Modular Norm

Authors: Tim Large, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola, Jeremy Bernstein

Abstract: To improve performance in contemporary deep learning, one is interested in scaling up the neural network in terms of both the number and the size of the layers. When ramping up the width of a single layer, graceful scaling of training has been linked to the need to normalize the weights and their updates in the "natural norm" particular to that layer. In this paper, we significantly generalize this idea by defining the modular norm, which is the natural norm on the full weight space of any neural network architecture. The modular norm is defined recursively in tandem with the network architecture itself. We show that the modular norm has several promising applications. On the practical side, the modular norm can be used to normalize the updates of any base optimizer so that the learning rate becomes transferable across width and depth. This means that the user does not need to compute optimizer-specific scale factors in order to scale training. On the theoretical side, we show that for any neural network built from "well-behaved" atomic modules, the gradient of the network is Lipschitz-continuous in the modular norm, with the Lipschitz constant admitting a simple recursive formula. This characterization opens the door to porting standard ideas in optimization theory over to deep learning. We have created a Python package called Modula that automatically normalizes weight updates in the modular norm of the architecture. The package is available via "pip install modula" with source code at https://github.com/jxbz/modula.

Submitted 23 May, 2024; originally announced May 2024.
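As a rough illustration of the practical claim above (normalizing a base optimizer's updates so that the learning rate alone sets the step size), the following is a minimal sketch under heavy simplifying assumptions. It does not use the Modula package or its API. The Frobenius norm stands in for each layer's natural norm, the layers are combined with a weighted maximum, and the helper names and per-layer weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def frobenius(m: Matrix) -> float:
    """Frobenius norm, used here only as a stand-in for a layer's natural norm."""
    return math.sqrt(sum(x * x for row in m for x in row))


def toy_modular_norm(updates: List[Matrix], weights: List[float]) -> float:
    """Hypothetical combined norm: the weighted maximum of per-layer norms."""
    return max(w * frobenius(u) for u, w in zip(updates, weights))


def normalize_updates(updates: List[Matrix], weights: List[float], lr: float) -> List[Matrix]:
    """Rescale each layer's raw update so that weight * norm == lr.

    Assumes every weight is positive. Each nonzero layer then moves by a
    controlled amount, and the toy combined norm of the whole update equals
    lr regardless of the layer shapes. All-zero layers are left at zero.
    """
    out: List[Matrix] = []
    for u, w in zip(updates, weights):
        n = frobenius(u)
        scale = lr / (w * n) if n > 0.0 else 0.0
        out.append([[scale * x for x in row] for row in u])
    return out


if __name__ == "__main__":
    raw = [[[3.0, 4.0]], [[0.0, 1.0], [0.0, 0.0]]]  # e.g. raw updates from a base optimizer
    step = normalize_updates(raw, weights=[1.0, 2.0], lr=0.1)
    print(toy_modular_norm(step, [1.0, 2.0]))
```

Because the step size is pinned by `lr` rather than by the raw gradient magnitudes, the same `lr` can, in this toy, be reused when layers grow; the paper's actual norms and weights are defined recursively from the architecture.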

arXiv:2203.17274 [pdf, other]

Exploring Visual Prompts for Adapting Large-Scale Models

Authors: Hyojin Bahng, Ali Jahanian, Swami Sankaranarayanan, Phillip Isola

Abstract: We investigate the efficacy of visual prompting to adapt large-scale models in vision. Following recent approaches in prompt tuning and adversarial reprogramming, we learn a single image perturbation such that a frozen model prompted with this perturbation performs a new task. Through comprehensive experiments, we demonstrate that visual prompting is particularly effective for CLIP and robust to distribution shift, achieving performance competitive with standard linear probes. We further analyze how properties of the downstream dataset, prompt design, and output transformation affect adaptation performance. The surprising effectiveness of visual prompting provides a new perspective on adapting pre-trained models in vision. Code is available at http://hjbahng.github.io/visual_prompting.

Submitted 3 June, 2022; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: 16 pages, 10 figures
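To make "learn a single image perturbation such that a frozen model ... performs a new task" concrete, here is a toy sketch. A frozen linear scorer stands in for the pretrained model, and plain gradient descent updates only one additive perturbation shared by every input. The model, data, squared-error loss, and function names are all hypothetical; the paper works with real vision models such as CLIP and its own prompt designs.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def frozen_model(w: Sequence[float], x: Sequence[float]) -> float:
    """Stand-in for a pretrained model: its weights w are never updated."""
    return dot(w, x)


def learn_prompt(w: Vector, inputs: List[Vector], targets: List[float],
                 lr: float = 0.1, steps: int = 200) -> Vector:
    """Learn one additive perturbation delta shared by every input.

    Minimizes the mean squared error of frozen_model(w, x + delta) by
    gradient descent on delta only; w is read but never modified.
    """
    delta = [0.0] * len(w)
    n = len(inputs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(inputs, targets):
            prompted = [xi + di for xi, di in zip(x, delta)]
            err = frozen_model(w, prompted) - y
            # d/d(delta) of err**2 is 2 * err * w, because the model is linear
            for i, wi in enumerate(w):
                grad[i] += 2.0 * err * wi / n
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta
```

In this toy, the "new task" is shifting every output by a constant; only `delta` is trained, mirroring the frozen-model setting described in the abstract.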

arXiv:2102.13087 [pdf, other]

Confining light in all-dielectric anisotropic metamaterial particles for nano-scale nonlinear optics

Authors: Saman Jahani, Joong Hwan Bahng, Arkadev Roy, Nicholas Kotov, Alireza Marandi

Abstract: High-index dielectrics can confine light at the nanoscale, leading to an enhanced nonlinear response. However, the increased momentum in these media can degrade the overlap between different harmonics, which hinders efficient nonlinear interaction in wavelength-scale resonators in the absence of momentum matching. Here, we propose an alternative approach for light confinement in anisotropic particles. The extra degree of freedom in anisotropic media allows us to control the evanescent waves near the center and the radial momentum away from the center independently. This can lead to strong light confinement as well as excellent field overlap between different harmonics, which is ideal for nonlinear wavelength conversion. Controlling the evanescent fields can also help to surpass the constraints on the radiation bandwidth of isotropic dielectric antennas. This can improve light coupling into these particles, which is crucial for nanoscale nonlinear optics. We estimate the second-harmonic generation efficiency as well as the optical parametric oscillation threshold in these particles to show their strong nonlinear response even away from the center of the resonances. Our approach is promising for experimental realization and can be used for many applications, such as large-scale parallel sensing and computing.

Submitted 25 February, 2021; originally announced February 2021.

arXiv:2008.08746 [pdf]

doi 10.1021/acsnano.0c07127

Mie resonance engineering in meta-shell supraparticles for nanoscale nonlinear optics

Authors: Joong Hwan Bahng, Saman Jahani, Douglas Montjoy, Timothy Yao, Nikolas Kotov, Alireza Marandi

Abstract: Supraparticles are coordinated assemblies of discrete nanoscale building blocks into complex, hierarchical colloidal superstructures. Such assemblies exhibit holistic optical responses that are not observed in an individual building block or in its bulk counterpart. Furthermore, the subwavelength dimensions of the unit building blocks enable engraving optical metamaterials within the supraparticle, which has thus far been beyond the reach of colloidal engineering. This can lead to effective optical features in a colloidal platform with an unprecedented ability to tune the electromagnetic responses of these particles. Here, we introduce and demonstrate the nanophotonics of the meta-shell supraparticle (MSP), an all-dielectric colloidal superstructure with a nonlinear optical metamaterial shell conformed onto a spherical core. We show that the metamaterial shell facilitates engineering of the Mie resonances in the MSP, which enables significant enhancement of second-harmonic generation (SHG). We show an enhancement of SHG by several orders of magnitude in an MSP compared to its building blocks. Furthermore, we show an absolute conversion efficiency as high as 10^-7 far from the damage threshold, setting a new benchmark for SHG with low-index colloids. The MSP provides pragmatic solutions for instantaneous wavelength conversion with colloidal platforms suitable for chemical and biological applications. Their engineerability and scalability make MSPs fertile ground for nonlinear nanophotonics in colloidal platforms with structural and material diversity.

Submitted 19 August, 2020; originally announced August 2020.

arXiv:1912.03085 [pdf, other]

Exploring Unlabeled Faces for Novel Attribute Discovery

Authors: Hyojin Bahng, Sunghyo Chung, Seungjoo Yoo, Jaegul Choo

Abstract: Despite remarkable success in unpaired image-to-image translation, existing systems still require a large number of labeled images. This is a bottleneck for their real-world applications; in practice, a model trained on the labeled CelebA dataset does not work well for test images from a different distribution, which greatly limits its application to the much larger pool of unlabeled images. In this paper, we attempt to alleviate this need for labeled data in the facial image translation domain. We aim to explore the degree to which one can discover novel attributes from unlabeled faces and perform high-quality translation. To this end, we use prior knowledge about the visual world as guidance to discover novel attributes and transfer them via a novel normalization method. Experiments show that our method, trained on unlabeled data, produces high-quality translations that preserve identity and are as perceptually realistic as, or more realistic than, those of state-of-the-art methods trained on labeled data.

Submitted 6 December, 2019; originally announced December 2019.

Comments: 10 pages, 6 figures

arXiv:1911.13181 [pdf, other]

doi 10.1145/3340531.3411940

ST-GRAT: A Novel Spatio-temporal Graph Attention Network for Accurately Forecasting Dynamically Changing Road Speed

Authors: Cheonbok Park, Chunggi Lee, Hyojin Bahng, Yunwon Tae, Kihwan Kim, Seungmin Jin, Sungahn Ko, Jaegul Choo

Abstract: Predicting road traffic speed is a challenging task due to different types of roads, abrupt speed changes, and spatial dependencies between roads; it requires modeling dynamically changing spatial dependencies among roads and temporal patterns over long input sequences. This paper proposes a novel spatio-temporal graph attention network (ST-GRAT) that effectively captures the spatio-temporal dynamics in road networks. The novel aspects of our approach mainly include spatial attention, temporal attention, and spatial sentinel vectors. The spatial attention uses graph structure information (e.g., distances between roads) and dynamically adjusts spatial correlations based on road states. The temporal attention is responsible for capturing traffic speed changes, and the sentinel vectors allow the model to retrieve new features from spatially correlated nodes or preserve existing features. The experimental results show that ST-GRAT outperforms existing models, especially in difficult conditions where traffic speeds change rapidly (e.g., rush hours). We additionally provide a qualitative study to analyze when and where ST-GRAT tends to make accurate predictions during rush hours.

Submitted 20 October, 2020; v1 submitted 29 November, 2019; originally announced November 2019.

Comments: to be published in CIKM-2020
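The abstract states that sentinel vectors let the model either retrieve new features from spatially correlated nodes or preserve existing ones. One common way to realize such a choice, borrowed from the "visual sentinel" in adaptive attention for image captioning, is to add a sentinel slot to the attention softmax; the weight that slot receives keeps the node's own feature. The sketch below assumes scalar features and precomputed attention scores, and the function names are hypothetical; it is an illustration of the general mechanism, not ST-GRAT's actual layer.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sentinel_attention(own: float, neighbors: List[float],
                       neighbor_scores: List[float],
                       sentinel_score: float) -> Tuple[float, List[float]]:
    """Attend over neighbor features plus one sentinel slot.

    The sentinel slot carries the node's own feature, so the weight it
    receives is the fraction of the existing feature that is preserved;
    the remaining weight retrieves features from the neighbors.
    Returns the updated feature and the attention weights (sentinel last).
    """
    weights = softmax(neighbor_scores + [sentinel_score])
    values = neighbors + [own]
    updated = sum(w * v for w, v in zip(weights, values))
    return updated, weights
```

A large sentinel score makes the node keep its current feature (useful when neighbors are uninformative), while low sentinel scores shift the weight toward correlated neighbors.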

arXiv:1910.02806 [pdf, other]

Learning De-biased Representations with Biased Representations

Authors: Hyojin Bahng, Sanghyuk Chun, Sangdoo Yun, Jaegul Choo, Seong Joon Oh

Abstract: Many machine learning algorithms are trained and evaluated by splitting data from a single source into training and test sets. While such a focus on in-distribution learning scenarios has led to interesting advances, it cannot reveal whether models rely on dataset biases as shortcuts for successful prediction (e.g., using snow cues to recognise snowmobiles), resulting in biased models that fail to generalise when the bias shifts to a different class. The cross-bias generalisation problem has been addressed by de-biasing training data through augmentation or re-sampling, which is often prohibitive due to the data collection cost (e.g., collecting images of a snowmobile in a desert) and the difficulty of quantifying or expressing biases in the first place. In this work, we propose a novel framework to train a de-biased representation by encouraging it to be different from a set of representations that are biased by design. This tactic is feasible in many scenarios where it is much easier to define a set of biased representations than to define and quantify bias. We demonstrate the efficacy of our method across a variety of synthetic and real-world biases; our experiments show that the method discourages models from taking bias shortcuts, resulting in improved generalisation. Source code is available at https://github.com/clovaai/rebias.

Submitted 30 June, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

Comments: Accepted to ICML 2020. Code available at https://github.com/clovaai/rebias
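The abstract describes encouraging a representation to be "different from" representations that are biased by design, without stating the criterion in this summary. One standard way to quantify statistical dependence between two sets of features is the Hilbert-Schmidt Independence Criterion (HSIC). The sketch below computes the biased empirical HSIC, trace(K H L H) / (n - 1)^2, with linear kernels in pure Python; treat it as an assumption-laden illustration of a dependence penalty, not the code in the rebias repository.

```python
from typing import List

Matrix = List[List[float]]


def linear_kernel(feats: Matrix) -> Matrix:
    """Gram matrix K[i][j] = <f_i, f_j> for a batch of feature vectors."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in feats] for fi in feats]


def center(k: Matrix) -> Matrix:
    """Double-centering H K H with H = I - (1/n) * ones."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def hsic(x: Matrix, y: Matrix) -> float:
    """Biased empirical HSIC with linear kernels.

    Uses trace(K H L H) = sum_ij (H K H)_ij * L_ij, valid because L is symmetric.
    """
    n = len(x)
    kc = center(linear_kernel(x))
    lk = linear_kernel(y)
    return sum(kc[i][j] * lk[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2
```

In a de-biasing setup along the lines the abstract describes, a term like this between the de-biased features and each biased representation would be driven toward zero. A linear kernel only captures linear dependence, so nonlinear kernels such as RBF are a common alternative.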

arXiv:1906.11888 [pdf, other]

Coloring With Limited Data: Few-Shot Colorization via Memory-Augmented Networks

Authors: Seungjoo Yoo, Hyojin Bahng, Sunghyo Chung, Junsoo Lee, Jaehyuk Chang, Jaegul Choo

Abstract: Despite recent advances in deep learning-based automatic colorization, existing models remain limited in the few-shot setting because they require a significant amount of training data. To tackle this issue, we present a novel memory-augmented colorization model, MemoPainter, that can produce high-quality colorization with limited data. In particular, our model is able to capture rare instances and successfully colorize them. We also propose a novel threshold triplet loss that enables unsupervised training of memory networks without the need for class labels. Experiments show that our model achieves superior quality in both few-shot and one-shot colorization tasks.

Submitted 9 June, 2019; originally announced June 2019.

Comments: CVPR 2019

arXiv:1805.02481 [pdf, other]

MEGAN: Mixture of Experts of Generative Adversarial Networks for Multimodal Image Generation

Authors: David Keetae Park, Seungjoo Yoo, Hyojin Bahng, Jaegul Choo, Noseong Park

Abstract: Recently, generative adversarial networks (GANs) have shown promising performance in generating realistic images. However, they often struggle to learn complex underlying modalities in a given dataset, resulting in poor-quality generated images. To mitigate this problem, we present a novel approach called mixture of experts GAN (MEGAN), an ensemble of multiple generator networks. Each generator network in MEGAN specializes in generating images with a particular subset of modalities, e.g., an image class. Instead of incorporating a separate step of handcrafted clustering of multiple modalities, our proposed model trains multiple generators end-to-end via gating networks, which are responsible for choosing the appropriate generator network for a given condition. We adopt the categorical reparameterization trick so that a categorical decision can be made when selecting a generator while maintaining gradient flow. We demonstrate that individual generators learn different and salient subparts of the data, and we achieve a multiscale structural similarity (MS-SSIM) score of 0.2470 on CelebA and a competitive unsupervised inception score of 8.33 on CIFAR-10.

Submitted 8 May, 2018; v1 submitted 7 May, 2018; originally announced May 2018.

Comments: 27th International Joint Conference on Artificial Intelligence (IJCAI 2018)
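The "categorical reparameterization trick" mentioned in the MEGAN abstract is commonly implemented as Gumbel-softmax: add Gumbel(0, 1) noise to the gating logits and apply a temperature-scaled softmax, which yields a differentiable relaxation of a one-hot choice among generators. The sketch below shows only that sampling step in pure Python; the function names, logits, and temperature `tau` are illustrative assumptions, not MEGAN's code.

```python
import math
import random
from typing import List, Optional


def gumbel_softmax(logits: List[float], tau: float,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Relaxed one-hot sample over generators (Gumbel-softmax)."""
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        u = rng.random()
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep both logarithms finite
        gumbel = -math.log(-math.log(u))     # Gumbel(0, 1) sample
        noisy.append((logit + gumbel) / tau)
    m = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def select_generator(logits: List[float], tau: float, rng: random.Random) -> int:
    """Forward-pass choice: index of the largest relaxed weight."""
    weights = gumbel_softmax(logits, tau, rng)
    return max(range(len(weights)), key=weights.__getitem__)
```

During training, the relaxed weights (rather than the argmax) would gate the generator outputs so that gradients reach the gating network; lowering `tau` pushes the weights closer to one-hot.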

arXiv:1804.04128 [pdf, other]

Coloring with Words: Guiding Image Colorization Through Text-based Palette Generation

Authors: Hyojin Bahng, Seungjoo Yoo, Wonwoong Cho, David K. Park, Ziming Wu, Xiaojuan Ma, Jaegul Choo

Abstract: This paper proposes a novel approach to generate multiple color palettes that reflect the semantics of input text and then colorize a given grayscale image according to the generated color palette. In contrast to existing approaches, our model can understand rich text, whether it is a single word, a phrase, or a sentence, and generate multiple possible palettes from it. For this task, we introduce our manually curated dataset called Palette-and-Text (PAT). Our proposed model, called Text2Colors, consists of two conditional generative adversarial networks: the text-to-palette generation networks and the palette-based colorization networks. The former captures the semantics of the text input and produces relevant color palettes. The latter colorizes a grayscale image using the generated color palette. Our evaluation results show that people preferred our generated palettes over ground-truth palettes and that our model can effectively reflect the given palette when colorizing an image.

Submitted 7 August, 2018; v1 submitted 11 April, 2018; originally announced April 2018.

Comments: 25 pages, 22 figures

Journal ref: ECCV 2018

Showing 1–10 of 10 results for author: Bahng, H