-
Towards a Better Evaluation of Out-of-Domain Generalization
Authors:
Duhun Hwang,
Suhyun Kang,
Moonjung Eo,
Jimyeong Kim,
Wonjong Rhee
Abstract:
The objective of Domain Generalization (DG) is to devise algorithms and models capable of achieving high performance on previously unseen test distributions. In pursuit of this objective, the average measure has been the prevalent measure for evaluating models and comparing algorithms in existing DG studies. Despite its significance, the average measure has not been comprehensively explored, and its suitability for approximating the true domain generalization performance has been questionable. In this study, we carefully investigate the limitations inherent in the average measure and propose the worst+gap measure as a robust alternative. We establish the theoretical grounds of the proposed measure by deriving two theorems from two different assumptions. We conduct extensive experimental investigations to compare the proposed worst+gap measure with the conventional average measure. Given the indispensable need to access the true DG performance for studying measures, we modify five existing datasets to create the SR-CMNIST, C-Cats&Dogs, L-CIFAR10, PACS-corrupted, and VLCS-corrupted datasets. The experimental results reveal that the average measure performs poorly at approximating the true DG performance and confirm the robustness of the theoretically supported worst+gap measure.
Submitted 2 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
Unveiling Key Aspects of Fine-Tuning in Sentence Embeddings: A Representation Rank Analysis
Authors:
Euna Jung,
Jaeill Kim,
Jungmin Ko,
**woo Park,
Wonjong Rhee
Abstract:
The latest advancements in unsupervised learning of sentence embeddings predominantly involve employing contrastive learning-based (CL-based) fine-tuning over pre-trained language models. In this study, we analyze the latest sentence embedding methods by adopting representation rank as the primary tool of analysis. We first define Phase 1 and Phase 2 of fine-tuning based on when representation rank peaks. Utilizing these phases, we conduct a thorough analysis and obtain essential findings across key aspects, including alignment and uniformity, linguistic abilities, and correlation between performance and rank. For instance, we find that the dynamics of the key aspects can undergo significant changes as fine-tuning transitions from Phase 1 to Phase 2. Based on these findings, we experiment with a rank reduction (RR) strategy that facilitates rapid and stable fine-tuning of the latest CL-based methods. Through empirical investigations, we showcase the efficacy of RR in enhancing the performance and stability of five state-of-the-art sentence embedding methods.
Submitted 18 May, 2024;
originally announced May 2024.
-
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
Authors:
Wonkyun Kim,
Changin Choi,
Wonseok Lee,
Wonjong Rhee
Abstract:
Stimulated by the sophisticated reasoning capabilities of recent Large Language Models (LLMs), a variety of strategies for bridging the video modality have been devised. A prominent strategy involves Video Language Models (VideoLMs), which train a learnable interface with video data to connect advanced vision encoders with LLMs. Recently, an alternative strategy has surfaced, employing readily available foundation models, such as VideoLMs and LLMs, across multiple stages for modality bridging. In this study, we introduce a simple yet novel strategy where only a single Vision Language Model (VLM) is utilized. Our starting point is the plain insight that a video comprises a series of images, or frames, interwoven with temporal information. The essence of video comprehension lies in adeptly managing the temporal aspects along with the spatial details of each frame. First, we transform a video into a single composite image by arranging multiple frames in a grid layout. The resulting single image is termed an image grid. This format, while maintaining the appearance of a solitary image, effectively retains temporal information within the grid structure. Therefore, the image grid approach enables direct application of a single high-performance VLM without necessitating any video-data training. Our extensive experimental analysis across ten zero-shot video question answering benchmarks, including five open-ended and five multiple-choice benchmarks, reveals that the proposed Image Grid Vision Language Model (IG-VLM) surpasses the existing methods in nine out of ten benchmarks.
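The image-grid idea can be sketched in a few lines. This is a minimal illustration, not the IG-VLM code: frames are assumed to be equally sized 2-D lists of grayscale pixels, frames are assumed to be sampled at evenly spaced positions, and the helper names `sample_frame_indices` and `make_image_grid` are hypothetical. The grid dimensions IG-VLM actually uses are not stated above.

```python
def sample_frame_indices(num_frames, k):
    """Pick k frame indices spread evenly across a video (assumed sampling scheme)."""
    if k > num_frames:
        raise ValueError("cannot sample more frames than the video contains")
    # Take the centre of each of k equal-length segments of the video.
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


def make_image_grid(frames, rows, cols):
    """Tile equally sized frames (2-D lists of pixels) into one composite image."""
    if len(frames) != rows * cols:
        raise ValueError("need exactly rows * cols frames")
    h, w = len(frames[0]), len(frames[0][0])
    grid = [[0] * (w * cols) for _ in range(h * rows)]
    for idx, frame in enumerate(frames):
        # Row-major placement keeps temporal order left-to-right, top-to-bottom.
        r, c = divmod(idx, cols)
        for y in range(h):
            for x in range(w):
                grid[r * h + y][c * w + x] = frame[y][x]
    return grid
```

For a 60-frame clip, `sample_frame_indices(60, 6)` gives `[5, 15, 25, 35, 45, 55]`, and those six frames could then be tiled with `make_image_grid(frames, 2, 3)`; the resulting single image is what a VLM would receive in place of the video.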
Submitted 27 March, 2024;
originally announced March 2024.
-
Improving Forward Compatibility in Class Incremental Learning by Increasing Representation Rank and Feature Richness
Authors:
Jaeill Kim,
Wonseok Lee,
Moonjung Eo,
Wonjong Rhee
Abstract:
Class Incremental Learning (CIL) constitutes a pivotal subfield within continual learning, aimed at enabling models to progressively learn new classification tasks while retaining knowledge obtained from prior tasks. Although previous studies have predominantly focused on backward-compatible approaches to mitigate catastrophic forgetting, recent investigations have introduced forward-compatible methods to enhance performance on novel tasks and complement existing backward-compatible methods. In this study, we introduce an effective-Rank based Feature Richness enhancement (RFR) method, designed to improve forward compatibility. Specifically, this method increases the effective rank of representations during the base session, thereby facilitating the incorporation of more informative features pertinent to unseen novel tasks. Consequently, RFR achieves dual objectives in backward and forward compatibility: minimizing feature extractor modifications and enhancing novel task performance, respectively. To validate the efficacy of our approach, we establish a theoretical connection between effective rank and the Shannon entropy of representations. Subsequently, we conduct comprehensive experiments by integrating RFR into eleven well-known CIL methods. Our results demonstrate the effectiveness of our approach in enhancing novel-task performance while mitigating catastrophic forgetting. Furthermore, our method notably improves the average incremental accuracy across all eleven cases examined.
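The link between effective rank and Shannon entropy can be illustrated with the standard definition of effective rank (Roy and Vetterli): the exponential of the Shannon entropy of the singular values normalized to sum to one. This is a sketch assuming that definition; RFR may compute or regularize the quantity differently, and `effective_rank` is a hypothetical helper that takes precomputed singular values.

```python
import math


def effective_rank(singular_values, eps=1e-12):
    """exp(Shannon entropy) of the singular values normalized to a distribution."""
    total = sum(singular_values)
    # Drop (near-)zero entries, which contribute nothing to the entropy.
    probs = [s / total for s in singular_values if s / total > eps]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)
```

A representation whose energy is spread evenly over four directions, `effective_rank([1, 1, 1, 1])`, has effective rank 4, while one dominated by a single direction, `effective_rank([5, 0, 0])`, has effective rank 1. Raising the entropy of the spectrum therefore raises the effective rank.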
Submitted 22 March, 2024;
originally announced March 2024.
-
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Authors:
Jimyeong Kim,
Jungwon Park,
Wonjong Rhee
Abstract:
In text-to-image personalization, a timely and crucial challenge is the tendency of generated images to overfit to the biases present in the reference images. We begin our study with a comprehensive categorization of the biases into background, nearby-object, tied-object, substance (in style re-contextualization), and pose biases. These biases manifest in the generated images due to their entanglement in the subject embedding. This undesired embedding entanglement not only results in the reflection of biases from the reference images into the generated images but also notably diminishes the alignment of the generated images with the given generation prompt. To address this challenge, we propose SID (Selectively Informative Description), a text description strategy that deviates from the prevalent approach of only characterizing the subject's class identification. SID is generated using multimodal GPT-4 and can be seamlessly integrated into optimization-based models. We present comprehensive experimental results along with analyses of cross-attention maps, subject-alignment, non-subject-disentanglement, and text-alignment.
Submitted 22 March, 2024;
originally announced March 2024.
-
Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization
Authors:
Yeji Song,
Jimyeong Kim,
Wonhark Park,
Wonsik Shin,
Wonjong Rhee,
Nojun Kwak
Abstract:
Amid a surge of text-to-image (T2I) models and their customization methods that generate new images of a user-provided subject, current works focus on alleviating the costs incurred by lengthy per-subject optimization. These zero-shot customization methods encode the image of a specified subject into a visual embedding, which is then utilized alongside the textual embedding for diffusion guidance. The visual embedding incorporates intrinsic information about the subject, while the textual embedding provides a new, transient context. However, the existing methods often 1) are significantly affected by the input images, e.g., generating images with the same pose, and 2) exhibit deterioration in the subject's identity. We first pin down the problem and show that redundant pose information in the visual embedding interferes with the textual embedding containing the desired pose information. To address this issue, we propose an orthogonal visual embedding that effectively harmonizes with the given textual embedding. We also adopt the visual-only embedding and inject the subject's clear features utilizing a self-attention swap. Our results demonstrate the effectiveness and robustness of our method, which offers highly flexible zero-shot generation while effectively maintaining the subject's identity.
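One way to read "orthogonal visual embedding" is a single Gram-Schmidt step: remove from the visual embedding its component along the textual embedding, so that the two no longer compete along that direction. This sketch assumes one textual vector and plain Python lists; the paper's construction (for example, which tokens it projects against, or whether it uses a subspace) may differ, and `orthogonalize` is a hypothetical name.

```python
def orthogonalize(visual, textual):
    """Remove from `visual` its component along `textual` (one Gram-Schmidt step)."""
    dot_vt = sum(v * t for v, t in zip(visual, textual))
    dot_tt = sum(t * t for t in textual)
    scale = dot_vt / dot_tt  # projection coefficient of visual onto textual
    return [v - scale * t for v, t in zip(visual, textual)]
```

For example, `orthogonalize([3.0, 4.0], [1.0, 0.0])` returns `[0.0, 4.0]`: the part of the visual vector that overlaps the textual direction is removed and the rest is kept.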
Submitted 21 March, 2024;
originally announced March 2024.
-
On-Off Pattern Encoding and Path-Count Encoding as Deep Neural Network Representations
Authors:
Euna Jung,
Jaekeol Choi,
EungGu Yun,
Wonjong Rhee
Abstract:
Understanding the encoded representation of Deep Neural Networks (DNNs) has been a fundamental yet challenging objective. In this work, we focus on two possible directions for analyzing representations of DNNs by studying simple image classification tasks. Specifically, we consider On-Off pattern and PathCount for investigating how information is stored in deep representations. The On-Off pattern of a neuron is 'on' or 'off' depending on whether the neuron's activation after ReLU is non-zero or zero. PathCount is the number of paths that transmit non-zero energy from the input to a neuron. We investigate how neurons in the network encode information by replacing each layer's activation with the On-Off pattern or PathCount and evaluating the effect on classification performance. We also examine the correlation between representation and PathCount. Finally, we show a possible way to improve an existing DNN interpretation method, Class Activation Map (CAM), by directly utilizing On-Off or PathCount.
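The two definitions can be made concrete on a tiny bias-free fully connected ReLU network. This is a sketch under our own reading of PathCount: a path carries energy only if every weight on it is non-zero and every unit it passes through (non-zero input, active hidden unit) is on; the count arriving at a neuron does not depend on that neuron's own state. The network shape, weights, and helper names (`forward`, `on_off`, `path_count`) are hypothetical.

```python
def relu(x):
    return x if x > 0 else 0.0


def forward(x, weights):
    """Bias-free fully connected ReLU net; weights[l][j][i] connects unit i to unit j."""
    activations = [x]
    for W in weights:
        prev = activations[-1]
        activations.append([relu(sum(w * a for w, a in zip(row, prev))) for row in W])
    return activations


def on_off(values):
    """On-Off pattern: 1 where the value is non-zero, 0 otherwise."""
    return [1 if v != 0 else 0 for v in values]


def path_count(x, weights):
    """Number of energy-carrying paths from the input to each output-layer neuron."""
    acts = forward(x, weights)
    counts = [1] * len(x)   # one trivial path to each input unit
    gates = on_off(x)       # a zero input transmits nothing
    for W, layer_acts in zip(weights, acts[1:]):
        # Sum paths through active predecessors connected by a non-zero weight.
        counts = [sum(counts[i] * gates[i] for i, w in enumerate(row) if w != 0)
                  for row in W]
        gates = on_off(layer_acts)  # only 'on' units pass energy onward
    return counts
```

With input `[1.0, 0.0]`, hidden weights `[[1, 1], [1, -1], [0, 1]]`, and output weights `[[1, 1, 1]]`, the hidden On-Off pattern is `[1, 1, 0]` and the single output neuron has a PathCount of 2 (one path through each active hidden unit). Replacing a layer's activations with either vector is the kind of substitution the abstract describes.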
Submitted 17 January, 2024;
originally announced January 2024.
-
Enhancing Contrastive Learning with Efficient Combinatorial Positive Pairing
Authors:
Jaeill Kim,
Duhun Hwang,
Eunjung Lee,
Jangwon Suh,
Jimyeong Kim,
Wonjong Rhee
Abstract:
In the past few years, contrastive learning has played a central role in the success of visual unsupervised representation learning. Around the same time, high-performance non-contrastive learning methods have been developed as well. While most of the works utilize only two views, we carefully review the existing multi-view methods and propose a general multi-view strategy that can improve the learning speed and performance of any contrastive or non-contrastive method. We first analyze CMC's full-graph paradigm and empirically show that the learning speed with $K$ views can be increased by $_{K}\mathrm{C}_{2}$ times for a small learning rate and early training. Then, we upgrade CMC's full-graph by mixing views created by a crop-only augmentation, adopting small-size views as in SwAV multi-crop, and modifying the negative sampling. The resulting multi-view strategy is called ECPP (Efficient Combinatorial Positive Pairing). We investigate the effectiveness of ECPP by applying it to SimCLR and assessing the linear evaluation performance for CIFAR-10 and ImageNet-100. For each benchmark, we achieve state-of-the-art performance. In the case of ImageNet-100, ECPP-boosted SimCLR outperforms supervised learning.
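The $_{K}\mathrm{C}_{2}$ factor is the number of distinct positive pairs in the full graph over $K$ views of one image. The sketch below only enumerates those pairs; it does not reproduce ECPP's crop-only view mixing, small views, or modified negative sampling, and the helper names and view labels are hypothetical.

```python
from itertools import combinations
from math import comb


def positive_pairs(views):
    """All unordered pairs of augmented views of one image (CMC-style full graph)."""
    return list(combinations(views, 2))


def pair_count(k):
    """Number of positive pairs among k views, i.e. K choose 2."""
    return comb(k, 2)
```

With two views there is a single pair, as in standard SimCLR; four views give `pair_count(4) == 6` pairs, and eight views give 28, which is why each image contributes many more positive-pair terms per step as $K$ grows.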
Submitted 11 January, 2024;
originally announced January 2024.
-
A Differentiable Framework for End-to-End Learning of Hybrid Structured Compression
Authors:
Moonjung Eo,
Suhyun Kang,
Wonjong Rhee
Abstract:
Filter pruning and low-rank decomposition are two of the foundational techniques for structured compression. Although recent efforts have explored hybrid approaches aiming to integrate the advantages of both techniques, their performance gains have been modest at best. In this study, we develop a Differentiable Framework (DF) that can express filter selection, rank selection, and budget constraint in a single analytical formulation. Within the framework, we introduce DML-S for filter selection, integrating scheduling into existing mask learning techniques. Additionally, we present DTL-S for rank selection, utilizing a singular value thresholding operator. The framework with DML-S and DTL-S offers a hybrid structured compression methodology that facilitates end-to-end learning through gradient-based optimization. Experimental results demonstrate the efficacy of DF, surpassing state-of-the-art structured compression methods. Our work establishes a robust and versatile avenue for advancing structured compression techniques.
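How a singular value thresholding operator performs rank selection can be seen on the singular values alone. This sketch uses the standard soft-thresholding form, $\sigma_i \mapsto \max(\sigma_i - \tau, 0)$; DTL-S's exact operator and its scheduling are not specified above, and the threshold `tau` and helper names are hypothetical. Applied to a full matrix, the shrunk values would be recombined as $U \operatorname{diag}(\sigma') V^\top$.

```python
def singular_value_threshold(singular_values, tau):
    """Soft-threshold: shrink every singular value by tau, clipping at zero."""
    return [max(s - tau, 0.0) for s in singular_values]


def selected_rank(singular_values, tau):
    """Rank kept after thresholding = number of singular values that survive."""
    return sum(1 for s in singular_value_threshold(singular_values, tau) if s > 0)
```

With singular values `[5.0, 3.0, 0.5, 0.1]` and `tau = 1.0`, the thresholded spectrum is `[4.0, 2.0, 0.0, 0.0]`, so the layer would be kept at rank 2. Making `tau` learnable is what turns rank selection into something gradient-based training can adjust.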
Submitted 20 September, 2023;
originally announced September 2023.
-
Towards a Rigorous Analysis of Mutual Information in Contrastive Learning
Authors:
Kyungeun Lee,
Jaeill Kim,
Suhyun Kang,
Wonjong Rhee
Abstract:
Contrastive learning has emerged as a cornerstone in recent achievements of unsupervised representation learning. Its primary paradigm involves an instance discrimination task with a mutual information loss. The loss is known as InfoNCE and it has yielded vital insights into contrastive learning through the lens of mutual information analysis. However, the estimation of mutual information can prove challenging, creating a gap between the elegance of its mathematical foundation and the complexity of its estimation. As a result, drawing rigorous insights or conclusions from mutual information analysis becomes intricate. In this study, we introduce three novel methods and a few related theorems, aimed at enhancing the rigor of mutual information analysis. Despite their simplicity, these methods can carry substantial utility. Leveraging these approaches, we reassess three instances of contrastive learning analysis, illustrating their capacity to facilitate deeper comprehension or to rectify pre-existing misconceptions. Specifically, we investigate small batch size, mutual information as a measure, and the InfoMin principle.
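A small InfoNCE computation helps show why mutual information estimates from this loss need care. With batch size $N$, the well-known bound is $I(X;Y) \ge \log N - \mathcal{L}_{\text{InfoNCE}}$, so the estimate can never exceed $\log N$, however much information the views share. The sketch assumes a precomputed similarity matrix whose diagonal holds the positive pairs; `info_nce` is a hypothetical helper, not the authors' code.

```python
import math


def info_nce(similarity, temperature=1.0):
    """Mean InfoNCE loss; similarity[i][j] scores anchor i against candidate j,
    and the positive for anchor i is candidate i (the diagonal)."""
    n = len(similarity)
    loss = 0.0
    for i, row in enumerate(similarity):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denominator - logits[i]
    return loss / n
```

If all similarities are equal, the loss is $\log N$ and the bound gives 0; if positives dominate, as in `[[10.0, 0.0], [0.0, 10.0]]`, the loss approaches 0 and the bound saturates at $\log 2$. This cap is one reason small batch sizes complicate mutual-information conclusions.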
Submitted 29 August, 2023;
originally announced August 2023.
-
Meta-Learning with a Geometry-Adaptive Preconditioner
Authors:
Suhyun Kang,
Duhun Hwang,
Moonjung Eo,
Taesup Kim,
Wonjong Rhee
Abstract:
Model-agnostic meta-learning (MAML) is one of the most successful meta-learning algorithms. It has a bi-level optimization structure where the outer-loop process learns a shared initialization and the inner-loop process optimizes task-specific weights. Although MAML relies on standard gradient descent in the inner loop, recent studies have shown that controlling the inner loop's gradient descent with a meta-learned preconditioner can be beneficial. Existing preconditioners, however, cannot simultaneously adapt in a task-specific and path-dependent way. Additionally, they do not satisfy the Riemannian metric condition, which can enable steepest descent learning with the preconditioned gradient. In this study, we propose Geometry-Adaptive Preconditioned gradient descent (GAP) that can overcome the limitations in MAML; GAP can efficiently meta-learn a preconditioner that is dependent on task-specific parameters, and its preconditioner can be shown to be a Riemannian metric. Thanks to these two properties, the geometry-adaptive preconditioner is effective for improving the inner-loop optimization. Experimental results show that GAP outperforms the state-of-the-art MAML family and preconditioned gradient descent-MAML (PGD-MAML) family in a variety of few-shot learning tasks. Code is available at: https://github.com/Suhyun777/CVPR23-GAP.
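The generic preconditioned inner-loop update is $\theta \leftarrow \theta - \alpha P \nabla_\theta \mathcal{L}$, where a symmetric positive-definite $P$ reshapes the step. The sketch below fixes $P$ by hand on a quadratic loss $\tfrac{1}{2}\theta^\top A \theta$ with $A = \operatorname{diag}(1, 100)$; in GAP the preconditioner is meta-learned and depends on task-specific parameters, which is not modeled here. `matvec` and `preconditioned_step` are hypothetical helpers.

```python
def matvec(M, v):
    """Dense matrix-vector product for lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def preconditioned_step(theta, grad, P, lr):
    """One inner-loop update theta <- theta - lr * P @ grad (P assumed SPD)."""
    direction = matvec(P, grad)
    return [t - lr * d for t, d in zip(theta, direction)]
```

At $\theta = (1, 1)$ the gradient is $(1, 100)$. Plain gradient descent ($P = I$) needs a step size around 0.01 to stay stable along the stiff direction and moves the other coordinate only from 1.0 to 0.99, whereas $P = A^{-1} = \operatorname{diag}(1, 0.01)$ with step size 1 reaches the minimum in a single step.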
Submitted 29 November, 2023; v1 submitted 4 April, 2023;
originally announced April 2023.
-
VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
Authors:
Jaeill Kim,
Suhyun Kang,
Duhun Hwang,
Jungwook Shin,
Wonjong Rhee
Abstract:
Since the introduction of deep learning, a wide scope of representation properties, such as decorrelation, whitening, disentanglement, rank, isotropy, and mutual information, have been studied to improve the quality of representation. However, manipulating such properties can be challenging in terms of implementational effectiveness and general applicability. To address these limitations, we propose to regularize von Neumann entropy (VNE) of representation. First, we demonstrate that the mathematical formulation of VNE is superior in effectively manipulating the eigenvalues of the representation autocorrelation matrix. Then, we demonstrate that it is widely applicable in improving state-of-the-art algorithms or popular benchmark algorithms by investigating domain-generalization, meta-learning, self-supervised learning, and generative models. In addition, we formally establish theoretical connections with rank, disentanglement, and isotropy of representation. Finally, we provide discussions on the dimension control of VNE and the relationship with Shannon entropy. Code is available at: https://github.com/jaeill/CVPR23-VNE.
Submitted 3 April, 2023;
originally announced April 2023.
-
DR.CPO: Diversified and Realistic 3D Augmentation via Iterative Construction, Random Placement, and HPR Occlusion
Authors:
Jungwook Shin,
Jaeill Kim,
Kyungeun Lee,
Hyunghun Cho,
Wonjong Rhee
Abstract:
In autonomous driving, data augmentation is commonly used for improving 3D object detection. The most basic methods include insertion of copied objects and rotation and scaling of the entire training frame. Numerous variants have been developed as well. The existing methods, however, are considerably limited when compared to the variety of real-world possibilities. In this work, we develop a diversified and realistic augmentation method that can flexibly construct a whole-body object, freely locate and rotate the object, and apply self-occlusion and external-occlusion accordingly. To improve the diversity of the whole-body object construction, we develop an iterative method that stochastically combines multiple objects observed from the real world into a single object. Unlike the existing augmentation methods, the constructed objects can be randomly located and rotated in the training frame because proper occlusions can be reflected to the whole-body objects in the final step. Finally, proper self-occlusion at each local object level and external-occlusion at the global frame level are applied using the computationally efficient Hidden Point Removal (HPR) algorithm. HPR is also used for adaptively controlling the point density of each object according to the object's distance from the LiDAR. Experimental results show that the proposed DR.CPO algorithm is data-efficient and model-agnostic without incurring any computational overhead. Also, DR.CPO can improve mAP performance by 2.08% when compared to the best 3D detection result known for the KITTI dataset. The code is available at https://github.com/SNU-DRL/DRCPO.git
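The core of Hidden Point Removal (Katz, Tal, and Basri) is a "spherical flip" followed by a convex hull: each point $p$, relative to the viewpoint, is mapped to $p + 2(R - \lVert p \rVert)\,p/\lVert p \rVert$, and a point is deemed visible if its flipped image is a vertex of the convex hull of the flipped points plus the viewpoint. The sketch below is a 2-D, pure-Python illustration under assumed parameters (`radius_factor` is hypothetical); LiDAR scenes need the 3-D version, where a 3-D hull routine such as `scipy.spatial.ConvexHull` would replace the monotone-chain hull. It does not reproduce how DR.CPO applies HPR for occlusion or density control.

```python
import math


def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def _hull_vertex_indices(pts):
    """Andrew's monotone chain; collinear points are not counted as vertices."""
    order = sorted(range(len(pts)), key=lambda i: pts[i])
    lower, upper = [], []
    for i in order:
        while len(lower) >= 2 and _cross(pts[lower[-2]], pts[lower[-1]], pts[i]) <= 0:
            lower.pop()
        lower.append(i)
    for i in reversed(order):
        while len(upper) >= 2 and _cross(pts[upper[-2]], pts[upper[-1]], pts[i]) <= 0:
            upper.pop()
        upper.append(i)
    return set(lower[:-1] + upper[:-1])


def hidden_point_removal(points, viewpoint=(0.0, 0.0), radius_factor=100.0):
    """Indices of 2-D points visible from `viewpoint` (assumes none coincide with it)."""
    shifted = [(x - viewpoint[0], y - viewpoint[1]) for x, y in points]
    norms = [math.hypot(x, y) for x, y in shifted]
    radius = radius_factor * max(norms)
    # Spherical flip: p + 2(R - |p|) p / |p| equals p scaled by (2R - |p|) / |p|.
    flipped = [(x * (2 * radius - n) / n, y * (2 * radius - n) / n)
               for (x, y), n in zip(shifted, norms)]
    hull = _hull_vertex_indices(flipped + [(0.0, 0.0)])  # viewpoint joins the hull
    return sorted(i for i in hull if i < len(points))
```

For points `(1, 0)`, `(2, 0)`, `(0, 1)`, `(-1, 0)` seen from the origin, the call returns `[0, 2, 3]`: the point at `(2, 0)` sits directly behind `(1, 0)`, so its flipped image falls inside the hull and it is removed as occluded.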
Submitted 30 August, 2023; v1 submitted 20 March, 2023;
originally announced March 2023.
-
Evaluating Feature Attribution Methods for Electrocardiogram
Authors:
Jangwon Suh,
Jimyeong Kim,
Euna Jung,
Wonjong Rhee
Abstract:
The performance of cardiac arrhythmia detection with electrocardiograms (ECGs) has been considerably improved since the introduction of deep learning models. In practice, high performance alone is not sufficient, and a proper explanation is also required. Recently, researchers have started adopting feature attribution methods to address this requirement, but it has been unclear which of the methods are appropriate for ECG. In this work, we identify and customize three evaluation metrics for feature attribution methods based on the characteristics of ECG: localization score, pointing game, and degradation score. Using the three evaluation metrics, we evaluate and analyze eleven widely used feature attribution methods. We find that some of the feature attribution methods are much more adequate for explaining ECG, where Grad-CAM outperforms the second-best method by a large margin.
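The pointing game, in its generic form, counts a hit when the single highest-attribution location falls inside the annotated ground-truth region. The sketch adapts that generic form to a 1-D signal; the ECG-specific customization in the paper (for example, tolerances or multi-lead handling) is not described above, so the half-open `(start, end)` region format and the helper names are assumptions.

```python
def pointing_game_hit(attribution, region):
    """1-D pointing game: does the highest-attribution sample lie in [start, end)?"""
    peak = max(range(len(attribution)), key=lambda i: attribution[i])
    start, end = region
    return start <= peak < end


def pointing_game_accuracy(attributions, regions):
    """Fraction of signals whose attribution peak falls inside the annotated region."""
    hits = [pointing_game_hit(a, r) for a, r in zip(attributions, regions)]
    return sum(hits) / len(hits)
```

If one attribution map peaks inside its annotated segment and another peaks outside it, the accuracy is 0.5. Because only the peak matters, the metric rewards methods that concentrate attribution on the clinically relevant segment.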
Submitted 22 November, 2022;
originally announced November 2022.
-
Isotropic Representation Can Improve Dense Retrieval
Authors:
Euna Jung,
Jungwon Park,
Jaekeol Choi,
Sungyoon Kim,
Wonjong Rhee
Abstract:
The recent advancement in language representation modeling has broadly affected the design of dense retrieval models. In particular, many of the high-performing dense retrieval models evaluate representations of query and document using BERT, and subsequently apply cosine-similarity-based scoring to determine the relevance. BERT representations, however, are known to follow an anisotropic distribution of a narrow cone shape, and such an anisotropic distribution can be undesirable for cosine-similarity-based scoring. In this work, we first show that BERT-based dense retrieval (DR) also follows an anisotropic distribution. To cope with the problem, we introduce unsupervised post-processing methods of Normalizing Flow and whitening, and develop a token-wise method in addition to a sequence-wise method for applying the post-processing methods to the representations of dense retrieval models. We show that the proposed methods can effectively make the representations isotropic. We then perform experiments with ColBERT and RepBERT to show that the performance (NDCG at 10) of document re-ranking can be improved by 5.17% to 8.09% for ColBERT and 6.88% to 22.81% for RepBERT. To examine the potential of isotropic representation for improving the robustness of DR models, we investigate out-of-distribution tasks where the test dataset differs from the training dataset. The results show that isotropic representation can achieve generally improved performance. For instance, when the training dataset is MS-MARCO and the test dataset is Robust04, isotropy post-processing can improve the baseline performance by up to 24.98%. Furthermore, we show that an isotropic model trained with an out-of-distribution dataset can even outperform a baseline model trained with the in-distribution dataset.
Submitted 31 July, 2023; v1 submitted 1 September, 2022;
originally announced September 2022.
-
Finding Inverse Document Frequency Information in BERT
Authors:
Jaekeol Choi,
Euna Jung,
Sungjun Lim,
Wonjong Rhee
Abstract:
For many decades, BM25 and its variants have been the dominant document retrieval approach, where their two underlying features are Term Frequency (TF) and Inverse Document Frequency (IDF). The traditional approach, however, is being rapidly replaced by Neural Ranking Models (NRMs) that can exploit semantic features. In this work, we consider BERT-based NRMs and study whether IDF information is present in the NRMs. This simple question is interesting because IDF has been indispensable for traditional lexical matching, but global features like IDF are not explicitly learned by neural language models, including BERT. We adopt linear probing as the main analysis tool because typical BERT-based NRMs utilize linear or inner-product-based score aggregators. We analyze input embeddings, representations of all BERT layers, and the self-attention weights of CLS. By studying the MS-MARCO dataset with three BERT-based models, we show that all of them contain information that is strongly dependent on IDF.
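IDF is a global, corpus-level statistic: a term's weight depends on how many documents contain it, which is exactly why it is not obviously recoverable from a single input sequence. The sketch uses the classic $\log(N / \mathrm{df}(t))$ form on a toy tokenized corpus; BM25 uses a smoothed variant, and the paper's probing targets may use yet another form. `idf` is a hypothetical helper.

```python
import math


def idf(corpus):
    """Classic IDF, log(N / df(t)), for every term in a tokenized corpus."""
    n_docs = len(corpus)
    df = {}
    for doc in corpus:
        for term in set(doc):  # count each term at most once per document
            df[term] = df.get(term, 0) + 1
    return {t: math.log(n_docs / d) for t, d in df.items()}
```

On the corpus `[["the", "cat"], ["the", "dog"], ["the", "bird", "cat"]]`, "the" appears in every document and gets IDF 0, while "dog" appears once and gets $\log 3$. A linear probe for IDF asks whether such per-term values can be read off BERT's embeddings or representations.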
Submitted 24 February, 2022;
originally announced February 2022.
-
B2EA: An Evolutionary Algorithm Assisted by Two Bayesian Optimization Modules for Neural Architecture Search
Authors:
Hyunghun Cho,
Jungwook Shin,
Wonjong Rhee
Abstract:
The early pioneering Neural Architecture Search (NAS) works were multi-trial methods applicable to any general search space. The subsequent works took advantage of the early findings and developed weight-sharing methods that assume a structured search space, typically with pre-fixed hyperparameters. Despite the remarkable computational efficiency of the weight-sharing NAS algorithms, it is becoming apparent that multi-trial NAS algorithms are also needed for identifying very high-performance architectures, especially when exploring a general search space. In this work, we carefully review the latest multi-trial NAS algorithms and identify the key strategies, including Evolutionary Algorithm (EA), Bayesian Optimization (BO), diversification, input and output transformations, and lower fidelity estimation. To accommodate the key strategies in a single framework, we develop B2EA, a surrogate-assisted EA with two BO surrogate models and a mutation step in between. To show that B2EA is robust and efficient, we evaluate three performance metrics over 14 benchmarks with general and cell-based search spaces. Comparisons with state-of-the-art multi-trial algorithms reveal that B2EA is robust and efficient over the 14 benchmarks for three difficulty levels of target performance. The B2EA code is publicly available at https://github.com/snu-adsl/BBEA.
Submitted 17 February, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
A Highly Effective Low-Rank Compression of Deep Neural Networks with Modified Beam-Search and Modified Stable Rank
Authors:
Moonjung Eo,
Suhyun Kang,
Wonjong Rhee
Abstract:
Compression has emerged as one of the essential deep learning research topics, especially for edge devices that have limited computation power and storage capacity. Among the main compression techniques, low-rank compression via matrix factorization has been known to have two problems. First, extensive tuning is required. Second, the resulting compression performance is typically not impressive. In this work, we propose a low-rank compression method that utilizes a modified beam-search for automatic rank selection and a modified stable rank for compression-friendly training. The resulting BSR (Beam-search and Stable Rank) algorithm requires only a single hyperparameter to be tuned for the desired compression ratio. In terms of the trade-off curve between accuracy and compression ratio, BSR turns out to be superior to the previously known low-rank compression methods. Furthermore, BSR can perform on par with or better than the state-of-the-art structured pruning methods. As with pruning, BSR can be easily combined with quantization for additional compression.
Submitted 30 November, 2021; v1 submitted 30 November, 2021;
originally announced November 2021.
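The abstract relies on a "modified stable rank" without defining it. For reference, the standard (unmodified) stable rank of a weight matrix W is ||W||_F^2 / ||W||_2^2, a smooth lower bound on rank(W). The minimal pure-Python sketch below computes only that standard quantity, estimating the spectral norm by power iteration; the paper's modification and its beam-search rank selection are not reproduced here.

```python
import math
import random

def frobenius_sq(w):
    """Squared Frobenius norm: the sum of all squared entries."""
    return sum(x * x for row in w for x in row)

def spectral_norm(w, iters=200, seed=0):
    """Largest singular value of w, estimated by power iteration on W^T W.
    A random positive start vector makes it unlikely to be orthogonal to the
    top singular vector."""
    rng = random.Random(seed)
    n_rows, n_cols = len(w), len(w[0])
    v = [rng.random() + 0.1 for _ in range(n_cols)]
    for _ in range(iters):
        u = [sum(w[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]  # u = W v
        v = [sum(w[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # v = W^T u
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_v == 0.0:
            return 0.0
        v = [x / norm_v for x in v]
    # With a unit-norm v aligned to the top right singular vector, ||W v|| = sigma_max.
    u = [sum(w[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return math.sqrt(sum(x * x for x in u))

def stable_rank(w):
    """Standard stable rank ||W||_F^2 / ||W||_2^2, which never exceeds rank(W)."""
    s = spectral_norm(w)
    if s == 0.0:
        raise ValueError("stable rank is undefined for the zero matrix")
    return frobenius_sq(w) / (s * s)
```

For a diagonal matrix diag(3, 4) this gives 25/16, while a rank-1 matrix gives exactly 1, which is why a small stable rank is commonly read as a sign that a layer is friendly to low-rank factorization.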
-
Semi-Siamese Bi-encoder Neural Ranking Model Using Lightweight Fine-Tuning
Authors:
Euna Jung,
Jaekeol Choi,
Wonjong Rhee
Abstract:
A BERT-based Neural Ranking Model (NRM) can be either a cross-encoder or a bi-encoder. Between the two, the bi-encoder is highly efficient because all the documents can be pre-processed before the actual query time. In this work, we show two approaches for improving the performance of BERT-based bi-encoders. The first approach is to replace the full fine-tuning step with a lightweight fine-tuning. We examine lightweight fine-tuning methods that are adapter-based, prompt-based, and a hybrid of the two. The second approach is to develop semi-Siamese models where queries and documents are handled with a limited amount of difference. The limited difference is realized by learning two lightweight fine-tuning modules, where the main language model of BERT is kept common for both query and document. We provide extensive experiment results for monoBERT, TwinBERT, and ColBERT where three performance metrics are evaluated over Robust04, ClueWeb09b, and MS-MARCO datasets. The results confirm that both lightweight fine-tuning and semi-Siamese are considerably helpful for improving BERT-based bi-encoders. In fact, lightweight fine-tuning is helpful for the cross-encoder, too.
Submitted 2 March, 2022; v1 submitted 28 October, 2021;
originally announced October 2021.
-
AID-Purifier: A Light Auxiliary Network for Boosting Adversarial Defense
Authors:
Duhun Hwang,
Eunjung Lee,
Wonjong Rhee
Abstract:
We propose an AID-purifier that can boost the robustness of adversarially-trained networks by purifying their inputs. AID-purifier is an auxiliary network that works as an add-on to an already trained main classifier. To keep it computationally light, it is trained as a discriminator with a binary cross-entropy loss. To extract additional useful information from the adversarial examples, the architecture design is closely related to information maximization principles, where two layers of the main classification network are piped to the auxiliary network. To assist the iterative optimization procedure of purification, the auxiliary network is trained with AVmixup. AID-purifier can be used together with other purifiers such as PixelDefend for extra enhancement. The overall results indicate that the best performing adversarially-trained networks can be enhanced by the best performing purification networks, where AID-purifier is a competitive candidate that is light and robust.
Submitted 13 July, 2021;
originally announced July 2021.
-
Improving Bi-encoder Document Ranking Models with Two Rankers and Multi-teacher Distillation
Authors:
Jaekeol Choi,
Euna Jung,
Jangwon Suh,
Wonjong Rhee
Abstract:
BERT-based Neural Ranking Models (NRMs) can be classified according to how the query and document are encoded through BERT's self-attention layers - bi-encoder versus cross-encoder. Bi-encoder models are highly efficient because all the documents can be pre-processed before the query time, but their performance is inferior to that of cross-encoder models. Both models utilize a ranker that receives BERT representations as the input and generates a relevance score as the output. In this work, we propose a method where multi-teacher distillation is applied to a cross-encoder NRM and a bi-encoder NRM to produce a bi-encoder NRM with two rankers. The resulting student bi-encoder achieves improved performance by simultaneously learning from a cross-encoder teacher and a bi-encoder teacher and also by combining relevance scores from the two rankers. We call this method TRMD (Two Rankers and Multi-teacher Distillation). In the experiments, TwinBERT and ColBERT are considered as baseline bi-encoders. When monoBERT is used as the cross-encoder teacher, together with either TwinBERT or ColBERT as the bi-encoder teacher, TRMD produces a student bi-encoder that performs better than the corresponding baseline bi-encoder. For P@20, the maximum improvement was 11.4%, and the average improvement was 6.8%. As an additional experiment, we considered producing cross-encoder students with TRMD, and found that it could also improve the cross-encoders.
Submitted 6 August, 2021; v1 submitted 11 March, 2021;
originally announced March 2021.
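The abstract says the student bi-encoder has two rankers, learns from a cross-encoder teacher and a bi-encoder teacher, and combines the two rankers' relevance scores, but it does not state the loss or the combination rule. The sketch below is therefore hypothetical: it assumes ranker A is distilled toward the cross-encoder teacher's scores and ranker B toward the bi-encoder teacher's scores with mean squared error, and that the final score is the plain sum of the two rankers. The function names and the weights `w_cross` and `w_bi` are illustrative, not from the paper.

```python
from typing import List, Sequence

def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length score lists."""
    if len(pred) != len(target):
        raise ValueError("score lists must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def two_teacher_loss(ranker_a: Sequence[float], ranker_b: Sequence[float],
                     cross_teacher: Sequence[float], bi_teacher: Sequence[float],
                     w_cross: float = 1.0, w_bi: float = 1.0) -> float:
    """Hypothetical TRMD-style objective (assumed MSE distillation):
    ranker A imitates the cross-encoder teacher, ranker B the bi-encoder teacher."""
    return w_cross * mse(ranker_a, cross_teacher) + w_bi * mse(ranker_b, bi_teacher)

def combined_score(score_a: float, score_b: float) -> float:
    """Assumed combination rule: the sum of the two rankers' relevance scores."""
    return score_a + score_b

def rank_documents(scores_a: Sequence[float], scores_b: Sequence[float]) -> List[int]:
    """Document indices sorted by combined score, most relevant first."""
    return sorted(range(len(scores_a)),
                  key=lambda i: combined_score(scores_a[i], scores_b[i]),
                  reverse=True)
```

Keeping one ranker per teacher lets each head specialize on one teacher's score distribution instead of forcing a single head to reconcile two teachers whose scores may live on different scales.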
-
Short-term Traffic Prediction with Deep Neural Networks: A Survey
Authors:
Kyungeun Lee,
Moonjung Eo,
Euna Jung,
Yoon** Yoon,
Wonjong Rhee
Abstract:
In modern transportation systems, an enormous amount of traffic data is generated every day. This has led to rapid progress in short-term traffic prediction (STTP), in which deep learning methods have recently been applied. In traffic networks with complex spatiotemporal relationships, deep neural networks (DNNs) often perform well because they are capable of automatically extracting the most important features and patterns. In this study, we survey recent STTP studies applying deep networks from four perspectives. 1) We summarize input data representation methods according to the number and type of spatial and temporal dependencies involved. 2) We briefly explain a wide range of DNN techniques from the earliest networks, including Restricted Boltzmann Machines, to the most recent, including graph-based and meta-learning networks. 3) We summarize previous STTP studies in terms of the type of DNN techniques, application area, dataset and code availability, and the type of the represented spatiotemporal dependencies. 4) We compile public traffic datasets that are popular and can be used as the standard benchmarks. Finally, we suggest challenging issues and possible future research directions in STTP.
Submitted 28 August, 2020;
originally announced September 2020.
-
Interpreting Neural Ranking Models using Grad-CAM
Authors:
Jaekeol Choi,
Jungin Choi,
Wonjong Rhee
Abstract:
Recently, applying deep neural networks in IR has become an important and timely topic. For instance, Neural Ranking Models (NRMs) have shown promising performance compared to traditional ranking models. However, explaining the ranking results has become even more difficult with NRMs due to the complex structure of neural networks. On the other hand, a great deal of research is in progress on Interpretable Machine Learning (IML), including Grad-CAM. Grad-CAM is an attribution method that can visualize the input regions that contribute to the network's output. In this paper, we adopt Grad-CAM for interpreting the ranking results of NRMs. By adopting Grad-CAM, we analyze how each query-document term pair contributes to the matching score for a given pair of query and document. The visualization results provide insights on why a certain document is relevant to the given query. The results also show that the neural ranking model captures the subtle notion of relevance. Our interpretation method and visualization results can be used for snippet generation and user-query intent analysis.
Submitted 12 May, 2020;
originally announced May 2020.
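The core of Grad-CAM is a short computation: each channel k of a feature map A^k gets a weight alpha_k equal to the average gradient of the output score over all positions, and the map is ReLU(sum_k alpha_k A^k). The sketch below shows only that standard combination step, assuming a feature map indexed by (query term, document term) pairs, as in the term-pair analysis described above. The gradients are assumed to come from an autodiff framework and are passed in as plain nested lists; how the paper extracts them from a specific NRM is not reproduced.

```python
def grad_cam(activations, gradients):
    """Standard Grad-CAM combination step.

    activations, gradients: lists of K channel maps, each a 2-D list of shape
    [n_query_terms][n_doc_terms]; gradients hold d(score)/d(activation) and are
    assumed to be computed elsewhere by an autodiff framework.
    Returns a [n_query_terms][n_doc_terms] relevance map.
    """
    n_q = len(activations[0])
    n_d = len(activations[0][0])
    # Channel weight alpha_k: the gradient averaged over all term-pair positions.
    alphas = [sum(sum(row) for row in g) / (n_q * n_d) for g in gradients]
    cam = [[0.0] * n_d for _ in range(n_q)]
    for alpha, a in zip(alphas, activations):
        for i in range(n_q):
            for j in range(n_d):
                cam[i][j] += alpha * a[i][j]
    # ReLU keeps only the term pairs that push the matching score up.
    return [[max(0.0, v) for v in row] for row in cam]
```

Each cell of the returned map can then be read as the contribution of one query-term/document-term pair, which is the kind of map that could be rendered as a heat map over the two term sequences.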
-
DDP-GCN: Multi-Graph Convolutional Network for Spatiotemporal Traffic Forecasting
Authors:
Kyungeun Lee,
Wonjong Rhee
Abstract:
Traffic speed forecasting is one of the core problems in transportation systems. For more accurate prediction, recent studies have started using not only the temporal speed patterns but also the spatial information on the road network through graph convolutional networks. Even though the road network is highly complex due to its non-Euclidean and directional characteristics, previous approaches mainly focused on modeling the spatial dependencies using distance only. In this paper, we identify two essential spatial dependencies in traffic forecasting in addition to distance, namely direction and positional relationship, for designing basic graph elements as the fundamental building blocks. Using the building blocks, we propose DDP-GCN (Distance, Direction, and Positional relationship Graph Convolutional Network) to incorporate the three spatial relationships into deep neural networks. We evaluate the proposed model with two large-scale real-world datasets and find positive improvements for long-term forecasting in highly complex urban networks. The improvement can be larger for commute hours, but it can also be limited for short-term forecasting.
Submitted 25 September, 2022; v1 submitted 29 May, 2019;
originally announced May 2019.
-
DEEP-BO for Hyperparameter Optimization of Deep Networks
Authors:
Hyunghun Cho,
Yong** Kim,
Eunjung Lee,
Daeyoung Choi,
Yongjae Lee,
Wonjong Rhee
Abstract:
The performance of deep neural networks (DNNs) is very sensitive to the particular choice of hyperparameters. To make it worse, the shape of the learning curve can be significantly affected when a technique like batchnorm is used. As a result, hyperparameter optimization of deep networks can be much more challenging than that of traditional machine learning models. In this work, we start from well-known Bayesian Optimization solutions and provide enhancement strategies specifically designed for hyperparameter optimization of deep networks. The resulting algorithm is named DEEP-BO (Diversified, Early-termination-Enabled, and Parallel Bayesian Optimization). When evaluated over six DNN benchmarks, DEEP-BO outperforms or performs comparably to some of the well-known solutions, including GP-Hedge, Hyperband, BOHB, Median Stopping Rule, and Learning Curve Extrapolation. The code used is publicly available at https://github.com/snu-adsl/DEEP-BO.
Submitted 23 May, 2019;
originally announced May 2019.
-
Subtask Gated Networks for Non-Intrusive Load Monitoring
Authors:
Changho Shin,
Sunghwan Joo,
Jaeryun Yim,
Hyoseop Lee,
Taesup Moon,
Wonjong Rhee
Abstract:
Non-intrusive load monitoring (NILM), also known as energy disaggregation, is a blind source separation problem where a household's aggregate electricity consumption is broken down into the electricity usages of individual appliances. In this way, the cost and trouble of installing many measurement devices over numerous household appliances can be avoided, and only one device needs to be installed. The problem has been well known since Hart's seminal paper in 1992, and recently significant performance improvements have been achieved by adopting deep networks. In this work, we focus on the idea that appliances have on/off states, and develop a deep network for further performance improvements. Specifically, we propose a subtask gated network that combines the main regression network with an on/off classification subtask network. Unlike typical multitask learning algorithms where multiple tasks simply share the network parameters to take advantage of the relevance among tasks, the subtask gated network multiplies the main network's regression output by the subtask's classification probability. When standby power is additionally learned, the proposed solution surpasses the state-of-the-art performance for most of the benchmark cases. The subtask gated network can be very effective for any problem that inherently has on/off states.
Submitted 16 November, 2018;
originally announced November 2018.
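The gating described in the abstract (the regression output multiplied by the on/off classification probability) can be sketched in a few lines. This minimal version assumes the two sub-networks have already produced one regression value and one on/off logit per time step; the networks themselves, their training, and the standby-power term are not modeled, and the function names are illustrative.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def subtask_gated_output(regression_out, onoff_logits):
    """Per time step: appliance power estimate times the on/off probability.
    regression_out and onoff_logits are assumed to be equal-length outputs of the
    main regression network and the classification subtask network."""
    return [r * sigmoid(z) for r, z in zip(regression_out, onoff_logits)]
```

With a confident "off" logit the gate drives the estimate toward zero even when the regression branch produces a spurious value, which is the behavior that distinguishes this gating from plain parameter sharing in multitask learning.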
-
Statistical Characteristics of Deep Representations: An Empirical Investigation
Authors:
Daeyoung Choi,
Kyungeun Lee,
Duhun Hwang,
Wonjong Rhee
Abstract:
In this study, the effects of eight representation regularization methods are investigated, including two newly developed rank regularizers (RR). The investigation shows that the statistical characteristics of representations such as correlation, sparsity, and rank can be manipulated as intended during training. Furthermore, it is possible to improve the baseline performance simply by trying all the representation regularizers and fine-tuning the strength of their effects. In contrast to the performance improvement, no consistent relationship between performance and statistical characteristics was observed. The results indicate that manipulation of statistical characteristics can be helpful for improving performance, but only indirectly through its influence on learning dynamics or its tuning effects.
Submitted 2 December, 2020; v1 submitted 8 November, 2018;
originally announced November 2018.
-
Utilizing Class Information for Deep Network Representation Shaping
Authors:
Daeyoung Choi,
Wonjong Rhee
Abstract:
Statistical characteristics of deep network representations, such as sparsity and correlation, are known to be relevant to the performance and interpretability of deep learning. When a statistical characteristic is desired, an adequate regularizer can often be designed and applied during the training phase. Typically, such a regularizer aims to manipulate a statistical characteristic over all classes together. For classification tasks, however, it might be advantageous to enforce the desired characteristic per class such that different classes can be better distinguished. Motivated by this idea, we design two class-wise regularizers that explicitly utilize class information: class-wise Covariance Regularizer (cw-CR) and class-wise Variance Regularizer (cw-VR). cw-CR aims to reduce the covariance of representations calculated from samples of the same class, encouraging feature independence. cw-VR is similar, but targets variance instead of covariance to improve feature compactness. For the sake of completeness, their counterparts without class information, Covariance Regularizer (CR) and Variance Regularizer (VR), are considered together. The four regularizers are conceptually simple and computationally very efficient, and the visualization shows that the regularizers indeed perform distinct representation shaping. In terms of classification performance, significant improvements over the baseline and L1/L2 weight regularization methods were found for 21 out of 22 tasks over popular benchmark datasets. In particular, cw-VR achieved the best performance for 13 tasks, including ResNet-32/110.
Submitted 28 February, 2019; v1 submitted 24 September, 2018;
originally announced September 2018.
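To make the class-wise versus class-agnostic distinction concrete, the sketch below computes one plausible form of a variance penalty: for each class, the per-unit variances of that class's representations are summed, and the result is averaged over classes; pooling all samples into one "class" gives the class-agnostic VR counterpart. The exact normalization and weighting used in the paper are assumptions here, not taken from the abstract.

```python
from collections import defaultdict

def variance(xs):
    """Population variance of a list of floats."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cw_variance_penalty(reps, labels):
    """Assumed form of a class-wise variance regularizer (cw-VR).
    reps: list of representation vectors; labels: matching class ids.
    For each class, sum the per-unit variances over that class's samples,
    then average the per-class totals."""
    by_class = defaultdict(list)
    for r, y in zip(reps, labels):
        by_class[y].append(r)
    total = 0.0
    for rows in by_class.values():
        n_units = len(rows[0])
        total += sum(variance([row[k] for row in rows]) for k in range(n_units))
    return total / len(by_class)

def variance_penalty(reps):
    """Class-agnostic counterpart (VR): the same quantity with all samples pooled."""
    return cw_variance_penalty(reps, [0] * len(reps))
```

In a toy batch where two well-separated classes are each tight, the class-wise penalty stays small while the pooled penalty is large, which illustrates why the class-wise version can tighten each class without pulling different classes together.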
-
Restructuring Batch Normalization to Accelerate CNN Training
Authors:
Wonkyung Jung,
Dae** Jung,
Byeongho Kim,
Sunjung Lee,
Wonjong Rhee,
Jung Ho Ahn
Abstract:
Batch Normalization (BN) has become a core design block of modern Convolutional Neural Networks (CNNs). A typical modern CNN has a large number of BN layers in its lean and deep architecture. BN requires mean and variance calculations over each mini-batch during training. Therefore, the existing memory access reduction techniques, such as fusing multiple CONV layers, are not effective for accelerating BN due to their inability to optimize mini-batch related calculations during training. To address this increasingly important problem, we propose to restructure BN layers by first splitting a BN layer into two sub-layers (fission) and then combining the first sub-layer with its preceding CONV layer and the second sub-layer with the following activation and CONV layers (fusion). The proposed solution can significantly reduce main-memory accesses while training the latest CNN models, and the experiments on a chip multiprocessor show that the proposed BN restructuring can improve the performance of DenseNet-121 by 25.7%.
Submitted 1 March, 2019; v1 submitted 3 July, 2018;
originally announced July 2018.
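The fission/fusion idea can be illustrated on a single channel. In the minimal sketch below, the first BN sub-layer (accumulating the mini-batch sum and sum of squares) is folded into the pass that produces the preceding layer's outputs, and the second sub-layer (normalize, scale, shift) is folded into the following ReLU, so each output is touched in one pass per stage. The "convolution" is a hypothetical scalar multiply, and the memory-access savings on real hardware are not modeled; the sketch only checks that the restructured computation matches unfused BN followed by ReLU.

```python
import math

def conv_with_stats(inputs, weight):
    """Stand-in for 'CONV + first BN sub-layer': produce outputs and, in the same
    pass, accumulate the sums needed for the mini-batch mean and variance.
    The 'convolution' is a hypothetical scalar multiply for brevity."""
    outs, s, sq = [], 0.0, 0.0
    for x in inputs:
        y = weight * x
        outs.append(y)
        s += y
        sq += y * y
    n = len(outs)
    mean = s / n
    var = sq / n - mean * mean  # E[y^2] - E[y]^2; less stable than two-pass in float
    return outs, mean, var

def bn_relu_fused(outs, mean, var, gamma, beta, eps=1e-5):
    """Second BN sub-layer fused with the following ReLU: a single pass that
    normalizes, scales, shifts, and rectifies."""
    inv_std = 1.0 / math.sqrt(var + eps)
    return [max(0.0, gamma * (y - mean) * inv_std + beta) for y in outs]

def reference_bn_relu(ys, gamma, beta, eps=1e-5):
    """Unfused reference: separate passes for mean, variance, normalize, ReLU."""
    mean = sum(ys) / len(ys)
    var = sum((y - mean) ** 2 for y in ys) / len(ys)
    normed = [gamma * (y - mean) / math.sqrt(var + eps) + beta for y in ys]
    return [max(0.0, v) for v in normed]
```

The one-pass statistics trade some numerical robustness for fewer passes over the data, which is the kind of cost a real implementation would need to weigh.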
-
Partitioning Compute Units in CNN Acceleration for Statistical Memory Traffic Shaping
Authors:
Dae** Jung,
Sunjung Lee,
Wonjong Rhee,
Jung Ho Ahn
Abstract:
The design complexity of CNNs has been steadily increasing to improve accuracy. To cope with the massive amount of computation needed for such complex CNNs, the latest solutions utilize blocking of an image over the available dimensions and batching of multiple input images to improve data reuse in the memory hierarchy. While there have been numerous works on maximizing data reuse, only a few studies have focused on the memory bottleneck caused by limited bandwidth. A bandwidth bottleneck can easily occur in CNN acceleration because CNN layers have different sizes with varying computation needs and because batching is typically performed over each CNN layer for ideal data reuse. In this case, the data transfer demand for a layer can be relatively low or high compared to the computation requirement of the layer, and hence temporal fluctuations in memory access can be induced, eventually causing bandwidth problems. In this paper, we first show that there exists a high degree of fluctuation in the memory-access-to-computation ratio depending on the CNN layers and the functions in the layer being processed by the compute units (cores), where the units are tightly synchronized to maximize data reuse. Then we propose a strategy of partitioning the compute units where the cores within each partition process a batch of input data synchronously to maximize data reuse, but different partitions run asynchronously. As the partitions stay asynchronous and typically process different CNN layers at any given moment, the memory access traffic sizes of the partitions become statistically shuffled. Thus, partitioning the compute units and using them asynchronously smooths the total memory access traffic size over time. We call this smoothing statistical memory traffic shaping, and we show that it can lead to an 8.0 percent performance gain on a commercial 64-core processor when running ResNet-50.
Submitted 18 June, 2018;
originally announced June 2018.
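A toy calculation shows why running partitions out of step smooths total memory traffic. The sketch assumes that every layer takes exactly one time step, that each core's memory demand depends only on the layer it is processing, and that partitions stay a fixed number of layers apart; in the scheme described above the partitions drift asynchronously, so the shuffling is statistical rather than this deterministic offset. The demand values are made up for illustration.

```python
def total_traffic(demand_per_layer, cores_per_partition, offsets, steps):
    """Total memory traffic per time step.

    demand_per_layer: hypothetical per-core memory demand of each CNN layer.
    offsets[p]: how many layers partition p is ahead of partition 0
    (a single partition with offset 0 models fully synchronized cores).
    Assumes every layer takes exactly one time step and layers repeat cyclically.
    """
    n_layers = len(demand_per_layer)
    series = []
    for t in range(steps):
        total = 0
        for off in offsets:
            layer = (t + off) % n_layers
            total += cores_per_partition * demand_per_layer[layer]
        series.append(total)
    return series
```

With alternating light and heavy layers (per-core demand 1 and 9), 64 synchronized cores produce traffic of 64, 576, 64, 576, whereas two 32-core partitions one layer apart produce a flat 320 per step. The average is unchanged, but the peak that must fit within the memory bandwidth drops.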
-
A Downstream Crosstalk Channel Estimation Method for Mix of Legacy and Vectoring-Enabled VDSL
Authors:
Mehdi Mohseni,
Wonjong Rhee,
Georgios Ginis
Abstract:
With the latest technology of vectoring, DSL data rates on the order of 100 Mbps have become a reality that is under field deployment. The key is to cancel crosstalk from other lines, which is also known as multiuser MIMO cancellation in wireless communications. During the DSL system upgrade phase of field deployment, a mix of legacy and vectoring-enabled VDSL lines is inevitable, and a channel estimation solution for the entire mix is needed before vectoring can be enforced. This paper describes a practical method for crosstalk channel estimation for downstream vectoring, assuming that a vectoring-enabled DSLAM forces DMT symbol-level timing to be aligned for all of the lines, but also assuming that the locations of sync symbols are aligned only among vectoring-enabled lines. Each vectoring-enabled receiver is capable of reporting error samples to the vectoring DSLAM. The estimation method is not only practical, but also matches the performance of the Maximum-Likelihood estimator for the selected training sequences.
Submitted 20 February, 2017; v1 submitted 20 February, 2017;
originally announced February 2017.
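As background for the estimation problem above, the sketch shows the generic correlation estimator used when lines transmit mutually orthogonal +/-1 pilot sequences: each receiver's reported error samples are correlated with each line's pilot, and orthogonality isolates the individual crosstalk coefficients. It assumes a single tone with real-valued, noiseless coupling and Walsh-Hadamard pilots; real VDSL channels are complex per tone, and the paper's handling of a mix of legacy and vectoring-enabled lines with misaligned sync symbols is not reproduced.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two).
    Its rows are mutually orthogonal +/-1 pilot sequences."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def estimate_crosstalk(errors, pilots):
    """Correlation estimate H_hat[k][j] = (1/T) * sum_t errors[k][t] * pilots[j][t].

    errors: [n_receivers][T] error samples reported by the receivers.
    pilots: [n_lines][T] orthogonal +/-1 pilot sequences, one per disturbing line.
    """
    T = len(pilots[0])
    return [[sum(e[t] * p[t] for t in range(T)) / T for p in pilots] for e in errors]
```

Because the pilot rows are orthogonal with entries of magnitude 1, correlating with pilot j cancels every other line's contribution exactly in the noiseless case. With additive white Gaussian noise, this correlation is also the least-squares estimate for such sequences.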
-
Glucose metabolism and oscillatory behavior of pancreatic islets
Authors:
H. Kang,
J. Jo,
H. J. Kim,
M. Y. Choi,
S. W. Rhee,
D. S. Koh
Abstract:
A variety of oscillations are observed in pancreatic islets. We establish a model, incorporating two oscillatory systems of different time scales: One is the well-known bursting model in pancreatic beta-cells and the other is the glucose-insulin feedback model which considers direct and indirect feedback of secreted insulin. These two are coupled to interact with each other in the combined model, and two basic assumptions are made on the basis of biological observations: The conductance $g_{K(ATP)}$ for the ATP-dependent potassium current is a decreasing function of the glucose concentration whereas the insulin secretion rate is given by a function of the intracellular calcium concentration. Obtained via extensive numerical simulations are complex oscillations including clusters of bursts, slow and fast calcium oscillations, and so on. We also consider how the intracellular glucose concentration depends upon the extracellular glucose concentration, and examine the inhibitory effects of insulin.
Submitted 23 September, 2005;
originally announced September 2005.
-
Pairing Instability and Mechanical Collapse of a Bose Gas with an Attractive Interaction
Authors:
Gun Sang Jeon,
Lan Yin,
Sung Wu Rhee,
David J. Thouless
Abstract:
We study the pairing instability and mechanical collapse of a dilute homogeneous Bose gas with an attractive interaction. The pairing phase is found to be a saddle point, unstable against pairing fluctuations. This pairing saddle point exists above a critical temperature. Below this critical temperature, the system is totally unstable in the pairing channel. Thus the system could collapse in the pairing channel in addition to mechanical collapse. The critical temperatures of pairing instability and mechanical collapse are higher than the BEC temperature of an ideal Bose gas with the same density. When fluctuations are taken into account, we find that the critical temperature of mechanical collapse is even higher. The difference between the collapse temperature and the BEC temperature is proportional to $(n|a_s|^3)^{2/9}$, where $n$ is the density and $a_s$ is the scattering length.
Submitted 15 May, 2002; v1 submitted 28 September, 2001;
originally announced October 2001.
-
Vortex Dynamics in the Two-Fluid Model
Authors:
D. J. Thouless,
M. R. Geller,
W. F. Vinen,
J. -Y. Fortin,
S. W. Rhee
Abstract:
We have used two-fluid dynamics to study the discrepancy between the work of Thouless, Ao and Niu (TAN) and that of Iordanskii. In TAN no transverse force on a vortex due to normal fluid flow was found, whereas the earlier work found a transverse force proportional to normal fluid velocity u and normal fluid density. We have linearized the time-independent two-fluid equations about the exact solution for a vortex, and find three solutions which are important in the region far from the vortex. Uniform superfluid flow gives rise to the usual superfluid Magnus force. Uniform normal fluid flow gives rise to no forces in the linear region, but does not satisfy reasonable boundary conditions at short distances. A logarithmically increasing normal fluid flow gives a viscous force. As in classical hydrodynamics, and as in the early work of Hall and Vinen, this logarithmic increase must be cut off by nonlinear effects at large distances; this gives a viscous force proportional to u/ln(u), and a transverse contribution which goes like u/(ln u)^2, even in the absence of an explicit Iordanskii force. In the limit u goes to zero the TAN result is obtained, but at nonzero u there are important corrections that were not found in TAN. We argue that the Magnus force in a superfluid at nonzero temperature is an example of a topological relation for which finite-size corrections may be large.
Submitted 21 January, 2001; v1 submitted 18 January, 2001;
originally announced January 2001.
-
Renormalization-group study of gate charge effects in Josephson-junction chains
Authors:
M. Y. Choi,
Sung Wu Rhee,
Minchul Lee,
J. Choi
Abstract:
We study the quantum phase transition in a chain of superconducting grains, coupled by Josephson junctions, with emphasis on the effects of gate charges induced on the grains. At zero temperature the system is mapped onto a two-dimensional classical Coulomb gas, where the gate charge plays the role of an imaginary electric field. Such a field is found to be relevant in the renormalization-group transformation and to change the nature of the superconductor-insulator transition present in the system, tending to suppress quantum fluctuations and helping establish superconductivity. On the basis of this observation, we propose the zero-temperature phase diagram on the plane of the gate charge and the energy ratio.
Submitted 14 December, 2000;
originally announced December 2000.