Search | arXiv e-print repository

The Geometry of Categorical and Hierarchical Concepts in Large Language Models

Authors: Kiho Park, Yo Joong Choe, Yibo Jiang, Victor Veitch

Abstract: Understanding how semantic meaning is encoded in the representation spaces of large language models is a fundamental problem in interpretability. In this paper, we study the two foundational questions in this area. First, how are categorical concepts, such as {'mammal', 'bird', 'reptile', 'fish'}, represented? Second, how are hierarchical relations between concepts encoded? For example, how is the fact that 'dog' is a kind of 'mammal' encoded? We show how to extend the linear representation hypothesis to answer these questions. We find a remarkably simple structure: simple categorical concepts are represented as simplices, hierarchically related concepts are orthogonal in a sense we make precise, and (in consequence) complex concepts are represented as polytopes constructed from direct sums of simplices, reflecting the hierarchical structure. We validate these theoretical results on the Gemma large language model, estimating representations for 957 hierarchically related concepts using data from WordNet.

Submitted 3 June, 2024; originally announced June 2024.

Comments: Code is available at https://github.com/KihoPark/LLM_Categorical_Hierarchical_Representations

arXiv:2402.09698 [pdf, other]

Combining Evidence Across Filtrations Using Adjusters

Authors: Yo Joong Choe, Aaditya Ramdas

Abstract: In anytime-valid sequential inference, it is known that any admissible procedure must be based on e-processes, which are composite generalizations of test martingales that quantify the accumulated evidence against a composite null hypothesis at any arbitrary stopping time. This paper studies methods for combining e-processes constructed using different information sets (filtrations) for the same null. Although e-processes constructed in the same filtration can be combined effortlessly (e.g., by averaging), e-processes constructed in different filtrations cannot, because their validity in a coarser filtration does not translate to validity in a finer filtration. This issue arises in exchangeability tests, independence tests, and tests for comparing forecasts with lags. We first establish that a class of functions called adjusters allows us to lift e-processes from a coarser filtration into any finer filtration. We then introduce a characterization theorem for adjusters, formalizing a sense in which using adjusters is necessary. There are two major implications. First, if we have a powerful e-process in a coarsened filtration, then we readily have a powerful e-process in the original filtration. Second, when we coarsen the filtration to construct an e-process, there is an asymptotically logarithmic cost of recovering anytime-validity in the original filtration.

Submitted 28 May, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: Substantially revised with new results in Sections 5 and 6. Code is available at https://github.com/yjchoe/CombiningEvidenceAcrossFiltrations
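
Illustration (not taken from the paper or its repository): the abstract notes that e-processes built in the same filtration can be combined by averaging. The sketch below shows only that easy same-filtration case, using two likelihood-ratio test martingales for a Bernoulli null. The function names, the Bernoulli setup, and the alternatives 0.75 and 0.25 are assumptions chosen for the example; the adjusters studied in the paper are not implemented here.

```python
def likelihood_ratio_process(xs, p_null, p_alt):
    """Running likelihood ratio of Bernoulli(p_alt) against Bernoulli(p_null).

    Under the null this is a test martingale, which is a special case of
    an e-process (illustrative setup, not the paper's construction).
    """
    value, path = 1.0, []
    for x in xs:
        num = p_alt if x else 1.0 - p_alt
        den = p_null if x else 1.0 - p_null
        value *= num / den
        path.append(value)
    return path


def average_processes(e1, e2):
    """Pointwise average of two processes adapted to the same filtration."""
    return [(a + b) / 2.0 for a, b in zip(e1, e2)]


def first_rejection_time(e_path, alpha):
    """Index of the first time the process reaches 1/alpha, or None."""
    threshold = 1.0 / alpha
    for t, value in enumerate(e_path):
        if value >= threshold:
            return t
    return None


xs = [1, 1, 0, 1]  # hypothetical observations
e_high = likelihood_ratio_process(xs, p_null=0.5, p_alt=0.75)
e_low = likelihood_ratio_process(xs, p_null=0.5, p_alt=0.25)
e_avg = average_processes(e_high, e_low)
```

Both processes here observe the same data stream, so the average is again a test martingale. The abstract's point is that this shortcut is not available when the two processes are built in different filtrations.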

arXiv:2401.06432 [pdf, other]

Heterogeneous LoRA for Federated Fine-tuning of On-Device Foundation Models

Authors: Yae Jee Cho, Luyang Liu, Zheng Xu, Aldi Fahrezi, Gauri Joshi

Abstract: Foundation models (FMs) adapt well to specific domains or tasks with fine-tuning, and federated learning (FL) enables the potential for privacy-preserving fine-tuning of the FMs with on-device local data. For federated fine-tuning of FMs, we consider the FMs with small to medium parameter sizes of single digit billion at maximum, referred to as on-device FMs (ODFMs) that can be deployed on devices for inference but can only be fine-tuned with parameter efficient methods. In our work, we tackle the data and system heterogeneity problem of federated fine-tuning of ODFMs by proposing a novel method using heterogeneous low-rank approximations (LoRAs), namely HetLoRA. First, we show that the naive approach of using homogeneous LoRA ranks across devices faces a trade-off between overfitting and slow convergence, and thus propose HetLoRA, which allows heterogeneous ranks across client devices and efficiently aggregates and distributes these heterogeneous LoRA modules. By applying rank self-pruning locally and sparsity-weighted aggregation at the server, HetLoRA combines the advantages of high and low-rank LoRAs, which achieves improved convergence speed and final performance compared to homogeneous LoRA. Furthermore, HetLoRA offers enhanced computation efficiency compared to full fine-tuning, making it suitable for federated fine-tuning across heterogeneous devices.

Submitted 20 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

arXiv:2311.03658 [pdf, other]

The Linear Representation Hypothesis and the Geometry of Large Language Models

Authors: Kiho Park, Yo Joong Choe, Victor Veitch

Abstract: Informally, the 'linear representation hypothesis' is the idea that high-level concepts are represented linearly as directions in some representation space. In this paper, we address two closely related questions: What does "linear representation" actually mean? And, how do we make sense of geometric notions (e.g., cosine similarity or projection) in the representation space? To answer these, we use the language of counterfactuals to give two formalizations of "linear representation", one in the output (word) representation space, and one in the input (sentence) space. We then prove these connect to linear probing and model steering, respectively. To make sense of geometric notions, we use the formalization to identify a particular (non-Euclidean) inner product that respects language structure in a sense we make precise. Using this causal inner product, we show how to unify all notions of linear representation. In particular, this allows the construction of probes and steering vectors using counterfactual pairs. Experiments with LLaMA-2 demonstrate the existence of linear representations of concepts, the connection to interpretation and control, and the fundamental role of the choice of inner product.

Submitted 6 November, 2023; originally announced November 2023.

Comments: Accepted for an oral presentation at NeurIPS 2023 Workshop on Causal Representation Learning. Code is available at https://github.com/KihoPark/linear_rep_geometry

arXiv:2307.08809 [pdf, other]

Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels

Authors: Yae Jee Cho, Gauri Joshi, Dimitrios Dimitriadis

Abstract: Many existing FL methods assume clients with fully-labeled data, while in realistic settings, clients have limited labels due to the expensive and laborious process of labeling. Limited labeled local data of the clients often leads to their local model having poor generalization abilities to their larger unlabeled local data, such as having class-distribution mismatch with the unlabeled data. As a result, clients may instead look to benefit from the global model trained across clients to leverage their unlabeled data, but this also becomes difficult due to data heterogeneity across clients. In our work, we propose FedLabel where clients selectively choose the local or global model to pseudo-label their unlabeled data depending on which is more of an expert of the data. We further utilize both the local and global models' knowledge via global-local consistency regularization which minimizes the divergence between the two models' outputs when they have identical pseudo-labels for the unlabeled data. Unlike other semi-supervised FL baselines, our method does not require additional experts other than the local or global model, nor require additional parameters to be communicated. We also do not assume any server-labeled data or fully labeled clients. For both cross-device and cross-silo settings, we show that FedLabel outperforms other semi-supervised FL baselines by $8$-$24\%$, and even outperforms standard fully supervised FL baselines ($100\%$ labeled data) with only $5$-$20\%$ of labeled data.

Submitted 17 July, 2023; originally announced July 2023.

Comments: To appear in the proceedings of ICCV 2023

arXiv:2305.10564 [pdf, other]

Counterfactually Comparing Abstaining Classifiers

Authors: Yo Joong Choe, Aditya Gangrade, Aaditya Ramdas

Abstract: Abstaining classifiers have the option to abstain from making predictions on inputs that they are unsure about. These classifiers are becoming increasingly popular in high-stakes decision-making problems, as they can withhold uncertain predictions to improve their reliability and safety. When evaluating black-box abstaining classifier(s), however, we lack a principled approach that accounts for what the classifier would have predicted on its abstentions. These missing predictions matter when they can eventually be utilized, either directly or as a backup option in a failure mode. In this paper, we introduce a novel approach and perspective to the problem of evaluating and comparing abstaining classifiers by treating abstentions as missing data. Our evaluation approach is centered around defining the counterfactual score of an abstaining classifier, defined as the expected performance of the classifier had it not been allowed to abstain. We specify the conditions under which the counterfactual score is identifiable: if the abstentions are stochastic, and if the evaluation data is independent of the training data (ensuring that the predictions are missing at random), then the score is identifiable. Note that, if abstentions are deterministic, then the score is unidentifiable because the classifier can perform arbitrarily poorly on its abstentions. Leveraging tools from observational causal inference, we then develop nonparametric and doubly robust methods to efficiently estimate this quantity under identification. Our approach is examined in both simulated and real data experiments.

Submitted 9 November, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: Accepted to NeurIPS 2023. Preliminary work presented at the ICML 2023 Workshop on Counterfactuals in Minds and Machines. Code available at https://github.com/yjchoe/ComparingAbstainingClassifiers

arXiv:2302.03109 [pdf, other]

On the Convergence of Federated Averaging with Cyclic Client Participation

Authors: Yae Jee Cho, Pranay Sharma, Gauri Joshi, Zheng Xu, Satyen Kale, Tong Zhang

Abstract: Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL). Previous convergence analyses of FedAvg either assume full client participation or partial client participation where the clients can be uniformly sampled. However, in practical cross-device FL systems, only a subset of clients that satisfy local criteria such as battery status, network connectivity, and maximum participation frequency requirements (to ensure privacy) are available for training at a given time. As a result, client availability follows a natural cyclic pattern. We provide (to our knowledge) the first theoretical framework to analyze the convergence of FedAvg with cyclic client participation with several different client optimizers such as GD, SGD, and shuffled SGD. Our analysis discovers that cyclic client participation can achieve a faster asymptotic convergence rate than vanilla FedAvg with uniform client participation under suitable conditions, providing valuable insights into the design of client sampling protocols.

Submitted 6 February, 2023; originally announced February 2023.

arXiv:2205.14840 [pdf, other]

Maximizing Global Model Appeal in Federated Learning

Authors: Yae Jee Cho, Divyansh Jhunjhunwala, Tian Li, Virginia Smith, Gauri Joshi

Abstract: Federated learning typically considers collaboratively training a global model using local data at edge clients. Clients may have their own individual requirements, such as having a minimal training loss threshold, which they expect to be met by the global model. However, due to client heterogeneity, the global model may not meet each client's requirements, and only a small subset may find the global model appealing. In this work, we explore the problem of the global model lacking appeal to the clients due to not being able to satisfy local requirements. We propose MaxFL, which aims to maximize the number of clients that find the global model appealing. We show that having a high global model appeal is important to maintain an adequate pool of clients for training, and can directly improve the test accuracy on both seen and unseen clients. We provide convergence guarantees for MaxFL and show that MaxFL achieves a $22$-$40\%$ and $18$-$50\%$ test accuracy improvement for the training clients and unseen clients respectively, compared to a wide range of FL modeling approaches, including those that tackle data heterogeneity, aim to incentivize clients, and learn personalized or fair models.

Submitted 4 February, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

arXiv:2204.12703 [pdf, other]

Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning

Authors: Yae Jee Cho, Andre Manoel, Gauri Joshi, Robert Sim, Dimitrios Dimitriadis

Abstract: Federated learning (FL) enables edge-devices to collaboratively learn a model without disclosing their private data to a central aggregating server. Most existing FL algorithms require models of identical architecture to be deployed across the clients and server, making it infeasible to train large models due to clients' limited system resources. In this work, we propose a novel ensemble knowledge transfer method named Fed-ET in which small models (different in architecture) are trained on clients, and used to train a larger model at the server. Unlike in conventional ensemble learning, in FL the ensemble can be trained on clients' highly heterogeneous data. Cognizant of this property, Fed-ET uses a weighted consensus distillation scheme with diversity regularization that efficiently extracts reliable consensus from the ensemble while improving generalization by exploiting the diversity within the ensemble. We show the generalization bound for the ensemble of weighted models trained on heterogeneous datasets that supports the intuition of Fed-ET. Our experiments on image and language tasks show that Fed-ET significantly outperforms other state-of-the-art FL algorithms with fewer communicated parameters, and is also robust against high data-heterogeneity.

Submitted 27 April, 2022; originally announced April 2022.

Comments: To appear in the proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI 2022)

arXiv:2110.00115 [pdf, other]

doi 10.1287/opre.2021.0792

Comparing Sequential Forecasters

Authors: Yo Joong Choe, Aaditya Ramdas

Abstract: Consider two forecasters, each making a single prediction for a sequence of events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post-hoc, while avoiding unverifiable assumptions on how the forecasts and outcomes were generated? In this paper, we present a rigorous answer to this question by designing novel sequential inference procedures for estimating the time-varying difference in forecast scores. To do this, we employ confidence sequences (CS), which are sequences of confidence intervals that can be continuously monitored and are valid at arbitrary data-dependent stopping times ("anytime-valid"). The widths of our CSs are adaptive to the underlying variance of the score differences. Underlying their construction is a game-theoretic statistical framework, in which we further identify e-processes and p-processes for sequentially testing a weak null hypothesis -- whether one forecaster outperforms another on average (rather than always). Our methods do not make distributional assumptions on the forecasts or outcomes; our main theorems apply to any bounded scores, and we later provide alternative methods for unbounded scores. We empirically validate our approaches by comparing real-world baseball and weather forecasters.

Submitted 9 November, 2023; v1 submitted 30 September, 2021; originally announced October 2021.

Comments: Published in Operations Research. Code and data sources available at https://github.com/yjchoe/ComparingForecasters

arXiv:2109.08119 [pdf, other]

Personalized Federated Learning for Heterogeneous Clients with Clustered Knowledge Transfer

Authors: Yae Jee Cho, Jianyu Wang, Tarun Chiruvolu, Gauri Joshi

Abstract: Personalized federated learning (FL) aims to train model(s) that can perform well for individual clients that are highly data and system heterogeneous. Most work in personalized FL, however, assumes using the same model architecture at all clients and increases the communication cost by sending/receiving models. This may not be feasible for realistic scenarios of FL. In practice, clients have highly heterogeneous system-capabilities and limited communication resources. In our work, we propose a personalized FL framework, PerFed-CKT, where clients can use heterogeneous model architectures and do not directly communicate their model parameters. PerFed-CKT uses clustered co-distillation, where clients use logits to transfer their knowledge to other clients that have similar data-distributions. We theoretically show the convergence and generalization properties of PerFed-CKT and empirically show that PerFed-CKT achieves high test accuracy with several orders of magnitude lower communication cost compared to the state-of-the-art personalized FL schemes.

Submitted 16 September, 2021; originally announced September 2021.

arXiv:2012.08009 [pdf, other]

Bandit-based Communication-Efficient Client Selection Strategies for Federated Learning

Authors: Yae Jee Cho, Samarth Gupta, Gauri Joshi, Osman Yağan

Abstract: Due to communication constraints and intermittent client availability in federated learning, only a subset of clients can participate in each training round. While most prior works assume uniform and unbiased client selection, recent work on biased client selection has shown that selecting clients with higher local losses can improve error convergence speed. However, previously proposed biased selection strategies either require additional communication cost for evaluating the exact local loss or utilize stale local loss, which can even make the model diverge. In this paper, we present a bandit-based communication-efficient client selection strategy UCB-CS that achieves faster convergence with lower communication overhead. We also demonstrate how client selection can be used to improve fairness.

Submitted 14 December, 2020; originally announced December 2020.

arXiv:2010.01243 [pdf, other]

Client Selection in Federated Learning: Convergence Analysis and Power-of-Choice Selection Strategies

Authors: Yae Jee Cho, Jianyu Wang, Gauri Joshi

Abstract: Federated learning is a distributed optimization paradigm that enables a large number of resource-limited client nodes to cooperatively train a model without data sharing. Several works have analyzed the convergence of federated learning by accounting for data heterogeneity, communication and computation limitations, and partial client participation. However, they assume unbiased client participation, where clients are selected at random or in proportion to their data sizes. In this paper, we present the first convergence analysis of federated optimization for biased client selection strategies, and quantify how the selection bias affects convergence speed. We reveal that biasing client selection towards clients with higher local loss achieves faster error convergence. Using this insight, we propose Power-of-Choice, a communication- and computation-efficient client selection framework that can flexibly span the trade-off between convergence speed and solution bias. Our experiments demonstrate that Power-of-Choice strategies converge up to $3\times$ faster and give $10\%$ higher test accuracy than the baseline random selection.

Submitted 2 October, 2020; originally announced October 2020.
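
Illustration (not taken from the paper's code): the abstract describes biasing client selection towards clients with higher local loss. A minimal sketch of one such rule follows, assuming a two-stage scheme in which d candidate clients are first sampled in proportion to their data fractions and the m candidates with the highest local loss are then kept. The two-stage structure, the parameter names d and m, and the function name are assumptions of this example and are not stated in the abstract.

```python
import random


def select_clients_by_loss(local_losses, data_fractions, d, m, rng=None):
    """Pick m clients, biased towards higher local loss (illustrative sketch).

    Stage 1: sample d distinct candidates, weighted by data fraction.
    Stage 2: keep the m candidates with the largest local loss.
    """
    n = len(local_losses)
    if not (1 <= m <= d <= n):
        raise ValueError("require 1 <= m <= d <= number of clients")
    rng = rng or random.Random()

    # Weighted sampling without replacement, one candidate at a time.
    pool = list(range(n))
    weights = [data_fractions[c] for c in pool]
    candidates = []
    for _ in range(d):
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        candidates.append(pool.pop(idx))
        weights.pop(idx)

    # Bias step: highest-loss candidates first.
    candidates.sort(key=lambda c: local_losses[c], reverse=True)
    return candidates[:m]


# Hypothetical losses for four clients with equal data sizes.
losses = [0.1, 0.9, 0.5, 0.7]
fractions = [0.25, 0.25, 0.25, 0.25]
chosen = select_clients_by_loss(losses, fractions, d=4, m=2, rng=random.Random(0))
```

In this sketch, d = m reduces to plain data-proportional sampling and d equal to the number of clients always picks the globally highest-loss clients, so d is the knob that sets how strongly selection is biased.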

arXiv:2004.05007 [pdf, other]

An Empirical Study of Invariant Risk Minimization

Authors: Yo Joong Choe, Jiyeon Ham, Kyubyong Park

Abstract: Invariant risk minimization (IRM) (Arjovsky et al., 2019) is a recently proposed framework designed for learning predictors that are invariant to spurious correlations across different training environments. Yet, despite its theoretical justifications, IRM has not been extensively tested across various settings. In an attempt to gain a better understanding of the framework, we empirically investigate several research questions using IRMv1, which is the first practical algorithm proposed to approximately solve IRM. By extending the ColoredMNIST experiment in different ways, we find that IRMv1 (i) performs better as the spurious correlation varies more widely between training environments, (ii) learns an approximately invariant predictor when the underlying relationship is approximately invariant, and (iii) can be extended to an analogous setting for text classification.

Submitted 6 July, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Comments: Presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning. Code at https://github.com/kakaobrain/irm-empirical-study

arXiv:2004.03289 [pdf, other]

KorNLI and KorSTS: New Benchmark Datasets for Korean Natural Language Understanding

Authors: Jiyeon Ham, Yo Joong Choe, Kyubyong Park, Ilji Choi, Hyungjoon Soh

Abstract: Natural language inference (NLI) and semantic textual similarity (STS) are key tasks in natural language understanding (NLU). Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no publicly available NLI or STS datasets in the Korean language. Motivated by this, we construct and release new datasets for Korean NLI and STS, dubbed KorNLI and KorSTS, respectively. Following previous approaches, we machine-translate existing English training sets and manually translate development and test sets into Korean. To accelerate research on Korean NLU, we also establish baselines on KorNLI and KorSTS. Our datasets are publicly available at https://github.com/kakaobrain/KorNLUDatasets.

Submitted 5 October, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: Findings of EMNLP 2020. Datasets available at https://github.com/kakaobrain/KorNLUDatasets

arXiv:1911.12071 [pdf, other]

Jejueo Datasets for Machine Translation and Speech Synthesis

Authors: Kyubyong Park, Yo Joong Choe, Jiyeon Ham

Abstract: Jejueo was classified as critically endangered by UNESCO in 2010. Although diverse efforts to revitalize it have been made, there have been few computational approaches. Motivated by this, we construct two new Jejueo datasets: Jejueo Interview Transcripts (JIT) and Jejueo Single Speaker Speech (JSS). The JIT dataset is a parallel corpus containing 170k+ Jejueo-Korean sentences, and the JSS dataset consists of 10k high-quality audio files recorded by a native Jejueo speaker and a transcript file. Subsequently, we build neural systems of machine translation and speech synthesis using them. All resources are publicly available via our GitHub repository. We hope that these datasets will attract the interest of both the language and machine learning communities.

Submitted 27 November, 2019; originally announced November 2019.

arXiv:1911.12019 [pdf, other]

word2word: A Collection of Bilingual Lexicons for 3,564 Language Pairs

Authors: Yo Joong Choe, Kyubyong Park, Dongwoo Kim

Abstract: We present word2word, a publicly available dataset and an open-source Python package for cross-lingual word translations extracted from sentence-level parallel corpora. Our dataset provides top-k word translations in 3,564 (directed) language pairs across 62 languages in OpenSubtitles2018 (Lison et al., 2018). To obtain this dataset, we use a count-based bilingual lexicon extraction model based on the observation that not only source and target words but also source words themselves can be highly correlated. We illustrate that the resulting bilingual lexicons have high coverage and attain competitive translation quality for several language pairs. We wrap our dataset and model in an easy-to-use Python library, which supports downloading and retrieving top-k word translations in any of the supported language pairs as well as computing top-k word translations for custom parallel corpora.

Submitted 27 November, 2019; originally announced November 2019.

arXiv:1907.01256 [pdf, other]

A Neural Grammatical Error Correction System Built On Better Pre-training and Sequential Transfer Learning

Authors: Yo Joong Choe, Jiyeon Ham, Kyubyong Park, Yeoil Yoon

Abstract: Grammatical error correction can be viewed as a low-resource sequence-to-sequence task, because publicly available parallel corpora are limited. To tackle this challenge, we first generate erroneous versions of large unannotated corpora using a realistic noising function. The resulting parallel corpora are subsequently used to pre-train Transformer models. Then, by sequentially applying transfer learning, we adapt these models to the domain and style of the test set. Combined with a context-aware neural spellchecker, our system achieves competitive results in both restricted and low resource tracks in ACL 2019 BEA Shared Task. We release all of our code and materials for reproducibility.

Submitted 2 July, 2019; originally announced July 2019.

Comments: Accepted to ACL 2019 Workshop on Innovative Use of NLP for Building Educational Applications (BEA)

arXiv:1904.08144 [pdf, other]

Predicting drug-target interaction using 3D structure-embedded graph representations from graph neural networks

Authors: Jaechang Lim, Seongok Ryu, Kyubyong Park, Yo Joong Choe, Jiyeon Ham, Woo Youn Kim

Abstract: Accurate prediction of drug-target interaction (DTI) is essential for in silico drug design. For the purpose, we propose a novel approach for predicting DTI using a GNN that directly incorporates the 3D structure of a protein-ligand complex. We also apply a distance-aware graph attention algorithm with gate augmentation to increase the performance of our model. As a result, our model shows better performance than docking and other deep learning methods for both virtual screening and pose prediction. In addition, our model can reproduce the natural population distribution of active molecules and inactive molecules.

Submitted 17 April, 2019; originally announced April 2019.

Comments: 20 pages, 2 figures

arXiv:1904.07591 [pdf, ps, other]

Golden ratio algorithms with new stepsize rules for variational inequalities

Authors: Dang Van Hieu, Yeol Je Cho, Yi-bin Xiao

Abstract: In this paper, we introduce two golden ratio algorithms with new stepsize rules for solving pseudomonotone and Lipschitz variational inequalities in finite dimensional Hilbert spaces. The presented stepsize rules allow the resulting algorithms to work without the prior knowledge of the Lipschitz constant of operator. The first algorithm uses a sequence of stepsizes which is previously chosen, diminishing and non-summable. While the stepsizes in the second one are updated at each iteration and by a simple computation. A special point is that the sequence of stepsizes generated by the second algorithm is separated from zero. The convergence as well as the convergence rate of the proposed algorithms are established under some standard conditions. Also, we give several numerical results to show the behavior of the algorithms in comparisons with other algorithms.

Submitted 16 April, 2019; originally announced April 2019.

Comments: 19 pages, 4 figures (Accepted for publication on April 16, 2019)

MSC Class: 65J15; 47H05; 47J25; 47J20; 91B50

Journal ref: Mathematical Methods in the Applied Sciences (April, 2019)

arXiv:1902.07249 [pdf, other]

Discovery of Natural Language Concepts in Individual Units of CNNs

Authors: Seil Na, Yo Joong Choe, Dong-Hyun Lee, Gunhee Kim

Abstract: Although deep convolutional networks have achieved improved performance in many natural language tasks, they have been treated as black boxes because they are difficult to interpret. Especially, little is known about how they represent language in their intermediate layers. In an attempt to understand the representations of deep convolutional networks trained on language tasks, we show that individual units are selectively responsive to specific morphemes, words, and phrases, rather than responding to arbitrary and uninterpretable patterns. In order to quantitatively analyze such an intriguing phenomenon, we propose a concept alignment method based on how units respond to the replicated text. We conduct analyses with different architectures on multiple datasets for classification and translation tasks and provide new insights into how deep models understand natural language.

Submitted 28 February, 2019; v1 submitted 18 February, 2019; originally announced February 2019.

Comments: Published as a conference paper at ICLR 2019

arXiv:1901.05211 [pdf]

doi 10.1038/s41467-018-08227-1

Planar and van der Waals heterostructures for vertical tunnelling single electron transistors

Authors: Gwangwoo Kim, Sung-Soo Kim, Jonghyuk Jeon, Seong In Yoon, Seokmo Hong, Young Jin Cho, Abhishek Misra, Servet Ozdemir, Jun Yin, Davit Ghazaryan, Mathew Holwill, Artem Mishchenko, Daria V. Andreeva, Yong-Jin Kim, Hu Young Jeong, A-Rang Jang, Hyun-Jong Chung, Andre K. Geim, Kostya S. Novoselov, Byeong-Hyeok Sohn, Hyeon Suk Shin

Abstract: Despite a rich choice of two-dimensional materials, which exists these days, heterostructures, both vertical (van der Waals) and in-plane, offer an unprecedented control over the properties and functionalities of the resulted structures. Thus, planar heterostructures allow p-n junctions between different two-dimensional semiconductors and graphene nanoribbons with well-defined edges; and vertical heterostructures resulted in the observation of superconductivity in purely carbon-based systems and realisation of vertical tunnelling transistors. Here we demonstrate simultaneous use of in-plane and van der Waals heterostructures to build vertical single electron tunnelling transistors. We grow graphene quantum dots inside the matrix of hexagonal boron nitride, which allows a dramatic reduction of the number of localised states along the perimeter of the quantum dots. The use of hexagonal boron nitride tunnel barriers as contacts to the graphene quantum dots make our transistors reproducible and not dependent on the localised states, opening even larger flexibility when designing future devices.

Submitted 16 January, 2019; originally announced January 2019.

Journal ref: Nature Communications, Volume 10, Article number 230 (2019)

arXiv:1811.11954 [pdf, other]

Approximating Fixed points of Bregman Generalized $α$-nonexpansive mappings

Authors: K. Muangchoo, P. Kumam, Y. J. Cho, S. Dhompongsa

Abstract: In this paper, we introduce a new class of Bregman generalized $α$-nonexpansive mappings in terms of Bregman distances, and investigate the Ishikawa and Noor iterations for these mappings. We establish weak and strong convergence theorems of Ishikawa and Noor iterative schemes for Bregman generalized $α$-nonexpansive mappings in Banach spaces. Furthermore, we propose an example of our generated mapping and some numerical examples which support our main theorem. Our results are new and improve the recent ones in the literature.

Submitted 28 November, 2018; originally announced November 2018.

arXiv:1805.09252 [pdf, other]

V2X Downlink Coverage Analysis with a Realistic Urban Vehicular Model

Authors: Yae Jee Cho, Kaibin Huang, Chan-Byoung Chae

Abstract: As the realization of vehicular communication such as vehicle-to-vehicle (V2V) or vehicle-to-infrastructure (V2I) is imperative for the autonomous driving cars, the understanding of realistic vehicle-to-everything (V2X) models is needed. While previous research has mostly targeted vehicular models in which vehicles are randomly distributed and the variable of carrier frequency was not considered, a more realistic analysis of the V2X model is proposed in this paper. We use a one-dimensional (1D) Poisson cluster process (PCP) to model a realistic scenario of vehicle distribution in a perpendicular cross line road urban area and compare the coverage results with the previous research that distributed vehicles randomly by Poisson Point Process (PPP). Moreover, we incorporate the effect of different carrier frequencies, mmWave and sub-6 GHz, to our analysis by altering the antenna radiation pattern accordingly. Results indicated that while the effect of clustering led to lower outage, using mmWave had even more significance in leading to lower outage. Moreover, line-of-sight (LoS) interference links are shown to be more dominant in lowering the outage than the non-line-of-sight (NLoS) links even though they are less in number. The analytical results give insight into designing and analyzing the urban V2X channels, and are verified by actual urban area three-dimensional (3D) ray-tracing simulation.

Submitted 25 June, 2018; v1 submitted 10 May, 2018; originally announced May 2018.

arXiv:1804.08154 [pdf, other]

Local White Matter Architecture Defines Functional Brain Dynamics

Authors: Yo Joong Choe, Sivaraman Balakrishnan, Aarti Singh, Jean M. Vettel, Timothy Verstynen

Abstract: Large bundles of myelinated axons, called white matter, anatomically connect disparate brain regions together and compose the structural core of the human connectome. We recently proposed a method of measuring the local integrity along the length of each white matter fascicle, termed the local connectome. If communication efficiency is fundamentally constrained by the integrity along the entire length of a white matter bundle, then variability in the functional dynamics of brain networks should be associated with variability in the local connectome. We test this prediction using two statistical approaches that are capable of handling the high dimensionality of data. First, by performing statistical inference on distance-based correlations, we show that similarity in the local connectome between individuals is significantly correlated with similarity in their patterns of functional connectivity. Second, by employing variable selection using sparse canonical correlation analysis and cross-validation, we show that segments of the local connectome are predictive of certain patterns of functional brain dynamics. These results are consistent with the hypothesis that structural variability along axon bundles constrains communication between disparate brain regions.

Submitted 16 September, 2018; v1 submitted 22 April, 2018; originally announced April 2018.

Comments: Accepted to the 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC 2018)

arXiv:1801.07167 [pdf, ps, other]

RF Lens-Embedded Antenna Array for mmWave MIMO: Design and Performance

Authors: Yae Jee Cho, Gee-Yong Suk, Byoungnam Kim, Dong Ku Kim, Chan-Byoung Chae

Abstract: The requirement of high data-rate in the fifth generation wireless systems (5G) calls for the ultimate utilization of the wide bandwidth in the mmWave frequency band. Researchers seeking to compensate for mmWave's high path loss and to achieve both gain and directivity have proposed that mmWave multiple-input multiple-output (MIMO) systems make use of beamforming systems. Hybrid beamforming in mmWave demonstrates promising performance in achieving high gain and directivity by using phase shifters at the analog processing block. What remains a problem, however, is the actual implementation of mmWave beamforming systems; to fabricate such a system is costly and complex. With the aim of reducing such cost and complexity, this article presents actual prototypes of the lens antenna as an effective device to be used in the future 5G mmWave hybrid beamforming systems. Using a lens as a passive phase shifter enables beamforming without the heavy network of active phase shifters, while gain and directivity are achieved by the energy-focusing property of the lens. Proposed in this article are two types of lens antennas, one for static and the other for mobile usage. Their performance is evaluated using measurements and simulation data along with link-level analysis via a software defined radio (SDR) platform. Results show the promising potential of the lens antenna for its high gain and directivity, and its improved beam-switching feasibility compared to when a lens is not used. System-level evaluations reveal the significant throughput enhancement in both real indoor and outdoor environments. Moreover, the lens antenna's design issues are also discussed by evaluating different lens sizes.

Submitted 22 January, 2018; originally announced January 2018.

arXiv:1711.09052 [pdf, other]

Map-based Millimeter-Wave Channel Models: An Overview, Hybrid Modeling, Data, and Learning

Authors: Yeon-Geun Lim, Yae Jee Cho, MinSoo Sim, Younsun Kim, Chan-Byoung Chae, Reinaldo A. Valenzuela

Abstract: Compared to the current wireless communication systems, millimeter wave (mm-Wave) promises a wide range of spectrum. As viable alternatives to existing mm-Wave channel models, various map-based channel models with different modeling methods have been widely discussed. Map-based channel models are based on a ray-tracing algorithm and include realistic channel parameters in a given map. Such parameters enable researchers to accurately evaluate novel technologies in the mm-Wave range. Diverse map-based modeling methods result in different modeling objectives, including the characteristics of channel parameters and different complexities of the modeling procedure. This article outlines an overview of map-based mm-Wave channel models and proposes a concept of how they can be utilized to integrate a hardware testbed/sounder with a software testbed/sounder. In addition, we categorize map-based channel parameters and provide guidelines for hybrid modeling. Next, we share the measurement data and the map-based channel parameters with the public. Lastly, we evaluate a machine learning-based beam selection algorithm through the shared database. We expect that the offered guidelines and the shared database will enable researchers to readily design a map-based channel model.

Submitted 10 July, 2019; v1 submitted 24 November, 2017; originally announced November 2017.

arXiv:1707.00227 [pdf, other]

Relationship between Cross-Polarization Discrimination (XPD) and Spatial Correlation in Indoor Small-Cell MIMO Systems

Authors: Yeon-Geun Lim, Yae Jee Cho, TaeckKeun Oh, Yongshik Lee, Chan-Byoung Chae

Abstract: In this letter, we present a correlated channel model for a dual-polarization antenna to omnidirectional antennas in indoor small-cell multiple-input multiple-output (MIMO) systems. In an indoor environment, we confirm that the cross-polarization discrimination (XPD) in the direction of angle-of-departure can be represented as the spatial correlation of the MIMO channel. We also evaluate a dual-polarization antenna-based MIMO channel model and a spatially correlated channel model using a three-dimensional (3D) ray-tracing simulator. Furthermore, we provide the equivalent distance between adjacent antennas according to the XPD, providing insights into designing a dual-polarization antenna and its arrays.

Submitted 6 December, 2017; v1 submitted 1 July, 2017; originally announced July 2017.

arXiv:1706.01634 [pdf, ps, other]

Random fixed point theorems for Hardy-Rogers self-random operators with applications to random integral equations

Authors: Plern Saipara, Poom Kumam, Yeol Je Cho

Abstract: In this paper, we prove some random fixed point theorems for Hardy-Rogers self-random operators in separable Banach spaces and, as some applications, we show the existence of a solution for random nonlinear integral equations in Banach spaces. Some stochastic versions of deterministic fixed point theorems for Hardy-Rogers self mappings and stochastic integral equations are obtained.

Submitted 6 June, 2017; originally announced June 2017.

Comments: 14

MSC Class: 60H25; 47H09; 47H10; 41A50

Journal ref: Stochastics (2017) 1-14

arXiv:1703.06384 [pdf, other]

Effective Enzyme Deployment for Degradation of Interference Molecules in Molecular Communication

Authors: Yae Jee Cho, H. Birkan Yilmaz, Weisi Guo, Chan-Byoung Chae

Abstract: In molecular communication, the heavy tail nature of molecular signals causes inter-symbol interference (ISI). Because of this, it is difficult to decrease symbol periods and achieve high data rate. As a probable solution for ISI mitigation, enzymes were proposed to be used since they are capable of degrading ISI molecules without deteriorating the molecular communication. While most prior work has assumed an infinite amount of enzymes deployed around the channel, from a resource perspective, it is more efficient to deploy a limited amount of enzymes at particular locations and structures. This paper considers carrying out such deployment at two structures--around the receiver (Rx) and/or the transmitter (Tx) site. For both of the deployment scenarios, channels with different system environment parameters, Tx-to-Rx distance, size of enzyme area, and symbol period, are compared with each other for analyzing an optimized system environment for ISI mitigation when a limited amount of enzymes are available.

Submitted 18 March, 2017; originally announced March 2017.

arXiv:1611.06079 [pdf, other]

A Machine Learning Approach to Model the Received Signal in Molecular Communications

Authors: H. Birkan Yilmaz, Changmin Lee, Yae Jee Cho, Chan-Byoung Chae

Abstract: A molecular communication channel is determined by the received signal. Received signal models form the basis for studies focused on modulation, receiver design, capacity, and coding depend on the received signal models. Therefore, it is crucial to model the number of received molecules until time $t$ analytically. Modeling the diffusion-based molecular communication channel with the first-hitting process is an open issue for a spherical transmitter. In this paper, we utilize the artificial neural networks technique to model the received signal for a spherical transmitter and a perfectly absorbing receiver (i.e., first hitting process). The proposed technique may be utilized in other studies that assume a spherical transmitter instead of a point transmitter.

Submitted 18 November, 2016; originally announced November 2016.

arXiv:1604.05018 [pdf, other]

Effective inter-symbol interference mitigation with a limited amount of enzymes in molecular communications

Authors: Yae Jee Cho, H. Birkan Yilmaz, Weisi Guo, Chan-Byoung Chae

Abstract: In molecular communication via diffusion (MCvD), the inter-symbol interference (ISI) is a well known severe problem that deteriorates both data rates and link reliability. ISI mainly occurs due to the slow and highly random propagation of the messenger molecules, which causes the emitted molecules from the previous symbols to interfere with molecules from the current symbol. An effective way to mitigate the ISI is using enzymes to degrade undesired molecules. Prior work on ISI mitigation by enzymes has assumed an infinite amount of enzymes randomly distributed around the molecular channel. Taking a different approach, this paper assumes an MCvD channel with a limited amount of enzymes. The main question this paper addresses is how to deploy these enzymes in an effective structure so that ISI mitigation is maximized. To find an effective MCvD channel environment, this study considers optimization of the shape of the transmitter node, the deployment location and structure, the size of the enzyme deployed area, and the half-lives of the enzymes. It also analyzes the dependence of the optimum size of the enzyme area on the distance and half-life.

Submitted 18 April, 2016; originally announced April 2016.

arXiv:0809.0955 [pdf, ps, other]

doi 10.1103/PhysRevLett.101.237202

Carrier-mediated antiferromagnetic interlayer exchange coupling in diluted magnetic semiconductor multilayers Ga$_{1-x}$Mn$_x$As/GaAs:Be

Authors: J. -H. Chung, S. J. Chung, Sanghoon Lee, B. J. Kirby, J. A. Borchers, Y. J. Cho, X. Liu, J. K. Furdyna

Abstract: We use neutron reflectometry to investigate the interlayer exchange coupling between Ga$_{0.97}$Mn$_{0.03}$As ferromagnetic semiconductor layers separated by non-magnetic Be-doped GaAs spacers. Polarized neutron reflectivity measured below the Curie temperature of Ga$_{0.97}$Mn$_{0.03}$As reveals a characteristic splitting at the wave vector corresponding to twice the multilayer period, indicating that the coupling between the ferromagnetic layers are antiferromagnetic (AFM). When the applied field is increased to above the saturation field, this AFM coupling is suppressed. This behavior is not observed when the spacers are undoped, suggesting that the observed AFM coupling is mediated by charge carriers introduced via Be doping. The behavior of magnetization of the multilayers measured by DC magnetometry is consistent with the neutron reflectometry results.

Submitted 5 September, 2008; originally announced September 2008.

Comments: 4 pages, 4 figures

arXiv:0802.3871 [pdf, ps, other]

Electron-mediated ferromagnetism and small spin-orbit interaction in a molecular-beam-epitaxy grown n-type $GaAs/Al_{0.3}Ga_{0.7}As$ heterostructure with Mn $δ$-doping

Authors: A. Bove, F. Altomare, N. B. Kundtz, Albert M. Chang, Y. J. Cho, X. Liu, J. Furdyna

Abstract: We report the first evidence of electron-mediated ferromagnetism in a molecular-beam-epitaxy (MBE) grown $GaAs/Al_{0.3}Ga_{0.7}As$ heterostructure with Mn $δ$-doping. The interaction between the magnetic dopants (Mn) and the Two-Dimensional Electron Gas (2DEG) realizes magnetic ordering when the temperature is below the Curie temperature ($T_{C} \sim 1.7K$) and the 2DEG is brought in close proximity to the Mn layer by gating. The Anomalous Hall Effect (AHE) contribution to the total Hall resistance is shown to be about three to four orders of magnitude smaller than in the case of hole-mediated ferromagnetism indicating the presence of small spin-orbit interaction.

Submitted 2 July, 2008; v1 submitted 26 February, 2008; originally announced February 2008.

Comments: 4 pages with 4 figures, to be submitted to Physical Review Letters, revised version changed some content

arXiv:0802.3863 [pdf, ps, other]

A novel technique to make Ohmic contact to a buried two-dimensional electron gas in a molecular-beam-epitaxy grown $GaAs/Al_{0.3}Ga_{0.7}As$ heterostructure with Mn $δ$-doping

Authors: A. Bove, F. Altomare, N. B. Kundtz, Albert M. Chang, Y. J. Cho, X. Liu, J. Furdyna

Abstract: We report on the growth and characterization of a new Diluted Magnetic Semiconductor (DMS) heterostructure that presents a Two-Dimensional Electron Gas (2DEG) with a carrier density $n \sim 1.08 \times 10^{12} cm^{-2}$ and a mobility $μ\sim 600 cm^{2} / (Vs)$ at T $\sim$ 4.2K. As far as we know this is the highest mobility value reported in the literature for GaMnAs systems. A novel technique was developed to make Ohmic contact to the buried 2DEG without destroying the magnetic properties of our crystal.

Submitted 18 July, 2008; v1 submitted 26 February, 2008; originally announced February 2008.

Comments: 5 pages with 6 figures, to be submitted to Journal of Applied Physics, worked on figures

arXiv:0708.2289 [pdf]

doi 10.1103/PhysRevB.76.205316

Definitive Evidence of Interlayer Coupling Between (Ga,Mn)As Layers Separated by a Nonmagnetic Spacer

Authors: B. J. Kirby, J. A. Borchers, X. Liu, Y. J. Cho, M. Dobrowolska, J. K. Furdyna

Abstract: We have used polarized neutron reflectometry to study the structural and magnetic properties of the individual layers in a series of (Al,Be,Ga)As/(Ga,Mn)As/GaAs/(Ga,Mn)As multilayer samples. Structurally, we observe that the samples are virtually identical except for the GaAs spacer thickness (which varies from 3-12 nm), and confirm that the spacers contain little or no Mn. Magnetically, we observe that for the sample with the thickest spacer layer, modulation doping by the (Al,Be,Ga)As results in (Ga,Mn)As layers with very different temperature dependent magnetizations. However, as the spacer layer thickness is reduced, the temperature dependent magnetizations of the top and bottom (Ga,Mn)As layers become progressively more similar - a trend we find to be independent of the crystallographic direction along which spins are magnetized. These results definitively show that (Ga,Mn)As layers can couple across a non-magnetic spacer, and that such coupling depends on spacer thickness.

Submitted 10 September, 2007; v1 submitted 16 August, 2007; originally announced August 2007.

Comments: Submitted to Physical Review B

arXiv:math/0309117 [pdf, ps, other]

Refinements of Some Reverses of Schwarz's Inequality in 2-Inner Product Spaces and Applications for Integrals

Authors: P. Cerone, Y. J. Cho, S. S. Dragomir, S. S. Kim

Abstract: Refinements of some recent reverse inequalities for the celebrated Cauchy-Bunyakovsky-Schwarz inequality in 2-inner product spaces are given. Using this framework, applications for determinantal integral inequalities are also provided.

Submitted 5 September, 2003; originally announced September 2003.

MSC Class: 46C05; 26D15

arXiv:math/0309062 [pdf, ps, other]

Norm Estimates for the Difference Between Bochner's Integral and the Convex Combination of Function's Values

Authors: P. Cerone, Y. J. Cho, S. S. Dragomir, J. K. Kim, S. S. Kim

Abstract: Norm estimates are developed between the Bochner integral of a vector-valued function in Banach spaces having the Radon-Nikodym property and the convex combination of function values taken on a division of the interval [a,b].

Submitted 4 September, 2003; originally announced September 2003.

MSC Class: 26D15; 41A55

arXiv:math/0309032 [pdf, ps, other]

On Some Gronwall Type Inequalities Involving Iterated Integrals

Authors: Y. J. Cho, S. S. Dragomir, Y. -H. Kim

Abstract: In this paper, some new Gronwall type inequalities involving iterated integrals are given.

Submitted 2 September, 2003; originally announced September 2003.

MSC Class: 26D15; 35A05

arXiv:math/0309028 [pdf, ps, other]

Some Reverses of the Cauchy-Bunyakovsky-Schwarz Inequality in 2-Inner Product Spaces

Authors: S. S. Dragomir, Y. J. Cho, S. S. Kim

Abstract: In this paper, some reverses of the Cauchy-Bunyakovsky-Schwarz inequality in 2-inner product spaces are given. Using this framework, some applications for determinantal integral inequalities are also provided.

Submitted 1 September, 2003; originally announced September 2003.

MSC Class: 46C05; 46C99; 26D15; 26D10

arXiv:math/0308270 [pdf, ps, other]

Some Boas-Bellman Type Inequalities in 2-Inner Product Spaces

Authors: S. S. Dragomir, Y. J. Cho, S. S. Kim, A. Sofo

Abstract: Some inequalities in 2-inner product spaces that generalize Bessel's result and are similar to the Boas-Bellman inequality from inner product spaces are given. Applications for determinantal integral inequalities are also provided.

Submitted 27 August, 2003; originally announced August 2003.

MSC Class: 46C05; 46C99; 26D15; 26D10

Showing 1–41 of 41 results for author: Choe, Y J