Search | arXiv e-print repository

2BP: 2-Stage Backpropagation

Authors: Christopher Rae, Joseph K. L. Lee, James Richings

Abstract: As Deep Neural Networks (DNNs) grow in size and complexity, they often exceed the memory capacity of a single accelerator, necessitating the sharding of model parameters across multiple accelerators. Pipeline parallelism is a commonly used sharding strategy for training large DNNs. However, current implementations of pipeline parallelism are being unintentionally bottlenecked by the automatic differentiation tools provided by ML frameworks. This paper introduces 2-stage backpropagation (2BP). By splitting the backward propagation step into two separate stages, we can reduce idle compute time. We tested 2BP on various model architectures and pipelining schedules, achieving increases in throughput in all cases. Using 2BP, we were able to achieve a 1.70x increase in throughput compared to traditional methods when training a LLaMa-like transformer with 7 billion parameters across 4 GPUs.

Submitted 28 May, 2024; originally announced May 2024.
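
The split described in the 2BP abstract can be illustrated with a minimal sketch, assuming a single linear layer y = xW stored as plain Python lists. The function names `backward_p1` and `backward_p2` are illustrative only and are not taken from the paper's implementation. The first stage produces the gradient with respect to the layer input, which is what the preceding pipeline stage waits for; the second produces the weight gradient, which can be deferred to fill otherwise idle time.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def backward_p1(grad_out, w):
    """Stage 1 (hypothetical name): input gradient, dL/dx = dL/dy @ W^T."""
    return matmul(grad_out, transpose(w))


def backward_p2(x, grad_out):
    """Stage 2 (hypothetical name): weight gradient, dL/dW = x^T @ dL/dy."""
    return matmul(transpose(x), grad_out)


x = [[1.0, 2.0]]                          # activations saved from the forward pass (1x2)
w = [[3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]    # layer weights (2x3)
grad_out = [[1.0, 0.0, -1.0]]             # gradient arriving from the next stage (1x3)

grad_x = backward_p1(grad_out, w)   # can be sent upstream immediately
grad_w = backward_p2(x, grad_out)   # can be computed later
```

Because `backward_p2` needs only the saved activations and the incoming gradient, it does not have to run at the same moment as `backward_p1`; how the two stages are actually scheduled is specific to the paper and is not shown here.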

arXiv:2404.10536 [pdf, ps, other]

Benchmarking Machine Learning Applications on Heterogeneous Architecture using Reframe

Authors: Christopher Rae, Joseph K. L. Lee, James Richings, Michele Weiland

Abstract: With the rapid increase in machine learning workloads performed on HPC systems, it is beneficial to regularly perform machine learning specific benchmarks to monitor performance and identify issues. Furthermore, as part of the Edinburgh International Data Facility, EPCC currently hosts a wide range of machine learning accelerators including Nvidia GPUs, the Graphcore Bow Pod64 and Cerebras CS-2, which are managed via Kubernetes and Slurm. We extended the Reframe framework to support the Kubernetes scheduler backend, and utilise Reframe to perform machine learning benchmarks, and we discuss the preliminary results collected and challenges involved in integrating Reframe across multiple platforms and architectures.

Submitted 25 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

Comments: Author accepted version of paper in the PERMAVOST workshop at the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC 24)

arXiv:2403.12449 [pdf, other]

Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter

Authors: Seunghyeon Lim, Youngjae Yoo, Jun Ki Lee, Byoung-Tak Zhang

Abstract: In this paper, we propose a novel method for plane clustering specialized in cluttered scenes using an RGB-D camera and validate its effectiveness through robot grasping experiments. Unlike existing methods, which focus on large-scale indoor structures, our approach, Multi-Object RANSAC, emphasizes cluttered environments that contain a wide range of objects with different scales. It enhances plane segmentation by generating subplanes in a Deep Plane Clustering (DPC) module, which are then merged with the final planes by post-processing. DPC rearranges the point cloud by voting layers to make subplane clusters, trained in a self-supervised manner using pseudo-labels generated from RANSAC. Multi-Object RANSAC demonstrates superior plane instance segmentation performance over other recent RANSAC applications. We conducted an experiment on robot suction-based grasping, comparing our method with a vision-based grasping network and RANSAC applications. The results from this real-world scenario showed its remarkable performance surpassing the baseline methods, highlighting its potential for advanced scene understanding and manipulation.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 7 pages, 6 figures
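
The abstract above uses RANSAC to generate pseudo-labels. As background, here is a minimal sketch of classic single-plane RANSAC on a 3D point set. It is not the Multi-Object RANSAC method or the DPC module, and the function names, threshold, and iteration count are illustrative assumptions.

```python
import random


def plane_from_points(p, q, r):
    """Return (unit normal, offset d) of the plane through three points, or None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d


def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Return indices of the largest inlier set found for a single plane."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate sample, try again
        n, d = model
        # A point is an inlier if its distance to the candidate plane is small.
        inliers = [i for i, pt in enumerate(points)
                   if abs(sum(n[k] * pt[k] for k in range(3)) + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


# 25 points on the plane z = 0, followed by 3 off-plane outliers.
points = [(float(x), float(y), 0.0) for x in range(5) for y in range(5)]
points += [(0.5, 0.5, 2.0), (1.5, 3.0, -1.0), (4.0, 1.0, 3.5)]
inliers = ransac_plane(points)
```

In this toy scene the returned indices are the 25 grid points. Cluttered scenes with many small planes at different scales are the harder case that the paper targets.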

arXiv:2403.10774 [pdf, other]

Detecting Bias in Large Language Models: Fine-tuned KcBERT

Authors: J. K. Lee, T. M. Chung

Abstract: The rapid advancement of large language models (LLMs) has enabled natural language processing capabilities similar to those of humans, and LLMs are being widely utilized across various societal domains such as education and healthcare. While the versatility of these models has increased, they have the potential to generate subjective and normative language, leading to discriminatory treatment or outcomes among social groups, especially due to online offensive language. In this paper, we define such harm as societal bias and assess ethnic, gender, and racial biases in a model fine-tuned with Korean comments using Bidirectional Encoder Representations from Transformers (KcBERT) and KOLD data through template-based Masked Language Modeling (MLM). To quantitatively evaluate biases, we employ LPBS and CBS metrics. Compared to KcBERT, the fine-tuned model shows a reduction in ethnic bias but demonstrates significant changes in gender and racial biases. Based on these results, we propose two methods to mitigate societal bias. Firstly, a data balancing approach during the pre-training phase adjusts the uniformity of data by aligning the distribution of the occurrences of specific words and converting surrounding harmful words into non-harmful words. Secondly, during the in-training phase, we apply Debiasing Regularization by adjusting dropout and regularization, confirming a decrease in training loss. Our contribution lies in demonstrating that societal bias exists in Korean language models due to language-dependent characteristics.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 14 pages, 5 figures

arXiv:2403.10764 [pdf, other]

ECRC: Emotion-Causality Recognition in Korean Conversation for GCN

Authors: J. K. Lee, T. M. Chung

Abstract: In this multi-task learning study on simultaneous analysis of emotions and their underlying causes in conversational contexts, deep neural network methods were employed to effectively process and train large labeled datasets. However, these approaches are typically limited to conducting context analyses across the entire corpus because they rely on one of the two methods: word- or sentence-level embedding. The former struggles with polysemy and homonyms, whereas the latter causes information loss when processing long sentences. In this study, we overcome the limitations of previous embeddings by utilizing both word- and sentence-level embeddings. Furthermore, we propose the emotion-causality recognition in conversation (ECRC) model, which is based on a novel graph structure, thereby leveraging the strengths of both embedding methods. This model uniquely integrates the bidirectional long short-term memory (Bi-LSTM) and graph neural network (GCN) models for Korean conversation analysis. Compared with models that rely solely on one embedding method, the proposed model effectively structures abstract concepts, such as language features and relationships, thereby minimizing information loss. To assess model performance, we compared the multi-task learning results of three deep neural network models with varying graph structures. Additionally, we evaluated the proposed model using Korean and English datasets. The experimental results show that the proposed model performs better in emotion and causality multi-task learning (74.62% and 75.30%, respectively) when node and edge characteristics are incorporated into the graph structure. Similar results were recorded for the Korean ECC and Wellness datasets (74.62% and 73.44%, respectively) with 71.35% on the IEMOCAP English dataset.

Submitted 15 March, 2024; originally announced March 2024.

Comments: 10 pages, 5 figures

arXiv:2311.03210 [pdf, other]

Quantum Task Offloading with the OpenMP API

Authors: Joseph K. L. Lee, Oliver T. Brown, Mark Bull, Martin Ruefenacht, Johannes Doerfert, Michael Klemm, Martin Schulz

Abstract: Most of the widely used quantum programming languages and libraries are not designed for the tightly coupled nature of hybrid quantum-classical algorithms, which run on quantum resources that are integrated on-premise with classical HPC infrastructure. We propose a programming model using the API provided by OpenMP to target quantum devices, which provides an easy-to-use and efficient interface for HPC applications to utilize quantum compute resources. We have implemented a variational quantum eigensolver using the programming model, which has been tested using a classical simulator. We are in the process of testing on the quantum resources hosted at the Leibniz Supercomputing Centre (LRZ).

Submitted 6 November, 2023; originally announced November 2023.

Comments: Poster extended abstract for Supercomputing 2023 (SC23)

arXiv:2305.00512 [pdf, other]

Experiences of running an HPC RISC-V testbed

Authors: Nick Brown, Maurice Jamieson, Joseph K. L. Lee

Abstract: Funded by the UK ExCALIBUR H&ES exascale programme, in early 2022 a RISC-V testbed for HPC was stood up to provide free access for scientific software developers to experiment with RISC-V for their workloads. Here we report on successes, challenges, and lessons learnt from this activity with a view to better understanding the suitability of RISC-V for HPC and important areas to focus RISC-V HPC community efforts upon.

Submitted 30 April, 2023; originally announced May 2023.

Comments: Author accepted version of extended abstract in RISC-V Summit Europe

arXiv:2304.10324 [pdf, other]

Backporting RISC-V Vector assembly

Authors: Joseph K. L. Lee, Maurice Jamieson, Nick Brown

Abstract: Leveraging vectorisation, the ability for a CPU to apply operations to multiple elements of data concurrently, is critical for high performance workloads. However, at the time of writing, commercially available physical RISC-V hardware that provides the RISC-V vector extension (RVV) only supports version 0.7.1, which is incompatible with the latest ratified version 1.0. The challenge is that upstream compiler toolchains, such as Clang, only target the ratified v1.0 and do not support the older v0.7.1. Because v1.0 is not compatible with v0.7.1, the only way to program vectorised code is to use a vendor-provided, older compiler. In this paper we introduce the rvv-rollback tool which translates assembly code generated by the compiler using vector extension v1.0 instructions to v0.7.1. We utilise this tool to compare vectorisation performance of the vendor-provided GNU 8.4 compiler (supports v0.7.1) against LLVM 15.0 (supports only v1.0), where we found that the LLVM compiler is capable of auto-vectorising more computational kernels, and delivers greater performance than GNU in most, but not all, cases. We also tested LLVM vectorisation with vector length agnostic and specific settings, and observed cases with significant difference in performance.

Submitted 20 April, 2023; originally announced April 2023.

Comments: Preprint of paper accepted to First International Workshop on RISC-V for HPC (2023)

arXiv:2304.10319 [pdf, other]

Test-driving RISC-V Vector hardware for HPC

Authors: Joseph K. L. Lee, Maurice Jamieson, Nick Brown, Ricardo Jesus

Abstract: Whilst the RISC-V Vector extension (RVV) has been ratified, at the time of writing both hardware implementations and open source software support are still limited for vectorisation on RISC-V. This is important because vectorisation is crucial to obtaining good performance for High Performance Computing (HPC) workloads and, as of April 2023, the Allwinner D1 SoC, containing the XuanTie C906 processor, is the only mass-produced and commercially available hardware supporting RVV. This paper surveys the current state of RISC-V vectorisation as of 2023, reporting the landscape of both the hardware and software ecosystem. Driving our discussion from experiences in setting up the Allwinner D1 as part of the EPCC RISC-V testbed, we report the results of benchmarking the Allwinner D1 using the RAJA Performance Suite, which demonstrated reasonable vectorisation speedup using vendor-provided compiler, as well as favourable performance compared to the StarFive VisionFive V2 with SiFive's U74 processor.

Submitted 20 April, 2023; originally announced April 2023.

Comments: Preprint of paper accepted to First International Workshop on RISC-V for HPC (2023)

arXiv:2303.12128 [pdf, other]

Simulation Environment with Customized RISC-V Instructions for Logic-in-Memory Architectures

Authors: Jia-Hui Su, Chen-Hua Lu, Jenq Kuen Lee, Andrea Coluccio, Fabrizio Riente, Marco Vacca, Marco Ottavi, Kuan-Hsun Chen

Abstract: Nowadays, various memory-hungry applications like machine learning algorithms are running up against "the memory wall". To address this, emerging memories featuring computational capacity are foreseen as a promising solution that performs data processing inside the memory itself, so-called computation-in-memory, while eliminating the need for costly data movement. Recent research shows that utilizing the custom extension of the RISC-V instruction set architecture to support computation-in-memory operations is effective. To evaluate the applicability of such methods further, this work enhances the standard GNU binary utilities to generate RISC-V executables with Logic-in-Memory (LiM) operations and develops a new gem5 simulation environment, which simulates the entire system (CPU, peripherals, etc.) in a cycle-accurate manner together with a user-defined LiM module integrated into the system. This work provides a modular testbed for the research community to evaluate potential LiM solutions and co-designs between hardware and software.

Submitted 27 March, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

arXiv:2201.06680 [pdf, other]

Evaluation of the Architecture Alternatives for Real-time Intrusion Detection Systems for Connected Vehicles

Authors: Mubark B Jedh, Jian Kai Lee, Lotfi ben Othmane

Abstract: Attackers demonstrated the use of remote access to the in-vehicle network of connected vehicles to launch cyber-attacks and remotely take control of these vehicles. Machine-learning-based Intrusion Detection System (IDS) techniques have been proposed for the detection of such attacks. The evaluation of some of these IDSs demonstrated their efficacy in terms of accuracy in detecting message injections but was performed offline, which limits the confidence in their use for real-time protection scenarios. This paper evaluates four architecture designs for real-time IDS for connected vehicles using Controller Area Network (CAN) datasets collected from a moving vehicle under malicious speed reading message injections. The evaluation shows that a real-time IDS for a connected vehicle designed as two processes, one for CAN Bus monitoring and another for the anomaly detection engine, is reliable (no loss of messages) and could be used for real-time resilience mechanisms as a response to cyber-attacks.

Submitted 17 January, 2022; originally announced January 2022.

arXiv:2110.15403 [pdf, other]

Selective Regression Under Fairness Criteria

Authors: Abhin Shah, Yuheng Bu, Joshua Ka-Wing Lee, Subhro Das, Rameswar Panda, Prasanna Sattigeri, Gregory W. Wornell

Abstract: Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. We prove that if a feature representation satisfies the sufficiency criterion or is calibrated for mean and variance, then the proposed fairness criteria are met. Further, we introduce two approaches to mitigate the performance disparity across subgroups: (a) by regularizing an upper bound of conditional mutual information under a Gaussian assumption and (b) by regularizing a contrastive loss for conditional mean and conditional variance prediction. The effectiveness of these approaches is demonstrated on synthetic and real-world datasets.

Submitted 14 July, 2022; v1 submitted 28 October, 2021; originally announced October 2021.
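
The coverage trade-off in the abstract above can be sketched as follows, assuming the model's predicted variance is used as the confidence score (lower variance means higher confidence). The function name `selective_mse` and the toy numbers are illustrative; the paper's fairness criteria and regularizers are not implemented here.

```python
def selective_mse(y_true, y_pred, variance, coverage):
    """Mean squared error over the `coverage` fraction of samples with the lowest predicted variance."""
    n_keep = max(1, round(coverage * len(y_true)))
    # Rank samples from most to least confident and abstain on the rest.
    order = sorted(range(len(y_true)), key=lambda i: variance[i])
    kept = order[:n_keep]
    return sum((y_true[i] - y_pred[i]) ** 2 for i in kept) / n_keep


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.5, 3.0, 6.0]
variance = [0.1, 0.2, 0.3, 0.9]   # assumed per-sample uncertainty estimates

full = selective_mse(y_true, y_pred, variance, coverage=1.0)
half = selective_mse(y_true, y_pred, variance, coverage=0.5)
```

Here the error falls as coverage is reduced, which is the expected overall behaviour. Evaluating the same function separately on each sensitive subgroup is one way to check whether any subgroup's error rises instead, which is the disparity the abstract describes.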

arXiv:1908.08641 [pdf, other]

Stackelberg Punishment and Bully-Proofing Autonomous Vehicles

Authors: Matt Cooper, Jun Ki Lee, Jacob Beck, Joshua D. Fishman, Michael Gillett, Zoë Papakipos, Aaron Zhang, Jerome Ramos, Aansh Shah, Michael L. Littman

Abstract: Mutually beneficial behavior in repeated games can be enforced via the threat of punishment, as enshrined in game theory's well-known "folk theorem." There is a cost, however, to a player for generating these disincentives. In this work, we seek to minimize this cost by computing a "Stackelberg punishment," in which the player selects a behavior that sufficiently punishes the other player while maximizing its own score under the assumption that the other player will adopt a best response. This idea generalizes the concept of a Stackelberg equilibrium. Known efficient algorithms for computing a Stackelberg equilibrium can be adapted to efficiently produce a Stackelberg punishment. We demonstrate an application of this idea in an experiment involving a virtual autonomous vehicle and human participants. We find that a self-driving car with a Stackelberg punishment policy discourages human drivers from bullying in a driving scenario requiring social negotiation.

Submitted 22 August, 2019; originally announced August 2019.

Comments: 10 pages, The 11th International Conference on Social Robotics

arXiv:1902.04257 [pdf, other]

Deep Reinforcement Learning from Policy-Dependent Human Feedback

Authors: Dilip Arumugam, Jun Ki Lee, Sophie Saskin, Michael L. Littman

Abstract: To widen their accessibility and increase their utility, intelligent agents must be able to learn complex behaviors as specified by (non-expert) human users. Moreover, they will need to learn these behaviors within a reasonable amount of time while efficiently leveraging the sparse feedback a human trainer is capable of providing. Recent work has shown that human feedback can be characterized as a critique of an agent's current behavior rather than as an alternative reward signal to be maximized, culminating in the COnvergent Actor-Critic by Humans (COACH) algorithm for making direct policy updates based on human feedback. Our work builds on COACH, moving to a setting where the agent's policy is represented by a deep neural network. We employ a series of modifications on top of the original COACH algorithm that are critical for successfully learning behaviors from high-dimensional observations, while also satisfying the constraint of obtaining reduced sample complexity. We demonstrate the effectiveness of our Deep COACH algorithm in the rich 3D world of Minecraft with an agent that learns to complete tasks by mapping from raw pixels to actions using only real-time human feedback in 10-15 minutes of interaction.

Submitted 12 February, 2019; originally announced February 2019.

arXiv:1812.02868 [pdf, other]

Measuring and Characterizing Generalization in Deep Reinforcement Learning

Authors: Sam Witty, Jun Ki Lee, Emma Tosch, Akanksha Atrey, Michael Littman, David Jensen

Abstract: Deep reinforcement-learning methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL, and propose several definitions based on an agent's performance in on-policy, off-policy, and unreachable states. We propose a set of practical methods for evaluating agents with these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though those states are not selected adversarially. Taken together, these results call into question the extent to which deep Q-networks learn generalized representations, and suggest that more experimentation and analysis is necessary before claims of representation learning can be supported.

Submitted 11 December, 2018; v1 submitted 6 December, 2018; originally announced December 2018.

arXiv:1812.01129 [pdf, other]

Mitigating Planner Overfitting in Model-Based Reinforcement Learning

Authors: Dilip Arumugam, David Abel, Kavosh Asadi, Nakul Gopalan, Christopher Grimm, Jun Ki Lee, Lucas Lehnert, Michael L. Littman

Abstract: An agent with an inaccurate model of its environment faces a difficult choice: it can ignore the errors in its model and act in the real world in whatever way it determines is optimal with respect to its model. Alternatively, it can take a more conservative stance and eschew its model in favor of optimizing its behavior solely via real-world interaction. This latter approach can be exceedingly slow to learn from experience, while the former can lead to "planner overfitting" - aspects of the agent's behavior are optimized to exploit errors in its model. This paper explores an intermediate position in which the planner seeks to avoid overfitting through a kind of regularization of the plans it considers. We present three different approaches that demonstrably mitigate planner overfitting in reinforcement-learning environments.

Submitted 19 March, 2020; v1 submitted 3 December, 2018; originally announced December 2018.

arXiv:1806.06927 [pdf, other]

Auto-Meta: Automated Gradient Based Meta Learner Search

Authors: Jaehong Kim, Sangyeul Lee, Sungwan Kim, Moonsu Cha, Jung Kwon Lee, Youngduck Choi, Yongseok Choi, Dong-Yeon Cho, Jiwon Kim

Abstract: Fully automating machine learning pipelines is one of the key challenges of current artificial intelligence research, since practical machine learning often requires costly and time-consuming human-powered processes such as model design, algorithm development, and hyperparameter tuning. In this paper, we verify that automated architecture search synergizes with the effect of gradient-based meta learning. We adopt the progressive neural architecture search \cite{liu:pnas_google:DBLP:journals/corr/abs-1712-00559} to find optimal architectures for meta-learners. The gradient-based meta-learner whose architecture was automatically found achieved state-of-the-art results on the 5-shot 5-way Mini-ImageNet classification problem with 74.65% accuracy, which is an 11.54% improvement over the result obtained by the first gradient-based meta-learner called MAML \cite{finn:maml:DBLP:conf/icml/FinnAL17}. To the best of our knowledge, this work is the first successful neural architecture search implementation in the context of meta learning.

Submitted 10 December, 2018; v1 submitted 11 June, 2018; originally announced June 2018.

Comments: Presented at NIPS 2018 Workshop on Meta-Learning (MetaLearn 2018)

arXiv:1705.08690 [pdf, other]

Continual Learning with Deep Generative Replay

Authors: Hanul Shin, Jung Kwon Lee, Jaehong Kim, Jiwon Kim

Abstract: Attempts to train a comprehensive artificial intelligence capable of solving multiple tasks have been impeded by a chronic problem called catastrophic forgetting. Although simply replaying all previous data alleviates the problem, it requires large memory and, even worse, is often infeasible in real-world applications where the access to past data is limited. Inspired by the generative nature of the hippocampus as a short-term memory system in the primate brain, we propose the Deep Generative Replay, a novel framework with a cooperative dual model architecture consisting of a deep generative model ("generator") and a task solving model ("solver"). With only these two models, training data for previous tasks can easily be sampled and interleaved with those for a new task. We test our methods in several sequential learning settings involving image classification tasks.

Submitted 11 December, 2017; v1 submitted 24 May, 2017; originally announced May 2017.

Comments: NIPS 2017

arXiv:1703.05192 [pdf, other]

Learning to Discover Cross-Domain Relations with Generative Adversarial Networks

Authors: Taeksoo Kim, Moonsu Cha, Hyunsoo Kim, Jung Kwon Lee, Jiwon Kim

Abstract: While humans easily recognize relations between data from different domains without any supervision, learning to automatically discover them is in general very challenging and needs many ground-truth pairs that illustrate the relations. To avoid costly pairing, we address the task of discovering cross-domain relations given unpaired data. We propose a method based on generative adversarial networks that learns to discover relations between different domains (DiscoGAN). Using the discovered relations, our proposed network successfully transfers style from one domain to another while preserving key attributes such as orientation and face identity. Source code for the official implementation is publicly available at https://github.com/SKTBrain/DiscoGAN

Submitted 15 May, 2017; v1 submitted 15 March, 2017; originally announced March 2017.

Comments: Accepted to International Conference on Machine Learning (ICML) 2017

arXiv:1511.04587 [pdf, other]

Accurate Image Super-Resolution Using Very Deep Convolutional Networks

Authors: Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee

Abstract: We present a highly accurate single-image super-resolution (SR) method. Our method uses a very deep convolutional network inspired by VGG-net used for ImageNet classification \cite{simonyan2015very}. We find increasing our network depth shows a significant improvement in accuracy. Our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited in an efficient way. With very deep networks, however, convergence speed becomes a critical issue during training. We propose a simple yet effective training procedure. We learn residuals only and use extremely high learning rates ($10^4$ times higher than SRCNN \cite{dong2015image}) enabled by adjustable gradient clipping. Our proposed method performs better than existing methods in accuracy and visual improvements in our results are easily noticeable.

Submitted 11 November, 2016; v1 submitted 14 November, 2015; originally announced November 2015.

Comments: CVPR 2016 Oral

arXiv:1511.04491 [pdf, other]

Deeply-Recursive Convolutional Network for Image Super-Resolution

Authors: Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee

Abstract: We propose an image super-resolution method (SR) using a deeply-recursive convolutional network (DRCN). Our network has a very deep recursive layer (up to 16 recursions). Increasing recursion depth can improve performance without introducing new parameters for additional convolutions. Albeit advantages, learning a DRCN is very hard with a standard gradient descent method due to exploding/vanishing gradients. To ease the difficulty of training, we propose two extensions: recursive-supervision and skip-connection. Our method outperforms previous methods by a large margin.

Submitted 11 November, 2016; v1 submitted 13 November, 2015; originally announced November 2015.

Comments: CVPR 2016 Oral

arXiv:1308.6705 [pdf, other]

Digital breadcrumbs: Detecting urban mobility patterns and transport mode choices from cellphone networks

Authors: Thomas Holleczek, Liang Yu, Joseph K. Lee, Oliver Senn, Kristian Kloeckl, Carlo Ratti, Patrick Jaillet

Abstract: Many modern and growing cities are facing declines in public transport usage, with few efficient methods to explain why. In this article, we show that urban mobility patterns and transport mode choices can be derived from cellphone call detail records coupled with public transport data recorded from smart cards. Specifically, we present new data mining approaches to determine the spatial and temporal variability of public and private transportation usage and transport mode preferences across Singapore. Our results, which were validated by Singapore's quadrennial Household Interview Travel Survey (HITS), revealed that there are 3.5 million (HITS: 3.5 million) and 4.3 million (HITS: 4.4 million) inter-district passengers by public and private transport, respectively. Along with classifying which transportation connections are weak or underserved, the analysis shows that the mode share of public transport use increases from 38 percent in the morning to 44 percent around mid-day and 52 percent in the evening.

Submitted 30 August, 2013; originally announced August 2013.

arXiv:1108.1593 [pdf]

Multilayer Approach to Defend Phishing Attacks

Authors: Cynthia Dhinakaran, Dhinaharan Nagamalai, Jae Kwang Lee

Abstract: Spam messes up users' inboxes, consumes resources and spreads attacks such as DDoS, MiM and phishing. Phishing is a byproduct of email and causes financial loss to users and loss of reputation to financial institutions. In this paper we examine the characteristics of phishing and the technology used by phishers. In order to counter anti-phishing technology, phishers change their mode of operation; therefore a continuous evaluation of phishing only helps us combat phisher effectiveness. In our study, we collected seven hundred thousand spam messages from a corporate server for a period of 13 months from February 2008 to February 2009. From the collected data, we identified different kinds of phishing scams and their mode of operation. Our observation shows that phishers are dynamic and depend more on social engineering techniques rather than software vulnerabilities. We believe that this study will help develop more efficient anti-phishing methodologies. Based on our analysis, we developed an anti-phishing methodology and implemented it in our network. The results show that this approach is highly effective in preventing phishing attacks. The proposed approach reduced more than 80% of the false negatives and more than 95% of phishing attacks in our network.

Submitted 7 August, 2011; originally announced August 2011.

Comments: 8 Pages, Journal of Internet Technology (JIT) 2010

arXiv:1012.1665 [pdf]

An In-depth Analysis of Spam and Spammers

Authors: Dhinaharan Nagamalai, Beatrice Cynthia Dhinakaran, Jae Kwang Lee

Abstract: Electronic mail services have become an important source of communication for millions of people all over the world. Due to this tremendous growth, there has been a significant increase in spam traffic. Spam messes up users' inboxes, consumes network resources and spreads worms and viruses. In this paper we study the characteristics of spam and the technology used by spammers. In order to counter anti spam technology, spammers change their mode of operation, therefore continuous evaluation of the characteristics of spam and spammers' technology has become mandatory. These evaluations help us to enhance the existing anti spam technology and thereby help us to combat spam effectively. In order to characterize spam, we collected four hundred thousand spam mails from a corporate mail server for a period of 14 months from January 2006 to February 2007. For analysis we classified spam based on attachment and contents. We observed that spammers use software tools to send spam with attachment. The main features of this software are hiding sender's identity, randomly selecting text messages, identifying open relay machines, mass mailing capability and defining spamming duration. Spammers do not use spam software to send spam without attachment. From our study we observed that four-year-old heavy users' email accounts attract more spam than four-year-old light users' mail accounts. Relatively new email accounts which are 14 months old do not receive spam. But in some special cases like DDoS attacks, we found that new email accounts receive spam and 14-month-old heavy users' email accounts have attracted more spam than those of 14-month-old light users. We believe that this analysis could be useful to develop more efficient anti spam techniques.

Submitted 7 December, 2010; originally announced December 2010.

Comments: 14 pages, 8 figures, 5 tables, IJSA Vol 2, No 2, 2008

Journal ref: International Journal of Security and its Applications, Vol. 2, No. 2, April 2008

arXiv:1011.3279 [pdf]

doi 10.5121/ijnsa.2010.2420

Bayesian Based Comment Spam Defending Tool

Authors: Dhinaharan Nagamalai, Beatrice Cynthia Dhinakaran, Jae Kwang Lee

Abstract: Spam messes up users' inboxes, consumes network resources and spreads worms and viruses. Spam is flooding of unsolicited, unwanted e-mail. Spam in blogs is called blog spam or comment spam. It is done by posting comments or flooding spam to services such as blogs, forums, news, email archives and guestbooks. Blog spam generally appears on guestbooks or comment pages where spammers fill a comment box with spam words. In addition to wasting users' time with unwanted comments, spam also consumes a lot of bandwidth. In this paper, we propose a software tool to prevent such blog spam by using a Bayesian algorithm based technique. It is derived from Bayes' Theorem. It gives an output which is the probability that any comment is spam, given that it has certain words in it. Using our past entries and a comment entry, this value is obtained and compared with a threshold value to find if it exceeds the threshold value or not. By using this concept, we developed a software tool to block comment spam. The experimental results show that the Bayesian based tool is working well. This paper presents the major findings on the blog spam filter and their significance.

Submitted 14 November, 2010; originally announced November 2010.

Comments: 14 pages, 4 figures, International Journal of Network Security & Its Applications (IJNSA), Vol. 2, No. 4, October 2010

arXiv:1011.1050 [pdf]

doi 10.1109/ICCIT.2007.89

Characterizing Spam traffic and Spammers

Authors: Cynthia Dhinakaran, Dhinaharan Nagamalai, Jae Kwang Lee

Abstract: There is a tremendous increase in spam traffic these days. Spam messages muddle up users' inboxes, consume network resources, build up DDoS attacks, and spread worms and viruses. Our goal is to present a definite figure about the characteristics of spam and spammers. Since spammers change their mode of operation to counter anti spam technology, continuous evaluation of the characteristics of spam and spammers' technology has become mandatory. These evaluations help us to enhance the existing technology to combat spam effectively. We collected 400 thousand spam mails from a spam trap set up in a corporate mail server for a period of 14 months from January 2006 to February 2007. Spammers use common techniques to spam end users regardless of corporate server and public mail server. So we believe that our spam collection is a sample of world wide spam traffic. Studying the characteristics of this sample helps us to better understand the features of spam and spammers' technology. We believe that this analysis could be useful to develop more efficient anti spam techniques.

Submitted 3 November, 2010; originally announced November 2010.

Comments: 6 pages, 4 Figures, ICCIT 2007, IEEE CS

arXiv:1011.0792 [pdf]

doi 10.1109/FGCN.2007.61

An Empirical Study of Spam and Spam Vulnerable email Accounts

Authors: Cynthia Dhinakaran, Dhinaharan Nagamalai, Jae Kwang Lee

Abstract: Spam messages muddle up users' inboxes, consume network resources, build up DDoS attacks, and spread malware. Our goal is to present a definite figure about the characteristics of spam and spam vulnerable email accounts. These evaluations help us to enhance the existing technology to combat spam effectively. We collected 400 thousand spam mails from a spam trap set up in a corporate mail server for a period of 14 months from January 2006 to February 2007. Spammers use common techniques to spam end users regardless of corporate server and public mail server. So we believe that our spam collection is a sample of world wide spam traffic. Studying the characteristics of this sample helps us to better understand the features of spam and spam vulnerable e-mail accounts. We believe that this analysis is highly useful to develop more efficient anti spam techniques. In our analysis we classified spam based on attachment and contents. According to our study, four-year-old heavy users' email accounts attract more spam than four-year-old light users' mail accounts. The 14-month-old, relatively new email accounts don't receive spam. In some special cases like DDoS attacks, the new email accounts receive spam. During a DDoS attack, 14-month-old heavy users' email accounts attracted more spam than 14-month-old light users' mail accounts.

Submitted 2 November, 2010; originally announced November 2010.

Comments: 6 pages, 5 Figures, FGCN 2007, IEEE CS

arXiv:1010.2802 [pdf]

doi 10.1109/NetCoM.2009.86

"Reminder: please update your details": Phishing Trends

Authors: Cynthia Dhinakaran, Jae Kwang Lee, Dhinaharan Nagamalai

Abstract: Spam messes up users' inboxes, consumes resources and spreads attacks such as DDoS, MiM and phishing. Phishing is a byproduct of email and causes financial loss to users and loss of reputation to financial institutions. In this paper we study the characteristics of phishing and the technology used by phishers. In order to counter anti phishing technology, phishers change their mode of operation; therefore continuous evaluation of phishing helps us to combat phishers effectively. We have collected seven hundred thousand spam messages from a corporate server for a period of 13 months from February 2008 to February 2009. From the collected data, we identified different kinds of phishing scams and their mode of operation. Our observation shows that phishers are dynamic and depend more on social engineering techniques rather than software vulnerabilities. We believe that this study would be useful to develop more efficient anti phishing methodologies.

Submitted 13 October, 2010; originally announced October 2010.

Comments: 6 pages, 6 Figures, NETCOM 2009, IEEE CS

arXiv:1010.1583 [pdf]

doi 10.1109/MUE.2007.157

Multi Layer Approach to Defend DDoS Attacks Caused by Spam

Authors: Dhinaharan Nagamalai, Cynthia Dhinakaran, Jae Kwang Lee

Abstract: Corporate mail services are designed to perform better than public mail services. Fast mail delivery, large file transfer as attachments, high level spam and virus protection, and a commercial advertisement free environment are some of the advantages worth mentioning. But these mail services are frequent targets of hackers and spammers. Distributed Denial of Service attacks are becoming more common and sophisticated. Researchers have proposed various solutions to DDoS attacks. Can we stop these kinds of attacks with available technology? These days DDoS attacks through spam have increased and disturbed the mail services of various organizations. Spam penetrates through all the filters to establish DDoS attacks, which causes serious problems to users and the data. In this paper we propose a multilayer approach to defend against DDoS attacks caused by spam mails. This approach is a combination of fine tuning of source filters, content filters, strictly implementing mail policies, educating users, network monitoring and logical solutions to the ongoing attack. We have conducted several experiments in corporate mail services; the results show that this approach is highly effective in preventing DDoS attacks caused by spam. The defense mechanism reduced the incoming spam traffic by 60% and repelled many DDoS attacks caused by spam.

Submitted 7 October, 2010; originally announced October 2010.

Comments: 6 pages, 5 figures, MUE 2007, IEEE CS

Journal ref: MUE 2007, IEEE CS

arXiv:0812.0904 [pdf, ps, other]

An Approximation of the Outage Probability for Multi-hop AF Fixed Gain Relay

Authors: Jun Kyoung Lee, Janghoon Yang, Dong Ku Kim

Abstract: In this letter, we present a closed-form approximation of the outage probability for the multi-hop amplify-and-forward (AF) relaying systems with fixed gain in Rayleigh fading channel. The approximation is derived from the outage event for each hop. The simulation results show the tightness of the proposed approximation in low and high signal-to-noise ratio (SNR) region.

Submitted 4 December, 2008; originally announced December 2008.

Comments: 3 pages, 3 figures, Submitted to IEEE Communication Letters

Showing 1–30 of 30 results for author: Lee, J K