-
2BP: 2-Stage Backpropagation
Authors:
Christopher Rae,
Joseph K. L. Lee,
James Richings
Abstract:
As Deep Neural Networks (DNNs) grow in size and complexity, they often exceed the memory capacity of a single accelerator, necessitating the sharding of model parameters across multiple accelerators. Pipeline parallelism is a commonly used sharding strategy for training large DNNs. However, current implementations of pipeline parallelism are being unintentionally bottlenecked by the automatic differentiation tools provided by ML frameworks. This paper introduces 2-stage backpropagation (2BP). By splitting the backward propagation step into two separate stages, we can reduce idle compute time. We tested 2BP on various model architectures and pipelining schedules, achieving increases in throughput in all cases. Using 2BP, we were able to achieve a 1.70x increase in throughput compared to traditional methods when training a LLaMa-like transformer with 7 billion parameters across 4 GPUs.
Submitted 28 May, 2024;
originally announced May 2024.
-
Benchmarking Machine Learning Applications on Heterogeneous Architecture using Reframe
Authors:
Christopher Rae,
Joseph K. L. Lee,
James Richings,
Michele Weiland
Abstract:
With the rapid increase in machine learning workloads performed on HPC systems, it is beneficial to regularly perform machine learning specific benchmarks to monitor performance and identify issues. Furthermore, as part of the Edinburgh International Data Facility, EPCC currently hosts a wide range of machine learning accelerators including Nvidia GPUs, the Graphcore Bow Pod64 and Cerebras CS-2, which are managed via Kubernetes and Slurm. We extended the Reframe framework to support the Kubernetes scheduler backend, and utilise Reframe to perform machine learning benchmarks, and we discuss the preliminary results collected and challenges involved in integrating Reframe across multiple platforms and architectures.
Submitted 25 April, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter
Authors:
Seunghyeon Lim,
Youngjae Yoo,
Jun Ki Lee,
Byoung-Tak Zhang
Abstract:
In this paper, we propose a novel method for plane clustering specialized in cluttered scenes using an RGB-D camera and validate its effectiveness through robot grasping experiments. Unlike existing methods, which focus on large-scale indoor structures, our approach, Multi-Object RANSAC, emphasizes cluttered environments that contain a wide range of objects with different scales. It enhances plane segmentation by generating subplanes in a Deep Plane Clustering (DPC) module, which are then merged with the final planes by post-processing. DPC rearranges the point cloud by voting layers to make subplane clusters, trained in a self-supervised manner using pseudo-labels generated from RANSAC. Multi-Object RANSAC demonstrates superior plane instance segmentation performance over other recent RANSAC applications. We conducted an experiment on robot suction-based grasping, comparing our method with a vision-based grasping network and RANSAC applications. The results from this real-world scenario showed its remarkable performance surpassing the baseline methods, highlighting its potential for advanced scene understanding and manipulation.
Submitted 19 March, 2024;
originally announced March 2024.
-
Detecting Bias in Large Language Models: Fine-tuned KcBERT
Authors:
J. K. Lee,
T. M. Chung
Abstract:
The rapid advancement of large language models (LLMs) has enabled natural language processing capabilities similar to those of humans, and LLMs are being widely utilized across various societal domains such as education and healthcare. While the versatility of these models has increased, they have the potential to generate subjective and normative language, leading to discriminatory treatment or outcomes among social groups, especially due to online offensive language. In this paper, we define such harm as societal bias and assess ethnic, gender, and racial biases in a model fine-tuned with Korean comments using Bidirectional Encoder Representations from Transformers (KcBERT) and KOLD data through template-based Masked Language Modeling (MLM). To quantitatively evaluate biases, we employ LPBS and CBS metrics. Compared to KcBERT, the fine-tuned model shows a reduction in ethnic bias but demonstrates significant changes in gender and racial biases. Based on these results, we propose two methods to mitigate societal bias. Firstly, a data balancing approach during the pre-training phase adjusts the uniformity of data by aligning the distribution of the occurrences of specific words and converting surrounding harmful words into non-harmful words. Secondly, during the in-training phase, we apply Debiasing Regularization by adjusting dropout and regularization, confirming a decrease in training loss. Our contribution lies in demonstrating that societal bias exists in Korean language models due to language-dependent characteristics.
Submitted 15 March, 2024;
originally announced March 2024.
-
ECRC: Emotion-Causality Recognition in Korean Conversation for GCN
Authors:
J. K. Lee,
T. M. Chung
Abstract:
In this multi-task learning study on simultaneous analysis of emotions and their underlying causes in conversational contexts, deep neural network methods were employed to effectively process and train large labeled datasets. However, these approaches are typically limited to conducting context analyses across the entire corpus because they rely on one of the two methods: word- or sentence-level embedding. The former struggles with polysemy and homonyms, whereas the latter causes information loss when processing long sentences. In this study, we overcome the limitations of previous embeddings by utilizing both word- and sentence-level embeddings. Furthermore, we propose the emotion-causality recognition in conversation (ECRC) model, which is based on a novel graph structure, thereby leveraging the strengths of both embedding methods. This model uniquely integrates the bidirectional long short-term memory (Bi-LSTM) and graph neural network (GCN) models for Korean conversation analysis. Compared with models that rely solely on one embedding method, the proposed model effectively structures abstract concepts, such as language features and relationships, thereby minimizing information loss. To assess model performance, we compared the multi-task learning results of three deep neural network models with varying graph structures. Additionally, we evaluated the proposed model using Korean and English datasets. The experimental results show that the proposed model performs better in emotion and causality multi-task learning (74.62% and 75.30%, respectively) when node and edge characteristics are incorporated into the graph structure. Similar results were recorded for the Korean ECC and Wellness datasets (74.62% and 73.44%, respectively) with 71.35% on the IEMOCAP English dataset.
Submitted 15 March, 2024;
originally announced March 2024.
-
Quantum Task Offloading with the OpenMP API
Authors:
Joseph K. L. Lee,
Oliver T. Brown,
Mark Bull,
Martin Ruefenacht,
Johannes Doerfert,
Michael Klemm,
Martin Schulz
Abstract:
Most of the widely used quantum programming languages and libraries are not designed for the tightly coupled nature of hybrid quantum-classical algorithms, which run on quantum resources that are integrated on-premise with classical HPC infrastructure. We propose a programming model using the API provided by OpenMP to target quantum devices, which provides an easy-to-use and efficient interface for HPC applications to utilize quantum compute resources. We have implemented a variational quantum eigensolver using the programming model, which has been tested using a classical simulator. We are in the process of testing on the quantum resources hosted at the Leibniz Supercomputing Centre (LRZ).
Submitted 6 November, 2023;
originally announced November 2023.
-
Experiences of running an HPC RISC-V testbed
Authors:
Nick Brown,
Maurice Jamieson,
Joseph K. L. Lee
Abstract:
Funded by the UK ExCALIBUR H&ES exascale programme, in early 2022 a RISC-V testbed for HPC was stood up to provide free access for scientific software developers to experiment with RISC-V for their workloads. Here we report on successes, challenges, and lessons learnt from this activity with a view to better understanding the suitability of RISC-V for HPC and important areas to focus RISC-V HPC community efforts upon.
Submitted 30 April, 2023;
originally announced May 2023.
-
Backporting RISC-V Vector assembly
Authors:
Joseph K. L. Lee,
Maurice Jamieson,
Nick Brown
Abstract:
Leveraging vectorisation, the ability for a CPU to apply operations to multiple elements of data concurrently, is critical for high performance workloads. However, at the time of writing, commercially available physical RISC-V hardware that provides the RISC-V vector extension (RVV) only supports version 0.7.1, which is incompatible with the latest ratified version 1.0. The challenge is that upstream compiler toolchains, such as Clang, only target the ratified v1.0 and do not support the older v0.7.1. Because v1.0 is not compatible with v0.7.1, the only way to program vectorised code is to use a vendor-provided, older compiler. In this paper we introduce the rvv-rollback tool which translates assembly code generated by the compiler using vector extension v1.0 instructions to v0.7.1. We utilise this tool to compare vectorisation performance of the vendor-provided GNU 8.4 compiler (supports v0.7.1) against LLVM 15.0 (supports only v1.0), where we found that the LLVM compiler is capable of auto-vectorising more computational kernels, and delivers greater performance than GNU in most, but not all, cases. We also tested LLVM vectorisation with vector length agnostic and specific settings, and observed cases with significant difference in performance.
Submitted 20 April, 2023;
originally announced April 2023.
-
Test-driving RISC-V Vector hardware for HPC
Authors:
Joseph K. L. Lee,
Maurice Jamieson,
Nick Brown,
Ricardo Jesus
Abstract:
Whilst the RISC-V Vector extension (RVV) has been ratified, at the time of writing both hardware implementations and open source software support are still limited for vectorisation on RISC-V. This is important because vectorisation is crucial to obtaining good performance for High Performance Computing (HPC) workloads and, as of April 2023, the Allwinner D1 SoC, containing the XuanTie C906 processor, is the only mass-produced and commercially available hardware supporting RVV. This paper surveys the current state of RISC-V vectorisation as of 2023, reporting the landscape of both the hardware and software ecosystem. Driving our discussion from experiences in setting up the Allwinner D1 as part of the EPCC RISC-V testbed, we report the results of benchmarking the Allwinner D1 using the RAJA Performance Suite, which demonstrated reasonable vectorisation speedup using vendor-provided compiler, as well as favourable performance compared to the StarFive VisionFive V2 with SiFive's U74 processor.
Submitted 20 April, 2023;
originally announced April 2023.
-
Simulation Environment with Customized RISC-V Instructions for Logic-in-Memory Architectures
Authors:
Jia-Hui Su,
Chen-Hua Lu,
Jenq Kuen Lee,
Andrea Coluccio,
Fabrizio Riente,
Marco Vacca,
Marco Ottavi,
Kuan-Hsun Chen
Abstract:
Nowadays, various memory-hungry applications like machine learning algorithms are knocking "the memory wall". Toward this, emerging memories featuring computational capacity are foreseen as a promising solution that performs data process inside the memory itself, so-called computation-in-memory, while eliminating the need for costly data movement. Recent research shows that utilizing the custom extension of RISC-V instruction set architecture to support computation-in-memory operations is effective. To evaluate the applicability of such methods further, this work enhances the standard GNU binary utilities to generate RISC-V executables with Logic-in-Memory (LiM) operations and develop a new gem5 simulation environment, which simulates the entire system (CPU, peripherals, etc.) in a cycle-accurate manner together with a user-defined LiM module integrated into the system. This work provides a modular testbed for the research community to evaluate potential LiM solutions and co-designs between hardware and software.
Submitted 27 March, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
-
Evaluation of the Architecture Alternatives for Real-time Intrusion Detection Systems for Connected Vehicles
Authors:
Mubark B Jedh,
Jian Kai Lee,
Lotfi ben Othmane
Abstract:
Attackers demonstrated the use of remote access to the in-vehicle network of connected vehicles to launch cyber-attacks and remotely take control of these vehicles. Machine-learning-based Intrusion Detection Systems (IDSs) techniques have been proposed for the detection of such attacks. The evaluation of some of these IDS demonstrated their efficacy in terms of accuracy in detecting message injections but was performed offline, which limits the confidence in their use for real-time protection scenarios. This paper evaluates four architecture designs for real-time IDS for connected vehicles using Controller Area Network (CAN) datasets collected from a moving vehicle under malicious speed reading message injections. The evaluation shows that a real-time IDS for a connected vehicle designed as two processes, a process for CAN Bus monitoring and another one for anomaly detection engine is reliable (no loss of messages) and could be used for real-time resilience mechanisms as a response to cyber-attacks.
Submitted 17 January, 2022;
originally announced January 2022.
-
Selective Regression Under Fairness Criteria
Authors:
Abhin Shah,
Yuheng Bu,
Joshua Ka-Wing Lee,
Subhro Das,
Rameswar Panda,
Prasanna Sattigeri,
Gregory W. Wornell
Abstract:
Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. We prove that if a feature representation satisfies the sufficiency criterion or is calibrated for mean and variance, then the proposed fairness criteria are met. Further, we introduce two approaches to mitigate the performance disparity across subgroups: (a) by regularizing an upper bound of conditional mutual information under a Gaussian assumption and (b) by regularizing a contrastive loss for conditional mean and conditional variance prediction. The effectiveness of these approaches is demonstrated on synthetic and real-world datasets.
Submitted 14 July, 2022; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Stackelberg Punishment and Bully-Proofing Autonomous Vehicles
Authors:
Matt Cooper,
Jun Ki Lee,
Jacob Beck,
Joshua D. Fishman,
Michael Gillett,
Zoë Papakipos,
Aaron Zhang,
Jerome Ramos,
Aansh Shah,
Michael L. Littman
Abstract:
Mutually beneficial behavior in repeated games can be enforced via the threat of punishment, as enshrined in game theory's well-known "folk theorem." There is a cost, however, to a player for generating these disincentives. In this work, we seek to minimize this cost by computing a "Stackelberg punishment," in which the player selects a behavior that sufficiently punishes the other player while maximizing its own score under the assumption that the other player will adopt a best response. This idea generalizes the concept of a Stackelberg equilibrium. Known efficient algorithms for computing a Stackelberg equilibrium can be adapted to efficiently produce a Stackelberg punishment. We demonstrate an application of this idea in an experiment involving a virtual autonomous vehicle and human participants. We find that a self-driving car with a Stackelberg punishment policy discourages human drivers from bullying in a driving scenario requiring social negotiation.
Submitted 22 August, 2019;
originally announced August 2019.
-
Deep Reinforcement Learning from Policy-Dependent Human Feedback
Authors:
Dilip Arumugam,
Jun Ki Lee,
Sophie Saskin,
Michael L. Littman
Abstract:
To widen their accessibility and increase their utility, intelligent agents must be able to learn complex behaviors as specified by (non-expert) human users. Moreover, they will need to learn these behaviors within a reasonable amount of time while efficiently leveraging the sparse feedback a human trainer is capable of providing. Recent work has shown that human feedback can be characterized as a critique of an agent's current behavior rather than as an alternative reward signal to be maximized, culminating in the COnvergent Actor-Critic by Humans (COACH) algorithm for making direct policy updates based on human feedback. Our work builds on COACH, moving to a setting where the agent's policy is represented by a deep neural network. We employ a series of modifications on top of the original COACH algorithm that are critical for successfully learning behaviors from high-dimensional observations, while also satisfying the constraint of obtaining reduced sample complexity. We demonstrate the effectiveness of our Deep COACH algorithm in the rich 3D world of Minecraft with an agent that learns to complete tasks by mapping from raw pixels to actions using only real-time human feedback in 10-15 minutes of interaction.
Submitted 12 February, 2019;
originally announced February 2019.
-
Measuring and Characterizing Generalization in Deep Reinforcement Learning
Authors:
Sam Witty,
Jun Ki Lee,
Emma Tosch,
Akanksha Atrey,
Michael Littman,
David Jensen
Abstract:
Deep reinforcement-learning methods have achieved remarkable performance on challenging control tasks. Observations of the resulting behavior give the impression that the agent has constructed a generalized representation that supports insightful action decisions. We re-examine what is meant by generalization in RL, and propose several definitions based on an agent's performance in on-policy, off-policy, and unreachable states. We propose a set of practical methods for evaluating agents with these definitions of generalization. We demonstrate these techniques on a common benchmark task for deep RL, and we show that the learned networks make poor decisions for states that differ only slightly from on-policy states, even though those states are not selected adversarially. Taken together, these results call into question the extent to which deep Q-networks learn generalized representations, and suggest that more experimentation and analysis is necessary before claims of representation learning can be supported.
Submitted 11 December, 2018; v1 submitted 6 December, 2018;
originally announced December 2018.
-
Mitigating Planner Overfitting in Model-Based Reinforcement Learning
Authors:
Dilip Arumugam,
David Abel,
Kavosh Asadi,
Nakul Gopalan,
Christopher Grimm,
Jun Ki Lee,
Lucas Lehnert,
Michael L. Littman
Abstract:
An agent with an inaccurate model of its environment faces a difficult choice: it can ignore the errors in its model and act in the real world in whatever way it determines is optimal with respect to its model. Alternatively, it can take a more conservative stance and eschew its model in favor of optimizing its behavior solely via real-world interaction. This latter approach can be exceedingly slow to learn from experience, while the former can lead to "planner overfitting" - aspects of the agent's behavior are optimized to exploit errors in its model. This paper explores an intermediate position in which the planner seeks to avoid overfitting through a kind of regularization of the plans it considers. We present three different approaches that demonstrably mitigate planner overfitting in reinforcement-learning environments.
Submitted 19 March, 2020; v1 submitted 3 December, 2018;
originally announced December 2018.
-
Auto-Meta: Automated Gradient Based Meta Learner Search
Authors:
Jaehong Kim,
Sangyeul Lee,
Sungwan Kim,
Moonsu Cha,
Jung Kwon Lee,
Youngduck Choi,
Yongseok Choi,
Dong-Yeon Cho,
Jiwon Kim
Abstract:
Fully automating machine learning pipelines is one of the key challenges of current artificial intelligence research, since practical machine learning often requires costly and time-consuming human-powered processes such as model design, algorithm development, and hyperparameter tuning. In this paper, we verify that automated architecture search synergizes with the effect of gradient-based meta learning. We adopt the progressive neural architecture search \cite{liu:pnas_google:DBLP:journals/corr/abs-1712-00559} to find optimal architectures for meta-learners. The gradient based meta-learner whose architecture was automatically found achieved state-of-the-art results on the 5-shot 5-way Mini-ImageNet classification problem with $74.65\%$ accuracy, which is $11.54\%$ improvement over the result obtained by the first gradient-based meta-learner called MAML \cite{finn:maml:DBLP:conf/icml/FinnAL17}. To our best knowledge, this work is the first successful neural architecture search implementation in the context of meta learning.
Submitted 10 December, 2018; v1 submitted 11 June, 2018;
originally announced June 2018.
-
Continual Learning with Deep Generative Replay
Authors:
Hanul Shin,
Jung Kwon Lee,
Jaehong Kim,
Jiwon Kim
Abstract:
Attempts to train a comprehensive artificial intelligence capable of solving multiple tasks have been impeded by a chronic problem called catastrophic forgetting. Although simply replaying all previous data alleviates the problem, it requires large memory and, even worse, is often infeasible in real-world applications where access to past data is limited. Inspired by the generative nature of the hippocampus as a short-term memory system in the primate brain, we propose Deep Generative Replay, a novel framework with a cooperative dual model architecture consisting of a deep generative model ("generator") and a task solving model ("solver"). With only these two models, training data for previous tasks can easily be sampled and interleaved with those for a new task. We test our methods in several sequential learning settings involving image classification tasks.
Submitted 11 December, 2017; v1 submitted 24 May, 2017;
originally announced May 2017.
-
Learning to Discover Cross-Domain Relations with Generative Adversarial Networks
Authors:
Taeksoo Kim,
Moonsu Cha,
Hyunsoo Kim,
Jung Kwon Lee,
Jiwon Kim
Abstract:
While humans easily recognize relations between data from different domains without any supervision, learning to automatically discover them is in general very challenging and needs many ground-truth pairs that illustrate the relations. To avoid costly pairing, we address the task of discovering cross-domain relations given unpaired data. We propose a method based on generative adversarial networks that learns to discover relations between different domains (DiscoGAN). Using the discovered relations, our proposed network successfully transfers style from one domain to another while preserving key attributes such as orientation and face identity. Source code for the official implementation is publicly available at https://github.com/SKTBrain/DiscoGAN
Submitted 15 May, 2017; v1 submitted 15 March, 2017;
originally announced March 2017.
-
Accurate Image Super-Resolution Using Very Deep Convolutional Networks
Authors:
Jiwon Kim,
Jung Kwon Lee,
Kyoung Mu Lee
Abstract:
We present a highly accurate single-image super-resolution (SR) method. Our method uses a very deep convolutional network inspired by VGG-net used for ImageNet classification \cite{simonyan2015very}. We find increasing our network depth shows a significant improvement in accuracy. Our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited in an efficient way. With very deep networks, however, convergence speed becomes a critical issue during training. We propose a simple yet effective training procedure. We learn residuals only and use extremely high learning rates ($10^4$ times higher than SRCNN \cite{dong2015image}) enabled by adjustable gradient clipping. Our proposed method performs better than existing methods in accuracy and visual improvements in our results are easily noticeable.
Submitted 11 November, 2016; v1 submitted 14 November, 2015;
originally announced November 2015.
-
Deeply-Recursive Convolutional Network for Image Super-Resolution
Authors:
Jiwon Kim,
Jung Kwon Lee,
Kyoung Mu Lee
Abstract:
We propose an image super-resolution (SR) method using a deeply-recursive convolutional network (DRCN). Our network has a very deep recursive layer (up to 16 recursions). Increasing recursion depth can improve performance without introducing new parameters for additional convolutions. Despite these advantages, learning a DRCN is very hard with a standard gradient descent method due to exploding/vanishing gradients. To ease the difficulty of training, we propose two extensions: recursive-supervision and skip-connection. Our method outperforms previous methods by a large margin.
Submitted 11 November, 2016; v1 submitted 13 November, 2015;
originally announced November 2015.
-
Digital breadcrumbs: Detecting urban mobility patterns and transport mode choices from cellphone networks
Authors:
Thomas Holleczek,
Liang Yu,
Joseph K. Lee,
Oliver Senn,
Kristian Kloeckl,
Carlo Ratti,
Patrick Jaillet
Abstract:
Many modern and growing cities are facing declines in public transport usage, with few efficient methods to explain why. In this article, we show that urban mobility patterns and transport mode choices can be derived from cellphone call detail records coupled with public transport data recorded from smart cards. Specifically, we present new data mining approaches to determine the spatial and temporal variability of public and private transportation usage and transport mode preferences across Singapore. Our results, which were validated by Singapore's quadrennial Household Interview Travel Survey (HITS), revealed that there are 3.5 million (HITS: 3.5 million) and 4.3 million (HITS: 4.4 million) inter-district passengers by public and private transport, respectively. Along with classifying which transportation connections are weak or underserved, the analysis shows that the mode share of public transport use increases from 38 percent in the morning to 44 percent around mid-day and 52 percent in the evening.
Submitted 30 August, 2013;
originally announced August 2013.
-
Multilayer Approach to Defend Phishing Attacks
Authors:
Cynthia Dhinakaran,
Dhinaharan Nagamalai,
Jae Kwang Lee
Abstract:
Spam clutters users' inboxes, consumes resources, and spreads attacks such as DDoS, MiM, and phishing. Phishing is a byproduct of email and causes financial loss to users and loss of reputation to financial institutions. In this paper we examine the characteristics of phishing and the technology used by phishers. In order to counter anti-phishing technology, phishers change their mode of operation; therefore, only continuous evaluation of phishing helps us combat phishers effectively. In our study, we collected seven hundred thousand spam messages from a corporate server over a period of 13 months, from February 2008 to February 2009. From the collected data, we identified different kinds of phishing scams and their modes of operation. Our observations show that phishers are dynamic and depend more on social engineering techniques than on software vulnerabilities. We believe that this study will help develop more efficient anti-phishing methodologies. Based on our analysis, we developed an anti-phishing methodology and implemented it in our network. The results show that this approach is highly effective at preventing phishing attacks. The proposed approach reduced false negatives by more than 80% and phishing attacks in our network by more than 95%.
Submitted 7 August, 2011;
originally announced August 2011.
-
An In-depth Analysis of Spam and Spammers
Authors:
Dhinaharan Nagamalai,
Beatrice Cynthia Dhinakaran,
Jae Kwang Lee
Abstract:
Electronic mail services have become an important means of communication for millions of people all over the world. Due to this tremendous growth, there has been a significant increase in spam traffic. Spam clutters users' inboxes, consumes network resources, and spreads worms and viruses. In this paper we study the characteristics of spam and the technology used by spammers. In order to counter anti-spam technology, spammers change their mode of operation; therefore, continuous evaluation of the characteristics of spam and of spammers' technology has become mandatory. These evaluations help us to enhance existing anti-spam technology and thereby to combat spam effectively. In order to characterize spam, we collected four hundred thousand spam mails from a corporate mail server over a period of 14 months, from January 2006 to February 2007. For analysis we classified spam based on attachments and contents. We observed that spammers use software tools to send spam with attachments. The main features of this software are hiding the sender's identity, randomly selecting text messages, identifying open relay machines, mass mailing capability, and defining the spamming duration. Spammers do not use spam software to send spam without attachments. From our study we observed that four-year-old email accounts of heavy users attract more spam than four-year-old accounts of light users. Relatively new email accounts, which are 14 months old, do not receive spam. In some special cases such as DDoS attacks, however, we found that new email accounts do receive spam, and that 14-month-old accounts of heavy users attracted more spam than 14-month-old accounts of light users. We believe that this analysis could be useful for developing more efficient anti-spam techniques.
Submitted 7 December, 2010;
originally announced December 2010.
-
Bayesian Based Comment Spam Defending Tool
Authors:
Dhinaharan Nagamalai,
Beatrice Cynthia Dhinakaran,
Jae Kwang Lee
Abstract:
Spam clutters users' inboxes, consumes network resources, and spreads worms and viruses. Spam is the flooding of unsolicited, unwanted email. Spam in blogs is called blog spam or comment spam. It is done by posting comments or flooding spam to services such as blogs, forums, news, email archives, and guestbooks. Blog spam generally appears on guestbooks or comment pages where spammers fill a comment box with spam words. In addition to wasting users' time with unwanted comments, spam also consumes a lot of bandwidth. In this paper, we propose a software tool to prevent such blog spam using a technique based on a Bayesian algorithm, which is derived from Bayes' theorem. It outputs the probability that a comment is spam, given that it contains certain words. This value is computed from our past entries and the new comment entry, and is compared with a threshold value to determine whether it exceeds the threshold. Using this concept, we developed a software tool to block comment spam. The experimental results show that the Bayesian-based tool works well. This paper presents the major findings on the blog spam filter and their significance.
Submitted 14 November, 2010;
originally announced November 2010.
-
Characterizing Spam traffic and Spammers
Authors:
Cynthia Dhinakaran,
Dhinaharan Nagamalai,
Jae Kwang Lee
Abstract:
There is a tremendous increase in spam traffic these days. Spam messages clutter users' inboxes, consume network resources, build up DDoS attacks, and spread worms and viruses. Our goal is to present a definite picture of the characteristics of spam and spammers. Since spammers change their mode of operation to counter anti-spam technology, continuous evaluation of the characteristics of spam and of spammers' technology has become mandatory. These evaluations help us to enhance existing technology to combat spam effectively. We collected 400 thousand spam mails from a spam trap set up in a corporate mail server over a period of 14 months, from January 2006 to February 2007. Spammers use common techniques to spam end users regardless of whether the server is a corporate or a public mail server, so we believe that our spam collection is a sample of worldwide spam traffic. Studying the characteristics of this sample helps us to better understand the features of spam and of spammers' technology. We believe that this analysis could be useful for developing more efficient anti-spam techniques.
Submitted 3 November, 2010;
originally announced November 2010.
-
An Empirical Study of Spam and Spam Vulnerable email Accounts
Authors:
Cynthia Dhinakaran,
Dhinaharan Nagamalai,
Jae Kwang Lee
Abstract:
Spam messages clutter users' inboxes, consume network resources, build up DDoS attacks, and spread malware. Our goal is to present a definite picture of the characteristics of spam and spam-vulnerable email accounts. These evaluations help us to enhance existing technology to combat spam effectively. We collected 400 thousand spam mails from a spam trap set up in a corporate mail server over a period of 14 months, from January 2006 to February 2007. Spammers use common techniques to spam end users regardless of whether the server is a corporate or a public mail server, so we believe that our spam collection is a sample of worldwide spam traffic. Studying the characteristics of this sample helps us to better understand the features of spam and spam-vulnerable email accounts. We believe that this analysis is highly useful for developing more efficient anti-spam techniques. In our analysis we classified spam based on attachments and contents. According to our study, four-year-old email accounts of heavy users attract more spam than four-year-old accounts of light users. Relatively new, 14-month-old email accounts do not receive spam. In some special cases such as DDoS attacks, however, new email accounts do receive spam. During a DDoS attack, 14-month-old accounts of heavy users attracted more spam than 14-month-old accounts of light users.
Submitted 2 November, 2010;
originally announced November 2010.
-
"Reminder: please update your details": Phishing Trends
Authors:
Cynthia Dhinakaran,
Jae Kwang Lee,
Dhinaharan Nagamalai
Abstract:
Spam clutters users' inboxes, consumes resources, and spreads attacks such as DDoS, MiM, and phishing. Phishing is a byproduct of email and causes financial loss to users and loss of reputation to financial institutions. In this paper we study the characteristics of phishing and the technology used by phishers. In order to counter anti-phishing technology, phishers change their mode of operation; therefore, continuous evaluation of phishing helps us to combat phishers effectively. We collected seven hundred thousand spam messages from a corporate server over a period of 13 months, from February 2008 to February 2009. From the collected data, we identified different kinds of phishing scams and their modes of operation. Our observations show that phishers are dynamic and depend more on social engineering techniques than on software vulnerabilities. We believe that this study will be useful for developing more efficient anti-phishing methodologies.
Submitted 13 October, 2010;
originally announced October 2010.
-
Multi Layer Approach to Defend DDoS Attacks Caused by Spam
Authors:
Dhinaharan Nagamalai,
Cynthia Dhinakaran,
Jae Kwang Lee
Abstract:
Corporate mail services are designed to perform better than public mail services. Fast mail delivery, transfer of large files as attachments, high-level spam and virus protection, and an environment free of commercial advertisements are some of the advantages worth mentioning. But these mail services are frequent targets of hackers and spammers. Distributed denial-of-service (DDoS) attacks are becoming more common and sophisticated. Researchers have proposed various solutions to DDoS attacks. Can we stop these kinds of attacks with available technology? These days, DDoS attacks through spam have increased and have disrupted the mail services of various organizations. Spam penetrates through all the filters to establish DDoS attacks, which cause serious problems for users and data. In this paper we propose a multilayer approach to defend against DDoS attacks caused by spam mails. This approach is a combination of fine-tuning of source filters, content filters, strict implementation of mail policies, user education, network monitoring, and logical solutions to the ongoing attack. We have conducted several experiments in corporate mail services; the results show that this approach is highly effective at preventing DDoS attacks caused by spam. The defense mechanism reduced incoming spam traffic by 60% and repelled many DDoS attacks caused by spam.
Submitted 7 October, 2010;
originally announced October 2010.
-
An Approximation of the Outage Probability for Multi-hop AF Fixed Gain Relay
Authors:
Jun Kyoung Lee,
Janghoon Yang,
Dong Ku Kim
Abstract:
In this letter, we present a closed-form approximation of the outage probability for multi-hop amplify-and-forward (AF) relaying systems with fixed gain in Rayleigh fading channels. The approximation is derived from the outage event for each hop. Simulation results show the tightness of the proposed approximation in both the low and high signal-to-noise ratio (SNR) regions.
Submitted 4 December, 2008;
originally announced December 2008.