-
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Authors:
Byungsoo Jeon,
Mengdi Wu,
Shiyi Cao,
Sunghyun Kim,
Sunghyun Park,
Neeraj Aggarwal,
Colin Unger,
Daiyaan Arfeen,
Peiyuan Liao,
Xupeng Miao,
Mohammad Alizadeh,
Gregory R. Ganger,
Tianqi Chen,
Zhihao Jia
Abstract:
Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only consider sequential pipeline stages and thus ignore the topology of a DNN, resulting in missed model-parallel opportunities. This paper presents graph pipeline parallelism (GPP), a new pipeline-parallel scheme that partitions a DNN into pipeline stages whose dependencies are identified by a directed acyclic graph. GPP generalizes existing sequential pipeline parallelism and preserves the inherent topology of a DNN to enable concurrent execution of computationally-independent operators, resulting in reduced memory requirement and improved GPU performance. In addition, we develop GraphPipe, a distributed system that exploits GPP strategies to enable performant and scalable DNN training. GraphPipe partitions a DNN into a graph of stages, optimizes micro-batch schedules for these stages, and parallelizes DNN training using the discovered GPP strategies. Evaluation on a variety of DNNs shows that GraphPipe outperforms existing pipeline-parallel systems such as PipeDream and Piper by up to 1.6X. GraphPipe also reduces the search time by 9-21X compared to PipeDream and Piper.
Submitted 24 June, 2024;
originally announced June 2024.
-
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Authors:
Tsai-Shien Chen,
Aliaksandr Siarohin,
Willi Menapace,
Ekaterina Deyneka,
Hsiang-wei Chao,
Byung Eun Jeon,
Yuwei Fang,
Hsin-Ying Lee,
Jian Ren,
Ming-Hsuan Yang,
Sergey Tulyakov
Abstract:
The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to collect. First of all, manual labeling is more time-consuming, as it requires an annotator to watch an entire video. Second, videos have a temporal dimension, consisting of several scenes stacked together, and showing multiple actions. Accordingly, to establish a video dataset with high-quality captions, we propose an automatic approach leveraging multimodal inputs, such as textual video description, subtitles, and individual video frames. Specifically, we curate 3.8M high-resolution videos from the publicly available HD-VILA-100M dataset. We then split them into semantically consistent video clips, and apply multiple cross-modality teacher models to obtain captions for each video. Next, we finetune a retrieval model on a small subset where the best caption of each video is manually selected and then employ the model in the whole dataset to select the best caption as the annotation. In this way, we get 70M videos paired with high-quality text captions. We dub the dataset as Panda-70M. We show the value of the proposed dataset on three downstream tasks: video captioning, video and text retrieval, and text-driven video generation. The models trained on the proposed data score substantially better on the majority of metrics across all the tasks.
Submitted 29 February, 2024;
originally announced February 2024.
-
Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search
Authors:
Chanwoong Yoon,
Gangwoo Kim,
Byeongguk Jeon,
Sungdong Kim,
Yohan Jo,
Jaewoo Kang
Abstract:
Conversational search, unlike single-turn retrieval tasks, requires understanding the current question within a dialogue context. The common approach of rewrite-then-retrieve aims to decontextualize questions to be self-sufficient for off-the-shelf retrievers, but most existing methods produce sub-optimal query rewrites due to the limited ability to incorporate signals from the retrieval results. To overcome this limitation, we present a novel framework RetPO (Retriever's Preference Optimization), which is designed to optimize a language model (LM) for reformulating search queries in line with the preferences of the target retrieval systems. The process begins by prompting a large LM to produce various potential rewrites and then collects retrieval performance for these rewrites as the retrievers' preferences. Through the process, we construct a large-scale dataset called RF collection, containing Retrievers' Feedback on over 410K query rewrites across 12K conversations. Furthermore, we fine-tune a smaller LM using this dataset to align it with the retrievers' preferences as feedback. The resulting model achieves state-of-the-art performance on two recent conversational search benchmarks, significantly outperforming existing baselines, including GPT-3.5.
Submitted 18 February, 2024;
originally announced February 2024.
-
Motion-induced error reduction for high-speed dynamic digital fringe projection system
Authors:
Sanghoon Jeon,
Hyo-Geon Lee,
Jae-Sung Lee,
Bo-Min Kang,
Byung-Wook Jeon,
Jun Young Yoon,
Jae-Sang Hyun
Abstract:
In phase-shifting profilometry (PSP), any motion during the acquisition of fringe patterns can introduce errors because it assumes both the object and measurement system are stationary. Therefore, we propose a method to pixel-wise reduce the errors when the measurement system is in motion due to a motorized linear stage. The proposed method introduces motion-induced error reduction algorithm, which leverages the motor's encoder and pinhole model of the camera and projector. 3D shape measurement is possible with only three fringe patterns by applying geometric constraints of the digital fringe projection system. We address the mismatch problem due to the motion-induced camera pixel disparities and reduce phase-shift errors. These processes are easy to implement and require low computational cost. Experimental results demonstrate that the presented method effectively reduces the errors even in non-uniform motion.
Submitted 29 January, 2024;
originally announced January 2024.
-
Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models
Authors:
Gangwoo Kim,
Sungdong Kim,
Byeongguk Jeon,
Joonsuk Park,
Jaewoo Kang
Abstract:
Questions in open-domain question answering are often ambiguous, allowing multiple interpretations. One approach to handling them is to identify all possible interpretations of the ambiguous question (AQ) and to generate a long-form answer addressing them all, as suggested by Stelmakh et al., (2022). While it provides a comprehensive response without bothering the user for clarification, considering multiple dimensions of ambiguity and gathering corresponding knowledge remains a challenge. To cope with the challenge, we propose a novel framework, Tree of Clarifications (ToC): It recursively constructs a tree of disambiguations for the AQ -- via few-shot prompting leveraging external knowledge -- and uses it to generate a long-form answer. ToC outperforms existing baselines on ASQA in a few-shot setup across the metrics, while surpassing fully-supervised baselines trained on the whole training set in terms of Disambig-F1 and Disambig-ROUGE. Code is available at https://github.com/gankim/tree-of-clarifications.
Submitted 23 October, 2023;
originally announced October 2023.
-
MCPNS: A Macropixel Collocated Position and Its Neighbors Search for Plenoptic 2.0 Video Coding
Authors:
Vinh Van Duong,
Thuc Nguyen Huu,
Jonghoon Yim,
Byeungwoo Jeon
Abstract:
Recently, it was demonstrated that a newly focused plenoptic 2.0 camera can capture much higher spatial resolution owing to its effective light field sampling, as compared to a traditional unfocused plenoptic 1.0 camera. However, due to the nature difference of the optical structure between the plenoptic 1.0 and 2.0 cameras, the existing fast motion estimation (ME) method for plenoptic 1.0 videos is expected to be sub-optimal for encoding plenoptic 2.0 videos. In this paper, we point out the main motion characteristic differences between plenoptic 1.0 and 2.0 videos and then propose a new fast ME, called macropixel collocated position and its neighbors search (MCPNS) for plenoptic 2.0 videos. In detail, we propose to reduce the number of macropixel collocated position (MCP) search candidates based on the new observation of center-biased motion vector distribution at macropixel resolution. After that, due to large motion deviation behavior around each MCP location in plenoptic 2.0 videos, we propose to select a certain number of key MCP locations with the lowest matching cost to perform the neighbors MCP search to improve the motion search accuracy. Different from existing methods, our method can achieve better performance without requiring prior knowledge of microlens array orientations. Our simulation results confirmed the effectiveness of the proposed algorithm in terms of both bitrate savings and computational costs compared to existing methods.
Submitted 27 November, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
QP Chaser: Polynomial Trajectory Generation for Autonomous Aerial Tracking
Authors:
Yunwoo Lee,
Jungwon Park,
Seungwoo Jung,
Boseong Jeon,
Dahyun Oh,
H. Jin Kim
Abstract:
Maintaining the visibility of the targets is one of the major objectives of aerial tracking applications. This paper proposes QP Chaser, a trajectory planning pipeline that can enhance the visibility of single- and dual-target in both static and dynamic environments. As the name suggests, the proposed planner generates a target-visible trajectory via quadratic programming problems. First, the predictor forecasts the reachable sets of moving objects with a sample-and-check strategy considering obstacles. Subsequently, the trajectory planner reinforces the visibility of targets with consideration of 1) path topology and 2) reachable sets of targets and obstacles. We define a target-visible region (TVR) with topology analysis of not only static obstacles but also dynamic obstacles, and it reflects reachable sets of moving targets and obstacles to maintain the whole body of the target within the camera image robustly and ceaselessly. The online performance of the proposed planner is validated in multiple scenarios, including high-fidelity simulations and real-world experiments.
Submitted 27 February, 2023;
originally announced February 2023.
-
Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification
Authors:
Kwangje Baeg,
Yeong-Gwan Kim,
Young-Sub Han,
Byoung-Ki Jeon
Abstract:
Recently, researchers have utilized neural network-based speaker embedding techniques in speaker-recognition tasks to identify speakers accurately. However, speaker-discriminative embeddings do not always represent speech features such as age group well. In an embedding model that has been highly trained to capture speaker traits, the task of age group classification is closer to speech information leakage. Hence, to improve age group classification performance, we consider the use of speaker-discriminative embeddings derived from adversarial multi-task learning to align features and reduce the domain discrepancy in age subgroups. In addition, we investigated different types of speaker embeddings to learn and generalize the domain-invariant representations for age groups. Experimental results on the VoxCeleb Enrichment dataset verify the effectiveness of our proposed adaptive adversarial network in multi-objective scenarios and leveraging speaker embeddings for the domain adaptation task.
Submitted 22 January, 2023;
originally announced January 2023.
-
Baechi: Fast Device Placement of Machine Learning Graphs
Authors:
Beomyeol Jeon,
Linda Cai,
Chirag Shetty,
Pallavi Srivastava,
Jintao Jiang,
Xiaolan Ke,
Yitao Meng,
Cong Xie,
Indranil Gupta
Abstract:
Machine Learning graphs (or models) can be challenging or impossible to train when either devices have limited memory, or models are large. To split the model across devices, learning-based approaches are still popular. While these result in model placements that train fast on data (i.e., low step times), learning-based model-parallelism is time-consuming, taking many hours or days to create a placement plan of operators on devices. We present the Baechi system, the first to adopt an algorithmic approach to the placement problem for running machine learning training graphs on small clusters of memory-constrained devices. We integrate our implementation of Baechi into two popular open-source learning frameworks: TensorFlow and PyTorch. Our experimental results using GPUs show that: (i) Baechi generates placement plans 654 X - 206K X faster than state-of-the-art learning-based approaches, and (ii) Baechi-placed model's step (training) time is comparable to expert placements in PyTorch, and only up to 6.2% worse than expert placements in TensorFlow. We prove mathematically that our two algorithms are within a constant factor of the optimal. Our work shows that compared to learning-based approaches, algorithmic approaches can face different challenges for adaptation to Machine learning systems, but also they offer proven bounds, and significant performance benefits.
Submitted 20 January, 2023;
originally announced January 2023.
-
Deep Reinforcement Learning for Asset Allocation: Reward Clipping
Authors:
Jiwon Kim,
Moon-Ju Kang,
KangHun Lee,
HyungJun Moon,
Bo-Kwan Jeon
Abstract:
Recently, there are many trials to apply reinforcement learning in asset allocation for earning more stable profits. In this paper, we compare performance between several reinforcement learning algorithms - actor-only, actor-critic and PPO models. Furthermore, we analyze each models' character and then introduce the advanced algorithm, so called Reward clipping model. It seems that the Reward Clipping model is better than other existing models in finance domain, especially portfolio optimization - it has strength both in bull and bear markets. Finally, we compare the performance for these models with traditional investment strategies during decreasing and increasing markets.
Submitted 1 January, 2023;
originally announced January 2023.
-
Ray-Space Motion Compensation for Lenslet Plenoptic Video Coding
Authors:
Thuc Nguyen Huu,
Vinh Van Duong,
Jonghoon Yim,
Byeungwoo Jeon
Abstract:
Plenoptic images and videos bearing rich information demand a tremendous amount of data storage and high transmission cost. While there has been much study on plenoptic image coding, investigations into plenoptic video coding have been very limited. We investigate the motion compensation for plenoptic video coding from a slightly different perspective by looking at the problem in the ray-space domain instead of in the conventional pixel domain. Here, we develop a novel motion compensation scheme for lenslet video under two sub-cases of ray-space motion, that is, integer ray-space motion and fractional ray-space motion. The proposed new scheme of light field motion-compensated prediction is designed such that it can be easily integrated into well-known video coding techniques such as HEVC. Experimental results compared to relevant existing methods have shown remarkable compression efficiency with an average gain of 19.63% and a peak gain of 29.1%.
Submitted 1 July, 2022;
originally announced July 2022.
-
Cut and Continuous Paste towards Real-time Deep Fall Detection
Authors:
Sunhee Hwang,
Minsong Ki,
Seung-Hyun Lee,
Sanghoon Park,
Byoung-Ki Jeon
Abstract:
Deep learning based fall detection is one of the crucial tasks for intelligent video surveillance systems, which aims to detect unintentional falls of humans and alarm dangerous situations. In this work, we propose a simple and efficient framework to detect falls through a single and small-sized convolutional neural network. To this end, we first introduce a new image synthesis method that represents human motion in a single frame. This simplifies the fall detection task as an image classification task. Besides, the proposed synthetic data generation method enables to generate a sufficient amount of training dataset, resulting in satisfactory performance even with the small model. At the inference step, we also represent real human motion in a single image by estimating mean of input frames. In the experiment, we conduct both qualitative and quantitative evaluations on URFD and AIHub airport datasets to show the effectiveness of our method.
Submitted 22 February, 2022;
originally announced February 2022.
-
Information-Weighted Consensus Filter with Partial Information Exchange
Authors:
Byoung-Ju Jeon,
Shaoming He
Abstract:
In this paper, the information-weighted consensus filter (ICF) with partial information exchange is proposed to reduce the bandwidth of the signals transmitted between the sensor nodes and guarantee its convergence to the centralized Kalman filter (CKF). In the proposed algorithm, a part of information chosen with the entry selection matrix is transmitted to the sensor nodes in the neighborhood at each consensus step, and consensus averaging is conducted at each sensor node with the partial and the local information. This ensures that the proposed distributed estimation algorithm converges to the centralized algorithm, while allowing the proposed algorithm to achieve bandwidth reduction of the signals transmitted between the sensors. With the proposed algorithm, the stability of the estimation error dynamics is proven and the convergence to the centralized algorithm is mathematically shown using the property of the average consensus. Simulations are conducted to validate the proposed ICF with partial information exchange and the related theoretical findings.
Submitted 15 December, 2021;
originally announced December 2021.
-
Aerial Chasing of a Dynamic Target in Complex Environments
Authors:
Boseong Felipe Jeon,
Changhyeon Kim,
Hojoon Shin,
H. Jin Kim
Abstract:
Rapidly generating an optimal chasing motion of a drone to follow a dynamic target among obstacles is challenging due to numerical issues rising from multiple conflicting objectives and non-convex constraints. This study proposes to resolve the difficulties with a fast and reliable pipeline that incorporates 1) a target movement forecaster and 2) a chasing planner. They are based on a sample-and-check approach that consists of the generation of high-quality candidate primitives and the feasibility tests with a light computation load. We forecast the movement of the target by selecting an optimal prediction among a set of candidates built from past observations. Based on the prediction, we construct a set of prospective chasing trajectories which reduce the high-order derivatives, while maintaining the desired relative distance from the predicted target movement. Then, the candidate trajectories are tested on safety of the chaser and visibility toward the target without loose approximation of the constraints. The proposed algorithm is thoroughly evaluated in challenging scenarios involving dynamic obstacles. Also, the overall process from the target recognition to the chasing motion planning is implemented fully onboard on a drone, demonstrating real-world applicability.
Submitted 13 December, 2021;
originally announced December 2021.
-
Collage: Seamless Integration of Deep Learning Backends with Automatic Placement
Authors:
Byungsoo Jeon,
Sunghyun Park,
Peiyuan Liao,
Sheng Xu,
Tianqi Chen,
Zhihao Jia
Abstract:
The strong demand for efficient and performant deployment of Deep Learning (DL) applications prompts the rapid development of a rich DL ecosystem. To keep up with this fast advancement, it is crucial for modern DL frameworks to efficiently integrate a variety of optimized tensor algebra libraries and runtimes as their backends and generate the fastest possible executable using these backends. However, current DL frameworks require significant manual effort and expertise to integrate every new backend while failing to unleash its full potential. Given the fast-evolving nature of the DL ecosystem, this manual approach often slows down continuous innovations across different layers; it prevents hardware vendors from the fast deployment of their cutting-edge libraries, DL framework developers must repeatedly adjust their hand-coded rules to accommodate new versions of libraries, and machine learning practitioners need to wait for the integration of new technologies and often encounter unsatisfactory performance.
In this paper, we propose Collage, a DL framework that offers seamless integration of DL backends. Collage provides an expressive backend registration interface that allows users to precisely specify the capability of various backends. By leveraging the specifications of available backends, Collage automatically searches for an optimized backend placement strategy for a given workload and execution environment. Our evaluation shows that Collage outperforms the best existing framework for each hardware by $1.26\times$, $1.43\times$, $1.40\times$ on average on NVIDIA's RTX 2070 GPU, V100 GPU, and Intel's Xeon 8259CL CPU, respectively. Collage has been open-sourced and deployed in Apache TVM.
Submitted 27 October, 2022; v1 submitted 31 October, 2021;
originally announced November 2021.
-
Task-Driven Deep Image Enhancement Network for Autonomous Driving in Bad Weather
Authors:
Younkwan Lee,
Jihyo Jeon,
Yeongmin Ko,
Byunggwan Jeon,
Moongu Jeon
Abstract:
Visual perception in autonomous driving is a crucial part of a vehicle to navigate safely and sustainably in different traffic conditions. However, in bad weather such as heavy rain and haze, the performance of visual perception is greatly affected by several degrading effects. Recently, deep learning-based perception methods have addressed multiple degrading effects to reflect real-world bad weather cases but have shown limited success due to 1) high computational costs for deployment on mobile devices and 2) poor relevance between image enhancement and visual perception in terms of the model ability. To solve these issues, we propose a task-driven image enhancement network connected to the high-level vision task, which takes in an image corrupted by bad weather as input. Specifically, we introduce a novel low memory network to reduce most of the layer connections of dense blocks for less memory and computational cost while maintaining high performance. We also introduce a new task-driven training strategy to robustly guide the high-level task model suitable for both high-quality restoration of images and highly accurate perception. Experiment results demonstrate that the proposed method improves the performance among lane and 2D object detection, and depth estimation largely under adverse weather in terms of both low memory and accuracy.
△ Less
Submitted 14 October, 2021;
originally announced October 2021.
-
Convolutional Neural Network-based Intrusion Detection System for AVTP Streams in Automotive Ethernet-based Networks
Authors:
Seonghoon Jeong,
Boosun Jeon,
Boheung Chung,
Huy Kang Kim
Abstract:
Connected and autonomous vehicles (CAVs) are an innovative form of traditional vehicles. Automotive Ethernet replaces the controller area network and FlexRay to support the large throughput required by high-definition applications. As CAVs have numerous functions, they exhibit a large attack surface and an increased vulnerability to attacks. However, no previous studies have focused on intrusion detection in automotive Ethernet-based networks. In this paper, we present an intrusion detection method for detecting audio-video transport protocol (AVTP) stream injection attacks in automotive Ethernet-based networks. To the best of our knowledge, this is the first such method developed for automotive Ethernet. The proposed intrusion detection model is based on feature generation and a convolutional neural network (CNN). To evaluate our intrusion detection system, we built a physical BroadR-Reach-based testbed and captured real AVTP packets. The experimental results show that the model exhibits outstanding performance: the F1-score and recall are greater than 0.9704 and 0.9949, respectively. In terms of the inference time per input and the generation intervals of AVTP traffic, our CNN model can readily be employed for real-time detection.
△ Less
Submitted 6 February, 2021;
originally announced February 2021.
-
Privacy-preserving Decentralized Aggregation for Federated Learning
Authors:
Beomyeol Jeon,
S. M. Ferdous,
Muntasir Raihan Rahman,
Anwar Walid
Abstract:
Federated learning is a promising framework for learning over decentralized data spanning multiple regions. This approach avoids the expensive cost of aggregating training data centrally and can improve privacy because distributed sites do not have to reveal privacy-sensitive data. In this paper, we develop a privacy-preserving decentralized aggregation protocol for federated learning. We formulate the distributed aggregation protocol with the Alternating Direction Method of Multipliers (ADMM) and examine its privacy weakness. Unlike prior work that uses differential privacy or homomorphic encryption for privacy, we develop a protocol that controls communication among participants in each round of aggregation to minimize privacy leakage. We establish its privacy guarantee against an honest-but-curious adversary. We also propose an efficient algorithm to construct such a communication pattern, inspired by combinatorial block design theory. Our secure aggregation protocol based on this novel group communication pattern design leads to an efficient algorithm for federated training with privacy guarantees. We evaluate our federated training algorithm on image classification and next-word prediction applications over benchmark datasets with 9 and 15 distributed sites. Evaluation results show that our algorithm performs comparably to the standard centralized federated learning method while preserving privacy; the degradation in test accuracy is only up to 0.73%.
Submitted 28 December, 2020; v1 submitted 13 December, 2020;
originally announced December 2020.
-
Detection-Aware Trajectory Generation for a Drone Cinematographer
Authors:
Boseong Felipe Jeon,
Dongseok Shim,
H. Jin Kim
Abstract:
This work investigates an efficient trajectory generation for chasing a dynamic target, which incorporates the detectability objective. The proposed method actively guides the motion of a cinematographer drone so that the color of a target is well-distinguished against the colors of the background in the view of the drone. For the objective, we define a measure of color detectability given a chasing path. After computing a discrete path optimized for the metric, we generate a dynamically feasible trajectory. The whole pipeline can be updated on-the-fly to respond to the motion of the target. For the efficient discrete path generation, we construct a directed acyclic graph (DAG) for which a topological sorting can be determined analytically without the depth-first search. The smooth path is obtained in quadratic programming (QP) framework. We validate the enhanced performance of state-of-the-art object detection and tracking algorithms when the camera drone executes the trajectory obtained from the proposed method.
Submitted 3 September, 2020;
originally announced September 2020.
-
Multi-Scale Deep Compressive Imaging
Authors:
Thuong Nguyen Canh,
Byeungwoo Jeon
Abstract:
Recently, deep learning-based compressive imaging (DCI) has surpassed the conventional compressive imaging in reconstruction quality and faster running time. While multi-scale has shown superior performance over single-scale, research in DCI has been limited to single-scale sampling. Despite training with single-scale images, DCI tends to favor low-frequency components similar to the conventional multi-scale sampling, especially at low subrate. From this perspective, it would be easier for the network to learn multi-scale features with a multi-scale sampling architecture. In this work, we proposed a multi-scale deep compressive imaging (MS-DCI) framework which jointly learns to decompose, sample, and reconstruct images at multi-scale. A three-phase end-to-end training scheme was introduced with an initial and two enhance reconstruction phases to demonstrate the efficiency of multi-scale sampling and further improve the reconstruction performance. We analyzed the decomposition methods (including Pyramid, Wavelet, and Scale-space), sampling matrices, and measurements and showed the empirical benefit of MS-DCI which consistently outperforms both conventional and deep learning-based approaches.
Submitted 3 August, 2020;
originally announced August 2020.
-
Restricted Structural Random Matrix for Compressive Sensing
Authors:
Thuong Nguyen Canh,
Byeungwoo Jeon
Abstract:
Compressive sensing (CS) is well known for its unique functionalities of sensing, compressing, and security (i.e., CS measurements are equally important). However, there is a tradeoff: improving sensing and compressing efficiency with prior signal information tends to favor particular measurements, thus decreasing security. This work aimed to improve the sensing and compressing efficiency without compromising security by using a novel sampling matrix, named Restricted Structural Random Matrix (RSRM). RSRM unified the advantages of frame-based and block-based sensing together with the global smoothness prior (i.e., low-resolution signals are highly correlated). RSRM acquired compressive measurements with random projection (equally important) of multiple randomly sub-sampled signals, which were restricted to be low-resolution signals (equal in energy), so that its observations are equally important. RSRM was proven to satisfy the Restricted Isometry Property and shows reconstruction performance comparable to recent state-of-the-art compressive sensing and deep learning-based methods.
Submitted 17 February, 2020;
originally announced February 2020.
-
Dropout Prediction over Weeks in MOOCs by Learning Representations of Clicks and Videos
Authors:
Byungsoo Jeon,
Namyong Park
Abstract:
This paper addresses a key challenge in MOOC dropout prediction, namely to build meaningful representations from clickstream data. While a variety of feature extraction techniques have been explored extensively for such purposes, to our knowledge, no prior work has explored modeling of educational content (e.g. video) and its correlation with the learner's behavior (e.g. clickstream) in this context. We bridge this gap by devising a method to learn representations for videos and the correlation between videos and clicks. The results indicate that modeling videos and their correlation with clicks brings statistically significant improvements in predicting dropout.
Submitted 5 February, 2020;
originally announced February 2020.
-
Dropout Prediction over Weeks in MOOCs via Interpretable Multi-Layer Representation Learning
Authors:
Byungsoo Jeon,
Namyong Park,
Seojin Bang
Abstract:
Massive Open Online Courses (MOOCs) have become popular platforms for online learning. While MOOCs enable students to study at their own pace, this flexibility makes it easy for students to drop out of class. In this paper, our goal is to predict if a learner is going to drop out within the next week, given clickstream data for the current week. To this end, we present a multi-layer representation learning solution based on branch and bound (BB) algorithm, which learns from low-level clickstreams in an unsupervised manner, produces interpretable results, and avoids manual feature engineering. In experiments on Coursera data, we show that our model learns a representation that allows a simple model to perform similarly well to more complex, task-specific models, and how the BB algorithm enables interpretable results. In our analysis of the observed limitations, we discuss promising future directions.
Submitted 4 February, 2020;
originally announced February 2020.
-
Integrated Motion Planner for Real-time Aerial Videography with a Drone in a Dense Environment
Authors:
Boseong Jeon,
H. Jin Kim
Abstract:
This letter suggests an integrated approach for a drone (or multirotor) to perform an autonomous videography task in a 3-D obstacle environment by following a moving object. The proposed system includes 1) a target motion prediction module which can be applied to dense environments and 2) a hierarchical chasing planner based on a proposed metric for visibility. In the prediction module, we minimize observation error given that the target object itself does not collide with obstacles. The estimated future trajectory of target is obtained by covariant optimization. The other module, chasing planner, is in a bi-level structure composed of preplanner and smooth planner. In the first phase, we leverage a graph-search method to preplan a chasing corridor which incorporates safety and visibility of target during a time window. In the subsequent phase, we generate a smooth and dynamically feasible path within the corridor using quadratic programming (QP). We validate our approach with multiple complex scenarios and actual experiments. The source code can be found in https://github.com/icsl-Jeon/traj_gen_vis
Submitted 20 November, 2019;
originally announced November 2019.
-
Shift R-CNN: Deep Monocular 3D Object Detection with Closed-Form Geometric Constraints
Authors:
Andretti Naiden,
Vlad Paunescu,
Gyeongmo Kim,
ByeongMoon Jeon,
Marius Leordeanu
Abstract:
We propose Shift R-CNN, a hybrid model for monocular 3D object detection, which combines deep learning with the power of geometry. We adapt a Faster R-CNN network for regressing initial 2D and 3D object properties and combine it with a least squares solution for the inverse 2D to 3D geometric mapping problem, using the camera projection matrix. The closed-form solution of the mathematical system, along with the initial output of the adapted Faster R-CNN are then passed through a final ShiftNet network that refines the result using our newly proposed Volume Displacement Loss. Our novel, geometrically constrained deep learning approach to monocular 3D object detection obtains top results on KITTI 3D Object Detection Benchmark, being the best among all monocular methods that do not use any pre-trained network for depth estimation.
Submitted 23 May, 2019;
originally announced May 2019.
-
Time-series Insights into the Process of Passing or Failing Online University Courses using Neural-Induced Interpretable Student States
Authors:
Byungsoo Jeon,
Eyal Shafran,
Luke Breitfeller,
Jason Levin,
Carolyn P. Rose
Abstract:
This paper addresses a key challenge in Educational Data Mining, namely to model student behavioral trajectories in order to provide a means for identifying students most at-risk, with the goal of providing supportive interventions. While many forms of data including clickstream data or data from sensors have been used extensively in time series models for such purposes, in this paper we explore the use of textual data, which is sometimes available in the records of students at large, online universities. We propose a time series model that constructs an evolving student state representation using both clickstream data and a signal extracted from the textual notes recorded by human mentors assigned to each student. We explore how the addition of this textual data improves both the predictive power of student states for the purpose of identifying students at risk for course failure as well as for providing interpretable insights about student course engagement processes.
Submitted 1 May, 2019;
originally announced May 2019.
-
Online Trajectory Generation of a MAV for Chasing a Moving Target in 3D Dense Environments
Authors:
Boseong Felipe Jeon,
H. Jin Kim
Abstract:
This work deals with a moving target chasing mission of an aerial vehicle equipped with a vision sensor in a cluttered environment. In contrast to obstacle-free or sparse environments, the chaser should be able to handle collision and occlusion simultaneously with flight efficiency. In order to tackle these challenges with real-time replanning, we introduce a metric for target visibility and propose a cascaded chasing planner. By means of the graph-search methods, we first generate a sequence of chasing corridors and waypoints which ensure safety and optimize visibility. In the following phase, the corridors and waypoints are utilized as constraints and objective in quadratic programming from which we complete a dynamically feasible trajectory for chasing. The proposed algorithm is tested in multiple dense environments. The simulator AutoChaser with full code implementation and GUI can be found in https://github.com/icsl-Jeon/traj_gen_vis
Submitted 6 April, 2019;
originally announced April 2019.
-
Multi-Scale Deep Compressive Sensing Network
Authors:
Thuong Nguyen Canh,
Byeungwoo Jeon
Abstract:
With joint learning of sampling and recovery, deep learning-based compressive sensing (DCS) has shown significant improvement in performance and running time reduction. Its reconstructed image, however, loses high-frequency content, especially at low subrates. This happens similarly in the multi-scale sampling scheme, which also samples more low-frequency components. In this paper, we propose a multi-scale DCS convolutional neural network (MS-DCSNet) in which we convert the image signal using a multiple scale-based wavelet transform, then capture it through convolution block by block across scales. The initial reconstructed image is directly recovered from multi-scale measurements. Multi-scale wavelet convolution is utilized to enhance the final reconstruction quality. The network is able to learn both multi-scale sampling and multi-scale reconstruction, thus resulting in better reconstruction quality.
Submitted 18 September, 2018; v1 submitted 15 September, 2018;
originally announced September 2018.
-
Scene Understanding Networks for Autonomous Driving based on Around View Monitoring System
Authors:
JeongYeol Baek,
Ioana Veronica Chelu,
Livia Iordache,
Vlad Paunescu,
HyunJoo Ryu,
Alexandru Ghiuta,
Andrei Petreanu,
YunSung Soh,
Andrei Leica,
ByeongMoon Jeon
Abstract:
Modern driver assistance systems rely on a wide range of sensors (RADAR, LIDAR, ultrasound and cameras) for scene understanding and prediction. These sensors are typically used for detecting traffic participants and scene elements required for navigation. In this paper we argue that relying on camera based systems, specifically Around View Monitoring (AVM) system has great potential to achieve these goals in both parking and driving modes with decreased costs. The contributions of this paper are as follows: we present a new end-to-end solution for delimiting the safe drivable area for each frame by means of identifying the closest obstacle in each direction from the driving vehicle, we use this approach to calculate the distance to the nearest obstacles and we incorporate it into a unified end-to-end architecture capable of joint object detection, curb detection and safe drivable area detection. Furthermore, we describe the family of networks for both a high accuracy solution and a low complexity solution. We also introduce further augmentation of the base architecture with 3D object detection.
Submitted 17 May, 2018;
originally announced May 2018.
-
Attentive Interaction Model: Modeling Changes in View in Argumentation
Authors:
Yohan Jo,
Shivani Poddar,
Byungsoo Jeon,
Qinlan Shen,
Carolyn P. Rose,
Graham Neubig
Abstract:
We present a neural architecture for modeling argumentative dialogue that explicitly models the interplay between an Opinion Holder's (OH's) reasoning and a challenger's argument, with the goal of predicting if the argument successfully changes the OH's view. The model has two components: (1) vulnerable region detection, an attention model that identifies parts of the OH's reasoning that are amenable to change, and (2) interaction encoding, which identifies the relationship between the content of the OH's reasoning and that of the challenger's argument. Based on evaluation on discussions from the Change My View forum on Reddit, the two components work together to predict an OH's change in view, outperforming several baselines. A posthoc analysis suggests that sentences picked out by the attention model are addressed more frequently by successful arguments than by unsuccessful ones.
Submitted 18 April, 2018; v1 submitted 30 March, 2018;
originally announced April 2018.
-
Compressive Sensing of Color Images Using Nonlocal Higher Order Dictionary
Authors:
Khanh Quoc Dinh,
Thuong Nguyen Canh,
Byeungwoo Jeon
Abstract:
This paper addresses an ill-posed problem of recovering a color image from its compressively sensed measurement data. Unlike the typical 1D vector-based approach of the state-of-the-art methods, we exploit the nonlocal similarities inherently existing in images by treating each patch of a color image as a 3D tensor consisting of not only horizontal and vertical but also spectral dimensions. A group of nonlocal similar patches forms a 4D tensor for which a nonlocal higher order dictionary is learned via higher order singular value decomposition. The multiple sub-dictionaries contained in the higher order dictionary decorrelate the group in each corresponding dimension, thus helping the details of color images to be reconstructed better. Furthermore, we promote sparsity of the final solution using a sparsity regularization based on a weight tensor. It can distinguish those coefficients of the sparse representation generated by the higher order dictionary which are expected to have large magnitude from the others in the optimization. Accordingly, in the iterative solution, it acts like a weighting process which is designed by approximating the minimum mean squared error filter for more faithful recovery. Experimental results confirm improvement by the proposed method over the state-of-the-art ones.
Submitted 26 November, 2017;
originally announced November 2017.
-
Block Compressive Sensing of Image and Video with Nonlocal Lagrangian Multiplier and Patch-based Sparse Representation
Authors:
Trinh Van Chien,
Khanh Quoc Dinh,
Byeungwoo Jeon,
Martin Burger
Abstract:
Although block compressive sensing (BCS) makes it tractable to sense large-sized images and video, its recovery performance has yet to be significantly improved because its recovered images or video usually suffer from blurred edges, loss of details, and high-frequency oscillatory artifacts, especially at a low subrate. This paper addresses these problems by designing a modified total variation technique that employs multi-block gradient processing, a denoised Lagrangian multiplier, and patch-based sparse representation. In the case of video, the proposed recovery method is able to exploit both spatial and temporal similarities. Simulation results confirm the improved performance of the proposed method for compressive sensing of images and video in terms of both objective and subjective qualities.
Submitted 15 March, 2017;
originally announced March 2017.
-
Visual Fashion-Product Search at SK Planet
Authors:
Taewan Kim,
Seyeong Kim,
Sangil Na,
Hayoon Kim,
Moonki Kim,
Byoung-Ki Jeon
Abstract:
We build a large-scale visual search system which finds similar product images given a fashion item. Defining similarity among arbitrary fashion-products still remains a challenging problem, and there is not even an exact ground-truth. To resolve this problem, we define more than 90 fashion-related attributes, and combinations of these attributes can represent thousands of unique fashion-styles. The fashion-attributes are one of the ingredients to define semantic similarity among fashion-product images. To build our system at scale, these fashion-attributes are again used to build an inverted indexing scheme. In addition to these fashion-attributes for semantic similarity, we extract colour and appearance features in a region-of-interest (ROI) of a fashion item for visual similarity. By sharing our approach, we expect active discussion on how to apply current computer vision research in the e-commerce industry.
Submitted 11 April, 2017; v1 submitted 26 September, 2016;
originally announced September 2016.
-
Total variation reconstruction for compressive sensing using nonlocal Lagrangian multiplier
Authors:
Trinh Van Chien,
Khanh Quoc Dinh,
Viet Anh Nguyen,
Byeungwoo Jeon
Abstract:
Total variation has proved its effectiveness in solving inverse problems for compressive sensing. In addition, the nonlocal means filter used as regularization preserves texture better for recovered images, but it is quite complex to implement. In this paper, based on the existence of both noise and image information in the Lagrangian multiplier, we propose a method that is simple in terms of implementation, called nonlocal Lagrangian multiplier (NLLM), in order to reduce noise and boost useful image information. Experimental results show that the proposed NLLM is superior to other recovery algorithms in both subjective and objective qualities of the recovered image.
Submitted 28 August, 2016;
originally announced August 2016.