-
SLM as Guardian: Pioneering AI Safety with Small Language Models
Authors:
Ohjoon Kwon,
Donghyeon Jeon,
Nayoung Choi,
Gyu-Hwung Cho,
Changbong Kim,
Hyunwoo Lee,
Inho Kang,
Sun Kim,
Taiwoo Park
Abstract:
Most prior safety research on large language models (LLMs) has focused on enhancing the alignment of LLMs to better suit the safety requirements of humans. However, internalizing such safeguard features into larger models brought challenges of higher training cost and unintended degradation of helpfulness. To overcome such challenges, a modular approach employing a smaller LLM to detect harmful user queries is regarded as a convenient solution in designing LLM-based systems with safety requirements.
In this paper, we leverage a smaller LLM for both harmful query detection and safeguard response generation. We introduce our safety requirements and the taxonomy of harmfulness categories, and then propose a multi-task learning mechanism fusing the two tasks into a single model. We demonstrate the effectiveness of our approach, which achieves harmful query detection and safeguard response performance on par with or surpassing that of publicly available LLMs.
Submitted 30 May, 2024;
originally announced May 2024.
-
Taxonomy and Analysis of Sensitive User Queries in Generative AI Search
Authors:
Hwiyeol Jo,
Taiwoo Park,
Nayoung Choi,
Changbong Kim,
Ohjoon Kwon,
Donghyeon Jeon,
Hyunwoo Lee,
Eui-Hyeon Lee,
Kyoungho Shin,
Sun Suk Lim,
Kyungmi Kim,
Jihye Lee,
Sun Kim
Abstract:
Although there has been a growing interest among industries to integrate generative LLMs into their services, limited experience and scarcity of resources act as a barrier in launching and servicing large-scale LLM-based conversational services. In this paper, we share our experiences in developing and operating generative AI models within a national-scale search engine, with a specific focus on the sensitiveness of user queries. We propose a taxonomy for sensitive search queries, outline our approaches, and present a comprehensive analysis report on sensitive queries from actual users.
Submitted 5 April, 2024;
originally announced April 2024.
-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Treewidth versus clique number. IV. Tree-independence number of graphs excluding an induced star
Authors:
Clément Dallard,
Matjaž Krnc,
O-joung Kwon,
Martin Milanič,
Andrea Munaro,
Kenny Štorgel,
Sebastian Wiederrecht
Abstract:
Many recent works address the question of characterizing induced obstructions to bounded treewidth. In 2022, Lozin and Razgon completely answered this question for graph classes defined by finitely many forbidden induced subgraphs. Their result also implies a characterization of graph classes defined by finitely many forbidden induced subgraphs that are $(tw,ω)$-bounded, that is, treewidth can only be large due to the presence of a large clique. This condition is known to be satisfied for any graph class with bounded tree-independence number, a graph parameter introduced independently by Yolov in 2018 and by Dallard, Milanič, and Štorgel in 2024. Dallard et al. conjectured that $(tw,ω)$-boundedness is actually equivalent to bounded tree-independence number. We address this conjecture in the context of graph classes defined by finitely many forbidden induced subgraphs and prove it for the case of graph classes excluding an induced star. We also prove it for subclasses of the class of line graphs, determine the exact values of the tree-independence numbers of line graphs of complete graphs and line graphs of complete bipartite graphs, and characterize the tree-independence number of $P_4$-free graphs, which implies a linear-time algorithm for its computation. Applying the algorithmic framework provided in a previous paper of the series leads to polynomial-time algorithms for the Maximum Weight Independent Set problem in an infinite family of graph classes.
Submitted 20 February, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Unified Speech-Text Pretraining for Spoken Dialog Modeling
Authors:
Heeseung Kim,
Soonshin Seo,
Kyeongseok Jeong,
Ohsung Kwon,
Jungwhan Kim,
Jaehong Lee,
Eunwoo Song,
Myungwoo Oh,
Sungroh Yoon,
Kang Min Yoo
Abstract:
While recent work shows promising results in expanding the capabilities of large language models (LLM) to directly understand and synthesize speech, an LLM-based strategy for modeling spoken dialogs remains elusive and calls for further investigation. This work proposes an extensive speech-text LLM framework, named the Unified Spoken Dialog Model (USDM), to generate coherent spoken responses with organic prosodic features relevant to the given input speech without relying on automatic speech recognition (ASR) or text-to-speech (TTS) solutions. Our approach employs a multi-step speech-text inference scheme that leverages chain-of-reasoning capabilities exhibited by the underlying LLM. We also propose a generalized speech-text pretraining scheme that helps with capturing cross-modal semantics. Automatic and human evaluations show that the proposed approach is effective in generating natural-sounding spoken responses, outperforming both prior and cascaded baselines. Detailed comparative studies reveal that, despite the cascaded approach being stronger in individual components, the joint speech-text modeling improves robustness against recognition errors and speech quality. Demo is available at https://unifiedsdm.github.io.
Submitted 8 February, 2024;
originally announced February 2024.
-
A Comparative Analysis of Text-to-Image Generative AI Models in Scientific Contexts: A Case Study on Nuclear Power
Authors:
Veda Joynt,
Jacob Cooper,
Naman Bhargava,
Katie Vu,
O Hwang Kwon,
Todd R. Allen,
Aditi Verma,
Majdi I. Radaideh
Abstract:
In this work, we propose and assess the potential of generative artificial intelligence (AI) to generate public engagement around potential clean energy sources. Such an application could increase energy literacy -- an awareness of low-carbon energy sources among the public therefore leading to increased participation in decision-making about the future of energy systems. We explore the use of generative AI to communicate technical information about low-carbon energy sources to the general public, specifically in the realm of nuclear energy. We explored 20 AI-powered text-to-image generators and compared their individual performances on general and scientific nuclear-related prompts. Of these models, DALL-E, DreamStudio, and Craiyon demonstrated promising performance in generating relevant images from general-level text related to nuclear topics. However, these models fall short in three crucial ways: (1) they fail to accurately represent technical details of energy systems; (2) they reproduce existing biases surrounding gender and work in the energy sector; and (3) they fail to accurately represent indigenous landscapes -- which have historically been sites of resource extraction and waste deposition for energy industries. This work is performed to motivate the development of specialized generative tools and their captions to improve energy literacy and effectively engage the public with low-carbon energy sources.
Submitted 2 December, 2023;
originally announced December 2023.
-
Evaluating the Efficacy of Interactive Language Therapy Based on LLM for High-Functioning Autistic Adolescent Psychological Counseling
Authors:
Yujin Cho,
Mingeon Kim,
Seojin Kim,
Oyun Kwon,
Ryan Donghan Kwon,
Yoonha Lee,
Dohyun Lim
Abstract:
This study investigates the efficacy of Large Language Models (LLMs) in interactive language therapy for high-functioning autistic adolescents. With the rapid advancement of artificial intelligence, particularly in natural language processing, LLMs present a novel opportunity to augment traditional psychological counseling methods. This research primarily focuses on evaluating the LLM's ability to engage in empathetic, adaptable, and contextually appropriate interactions within a therapeutic setting. A comprehensive evaluation was conducted by a panel of clinical psychologists and psychiatrists using a specially developed scorecard. The assessment covered various aspects of the LLM's performance, including empathy, communication skills, adaptability, engagement, and the ability to establish a therapeutic alliance. The study avoided direct testing with patients, prioritizing privacy and ethical considerations, and instead relied on simulated scenarios to gauge the LLM's effectiveness. The results indicate that LLMs hold significant promise as supportive tools in therapy, demonstrating strengths in empathetic engagement and adaptability in conversation. However, challenges in achieving the depth of personalization and emotional understanding characteristic of human therapists were noted. The study also highlights the importance of ethical considerations in the application of AI in therapeutic contexts. This research provides valuable insights into the potential and limitations of using LLMs in psychological counseling for autistic adolescents. It lays the groundwork for future explorations into AI's role in mental health care, emphasizing the need for ongoing development to enhance the capabilities of these models in therapeutic settings.
Submitted 12 November, 2023;
originally announced November 2023.
-
The Impact of Generative Artificial Intelligence
Authors:
Kaichen Zhang,
Ohchan Kwon,
Hui Xiong
Abstract:
The rise of generative artificial intelligence (AI) has sparked concerns about its potential influence on unemployment and market depression. This study addresses this concern by examining the impact of generative AI on product markets. To overcome the challenge of causal inference, given the inherent limitations of conducting controlled experiments, this paper identifies an unanticipated and sudden leak of a highly proficient image-generative AI as a novel instance of a "natural experiment". This AI leak spread rapidly, significantly reducing the cost of generating anime-style images compared to other styles, creating an opportunity for comparative assessment. We collect real-world data from an artwork outsourcing platform. Surprisingly, our results show that while generative AI lowers average prices, it substantially boosts order volume and overall revenue. This counterintuitive finding suggests that generative AI confers benefits upon artists rather than detriments. The study further offers theoretical economic explanations to elucidate this unexpected phenomenon. By furnishing empirical evidence, this paper dispels the notion that generative AI might engender depression, instead underscoring its potential to foster market prosperity. These findings carry significant implications for practitioners, policymakers, and the broader AI community.
Submitted 12 November, 2023;
originally announced November 2023.
-
Computing pivot-minors
Authors:
Konrad K. Dabrowski,
François Dross,
Jisu Jeong,
Mamadou Moustapha Kanté,
O-joung Kwon,
Sang-il Oum,
Daniël Paulusma
Abstract:
A graph $G$ contains a graph $H$ as a pivot-minor if $H$ can be obtained from $G$ by applying a sequence of vertex deletions and edge pivots. Pivot-minors play an important role in the study of rank-width. Pivot-minors have mainly been studied from a structural perspective. In this paper we perform the first systematic computational complexity study of pivot-minors. We first prove that the Pivot-Minor problem, which asks if a given graph $G$ contains a pivot-minor isomorphic to a given graph $H$, is NP-complete. If $H$ is not part of the input, we denote the problem by $H$-Pivot-Minor. We give a certifying polynomial-time algorithm for $H$-Pivot-Minor when (1) $H$ is an induced subgraph of $P_3+tP_1$ for some integer $t\geq 0$, (2) $H=K_{1,t}$ for some integer $t\geq 1$, or (3) $|V(H)|\leq 4$ except when $H \in \{K_4,C_3+ P_1\}$. Let ${\cal F}_H$ be the set of induced-subgraph-minimal graphs that contain a pivot-minor isomorphic to $H$. To prove the above statement, we either show that there is an integer $c_H$ such that all graphs in ${\cal F}_H$ have at most $c_H$ vertices, or we determine ${\cal F}_H$ precisely, for each of the above cases.
Submitted 8 November, 2023;
originally announced November 2023.
-
VisAlign: Dataset for Measuring the Degree of Alignment between AI and Humans in Visual Perception
Authors:
Jiyoung Lee,
Seungho Kim,
Seunghyun Won,
Joonseok Lee,
Marzyeh Ghassemi,
James Thorne,
Jaeseok Choi,
O-Kil Kwon,
Edward Choi
Abstract:
AI alignment refers to models acting towards human-intended goals, preferences, or ethical principles. Given that most large-scale deep learning models act as black boxes and cannot be manually controlled, analyzing the similarity between models and humans can be a proxy measure for ensuring AI safety. In this paper, we focus on the models' visual perception alignment with humans, further referred to as AI-human visual alignment. Specifically, we propose a new dataset for measuring AI-human visual alignment in terms of image classification, a fundamental task in machine perception. In order to evaluate AI-human visual alignment, a dataset should encompass samples with various scenarios that may arise in the real world and have gold human perception labels. Our dataset consists of three groups of samples, namely Must-Act (i.e., Must-Classify), Must-Abstain, and Uncertain, based on the quantity and clarity of visual information in an image and further divided into eight categories. All samples have a gold human perception label; even Uncertain (severely blurry) sample labels were obtained via crowd-sourcing. The validity of our dataset is verified by sampling theory, statistical theories related to survey design, and experts in the related fields. Using our dataset, we analyze the visual alignment and reliability of five popular visual perception models and seven abstention methods. Our code and data are available at https://github.com/jiyounglee-0523/VisAlign.
Submitted 20 October, 2023; v1 submitted 3 August, 2023;
originally announced August 2023.
-
Towards Visualization Thumbnail Designs that Entice Reading Data-driven Articles
Authors:
Hwiyeon Kim,
Joohee Kim,
Yunha Han,
Hwajung Hong,
Oh-Sang Kwon,
Young-Woo Park,
Niklas Elmqvist,
Sungahn Ko,
Bum Chul Kwon
Abstract:
As online news increasingly includes data journalism, there is a corresponding increase in the incorporation of visualization in article thumbnail images. However, little research exists on the design rationale for visualization thumbnails, such as resizing, cropping, simplifying, and embellishing charts that appear within the body of the associated article. Therefore, in this paper we aim to understand these design choices and determine what makes a visualization thumbnail inviting and interpretable. To this end, we first survey visualization thumbnails collected online and discuss visualization thumbnail practices with data journalists and news graphics designers. Based on the survey and discussion results, we then define a design space for visualization thumbnails and conduct a user study with four types of visualization thumbnails derived from the design space. The study results indicate that different chart components play different roles in attracting reader attention and enhancing reader understandability of the visualization thumbnails. We also find various thumbnail design strategies for effectively combining the charts' components, such as a data summary with highlights and data labels, and a visual legend with text labels and Human Recognizable Objects (HROs), into thumbnails. Ultimately, we distill our findings into design implications that allow effective visualization thumbnail designs for data-rich news articles. Our work can thus be seen as a first step toward providing structured guidance on how to design compelling thumbnails for data stories.
Submitted 26 May, 2023;
originally announced May 2023.
-
Renderable Neural Radiance Map for Visual Navigation
Authors:
Obin Kwon,
Jeongho Park,
Songhwai Oh
Abstract:
We propose a novel type of map for visual navigation, a renderable neural radiance map (RNR-Map), which is designed to contain the overall visual information of a 3D environment. The RNR-Map has a grid form and consists of latent codes at each pixel. These latent codes are embedded from image observations, and can be converted to the neural radiance field which enables image rendering given a camera pose. The recorded latent codes implicitly contain visual information about the environment, which makes the RNR-Map visually descriptive. This visual information in RNR-Map can be a useful guideline for visual localization and navigation. We develop localization and navigation frameworks that can effectively utilize the RNR-Map. We evaluate the proposed frameworks on camera tracking, visual localization, and image-goal navigation. Experimental results show that the RNR-Map-based localization framework can find the target location based on a single query image with fast speed and competitive accuracy compared to other baselines. Also, this localization framework is robust to environmental changes, and even finds the most visually similar places when a query image from a different environment is given. The proposed navigation framework outperforms the existing image-goal navigation methods in difficult scenarios, under odometry and actuation noises. The navigation framework shows 65.7% success rate in curved scenarios of the NRNS dataset, which is an improvement of 18.6% over the current state-of-the-art. Project page: https://rllab-snu.github.io/projects/RNR-Map/
Submitted 19 April, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
A new width parameter of graphs based on edge cuts: $α$-edge-crossing width
Authors:
Yeonsu Chang,
O-joung Kwon,
Myounghwan Lee
Abstract:
We introduce graph width parameters, called $α$-edge-crossing width and edge-crossing width. These are defined in terms of the number of edges crossing a bag of a tree-cut decomposition. They are motivated by edge-cut width, recently introduced by Brand et al. (WG 2022). We show that edge-crossing width is equivalent to the known parameter tree-partition-width. On the other hand, $α$-edge-crossing width is a new parameter; tree-cut width and $α$-edge-crossing width are incomparable, and they both lie between tree-partition-width and edge-cut width.
We provide an algorithm that, for a given $n$-vertex graph $G$ and integers $k$ and $α$, in time $2^{O((α+k)\log (α+k))}n^2$ either outputs a tree-cut decomposition certifying that the $α$-edge-crossing width of $G$ is at most $2α^2+5k$ or confirms that the $α$-edge-crossing width of $G$ is more than $k$. As applications, for every fixed $α$, we obtain FPT algorithms for the List Coloring and Precoloring Extension problems parameterized by $α$-edge-crossing width. They were known to be W[1]-hard parameterized by tree-partition-width, and FPT parameterized by edge-cut width, and we close the complexity gap between these two parameters.
Submitted 25 July, 2023; v1 submitted 9 February, 2023;
originally announced February 2023.
-
Image-Coupled Volume Propagation for Stereo Matching
Authors:
Oh-Hun Kwon,
Eduard Zell
Abstract:
Several leading methods on public benchmarks for depth-from-stereo rely on memory-demanding 4D cost volumes and computationally intensive 3D convolutions for feature matching. We suggest a new way to process the 4D cost volume where we merge two different concepts in one deeply integrated framework to achieve a symbiotic relationship. A feature matching part is responsible for identifying matching pixel pairs along the baseline while a concurrent image volume part is inspired by depth-from-mono CNNs. However, instead of predicting depth directly from image features, it provides additional context to resolve ambiguities during pixel matching. More technically, the processing of the 4D cost volume is separated into a 2D propagation and a 3D propagation part. Starting from feature maps of the left image, the 2D propagation assists the 3D propagation part of the cost volume at different layers by adding visual features to the geometric context. By combining both parts, we can safely reduce the scale of 3D convolution layers in the matching part without sacrificing accuracy. Experiments demonstrate that our end-to-end trained CNN is ranked 2nd on KITTI2012 and ETH3D benchmarks while being significantly faster than the 1st-ranked method. Furthermore, we notice that the coupling of image and matching-volume improves fine-scale details as demonstrated by our qualitative analysis.
Submitted 30 December, 2022;
originally announced January 2023.
-
A Comprehensive Survey of Transformers for Computer Vision
Authors:
Sonain Jamil,
Md. Jalil Piran,
Oh-Jin Kwon
Abstract:
As a special type of transformer, Vision Transformers (ViTs) are used in various computer vision (CV) applications, such as image recognition. There are several potential problems with convolutional neural networks (CNNs) that can be solved with ViTs. For image coding tasks like compression, super-resolution, segmentation, and denoising, different variants of the ViTs are used. The purpose of this survey is to present the first application of ViTs in CV. The survey is the first of its kind on ViTs for CV to the best of our knowledge. In the first step, we classify different CV applications where ViTs are applicable. CV applications include image classification, object detection, image segmentation, image compression, image super-resolution, image denoising, and anomaly detection. Our next step is to review the state-of-the-art in each category and list the available models. Following that, we present a detailed analysis and comparison of each model and list its pros and cons. After that, we present our insights and lessons learned for each category. Moreover, we discuss several open research challenges and future research directions.
Submitted 11 November, 2022;
originally announced November 2022.
-
Blank Collapse: Compressing CTC emission for the faster decoding
Authors:
Minkyu Jung,
Ohhyeok Kwon,
Seunghyun Seo,
Soonshin Seo
Abstract:
The Connectionist Temporal Classification (CTC) model is a very efficient method for modeling sequences, especially for speech data. In order to use a CTC model for an Automatic Speech Recognition (ASR) task, beam search decoding with an external language model such as an n-gram LM is necessary to obtain reasonable results. In this paper we analyze the blank label in CTC beam search in depth and propose a very simple method to reduce the amount of calculation, resulting in faster beam search decoding. With this method, we can get up to 78% faster decoding speed than ordinary beam search decoding with a very small loss of accuracy on LibriSpeech datasets. We show that this method is effective not only practically by experiments but also theoretically by mathematical reasoning. We also observe that this reduction is more pronounced when the accuracy of the model is higher.
Submitted 26 June, 2023; v1 submitted 30 October, 2022;
originally announced October 2022.
-
Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion Image Manipulation
Authors:
Chaerin Kong,
DongHyeon Jeon,
Ohjoon Kwon,
Nojun Kwak
Abstract:
Fashion attribute editing is a task that aims to convert the semantic attributes of a given fashion image while preserving the irrelevant regions. Previous works typically employ conditional GANs where the generator explicitly learns the target attributes and directly executes the conversion. These approaches, however, are neither scalable nor generic, as they operate with only a few attributes and require a separate generator for each dataset or attribute set. Inspired by the recent advancement of diffusion models, we explore classifier-guided diffusion that leverages an off-the-shelf diffusion model pretrained on general visual semantics such as ImageNet. To achieve a generic editing pipeline, we pose this as a multi-attribute image manipulation task, where the attributes range from item category, fabric, and pattern to collar and neckline. We empirically show that conventional methods fail in our challenging setting, and study an efficient adaptation scheme that involves a recently introduced attention-pooling technique to obtain multi-attribute classifier guidance. Based on this, we present a mask-free fashion attribute editing framework that leverages the classifier logits and the cross-attention map for manipulation. We empirically demonstrate that our framework achieves convincing sample quality and attribute alignment.
Submitted 11 October, 2022;
originally announced October 2022.
-
Topological Semantic Graph Memory for Image-Goal Navigation
Authors:
Nuri Kim,
Obin Kwon,
Hwiyeon Yoo,
Yunho Choi,
Jeongho Park,
Songhwai Oh
Abstract:
A novel framework is proposed to incrementally collect landmark-based graph memory and use the collected memory for image goal navigation. Given a target image to search, an embodied robot utilizes semantic memory to find the target in an unknown environment. The semantic graph memory is collected from a panoramic observation of an RGB-D camera without knowing the robot's pose. In this paper, we present a topological semantic graph memory (TSGM), which consists of (1) a graph builder that takes the observed RGB-D image to construct a topological semantic graph, (2) a cross graph mixer module that takes the collected nodes to get contextual information, and (3) a memory decoder that takes the contextual memory as an input to find an action to the target. On the task of image goal navigation, TSGM significantly outperforms competitive baselines by +5.0-9.0% on the success rate and +7.0-23.5% on SPL, which means that the TSGM finds efficient paths. Additionally, we demonstrate our method on a mobile robot in real-world image goal scenarios.
Submitted 17 September, 2022;
originally announced September 2022.
-
Unified almost linear kernels for generalized covering and packing problems on nowhere dense classes
Authors:
Jungho Ahn,
Jinha Kim,
O-joung Kwon
Abstract:
Let $\mathcal{F}$ be a family of graphs, and let $p,r$ be nonnegative integers. The \textsc{$(p,r,\mathcal{F})$-Covering} problem asks whether for a graph $G$ and an integer $k$, there exists a set $D$ of at most $k$ vertices in $G$ such that $G^p\setminus N_G^r[D]$ has no induced subgraph isomorphic to a graph in $\mathcal{F}$, where $G^p$ is the $p$-th power of $G$. The \textsc{$(p,r,\mathcal{F})$-Packing} problem asks whether for a graph $G$ and an integer $k$, $G^p$ has $k$ induced subgraphs $H_1,\ldots,H_k$ such that each $H_i$ is isomorphic to a graph in $\mathcal{F}$, and for distinct $i,j\in \{1, \ldots, k\}$, the distance between $V(H_i)$ and $V(H_j)$ in $G$ is larger than $r$.
We show that for every fixed nonnegative integers $p,r$ and every fixed nonempty finite family $\mathcal{F}$ of connected graphs, the \textsc{$(p,r,\mathcal{F})$-Covering} problem with $p\leq2r+1$ and the \textsc{$(p,r,\mathcal{F})$-Packing} problem with $p\leq2\lfloor r/2\rfloor+1$ admit almost linear kernels on every nowhere dense class of graphs, and admit linear kernels on every class of graphs with bounded expansion, parameterized by the solution size $k$. We obtain the same kernels for their annotated variants. As corollaries, we prove that \textsc{Distance-$r$ Vertex Cover}, \textsc{Distance-$r$ Matching}, \textsc{$\mathcal{F}$-Free Vertex Deletion}, and \textsc{Induced-$\mathcal{F}$-Packing} for any fixed finite family $\mathcal{F}$ of connected graphs admit almost linear kernels on every nowhere dense class of graphs and linear kernels on every class of graphs with bounded expansion. Our results extend the results for \textsc{Distance-$r$ Dominating Set} by Drange et al. (STACS 2016) and Eickmeyer et al. (ICALP 2017), and the result for \textsc{Distance-$r$ Independent Set} by Pilipczuk and Siebertz (EJC 2021).
Submitted 14 July, 2022;
originally announced July 2022.
-
Building Korean Sign Language Augmentation (KoSLA) Corpus with Data Augmentation Technique
Authors:
Changnam An,
Eunkyung Han,
Dongmyeong Noh,
Ohkyoon Kwon,
Sumi Lee,
Hyunshim Han
Abstract:
We present an efficient framework of corpus for sign language translation. Aided with a simple but dramatic data augmentation technique, our method converts text into annotated forms with minimum information loss. Sign languages are composed of manual signals, non-manual signals, and iconic features. According to professional sign language interpreters, non-manual signals such as facial expressions and gestures play an important role in conveying exact meaning. By considering the linguistic features of sign language, our proposed framework is a first and unique attempt to build a multimodal sign language augmentation corpus (hereinafter referred to as the KoSLA corpus) containing both manual and non-manual modalities. The corpus we built demonstrates confident results in the hospital context, showing improved performance with augmented datasets. To overcome data scarcity, we resorted to data augmentation techniques such as synonym replacement to boost the efficiency of our translation model and available data, while maintaining grammatical and semantic structures of sign language. For the experimental support, we verify the effectiveness of data augmentation technique and usefulness of our corpus by performing a translation task between normal sentences and sign language annotations on two tokenizers. The result was convincing, proving that the BLEU scores with the KoSLA corpus were significant.
△ Less
Submitted 11 July, 2022;
originally announced July 2022.
-
Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems
Authors:
Hyun-Wook Yoon,
Ohsung Kwon,
Hoyeon Lee,
Ryuichi Yamamoto,
Eunwoo Song,
Jae-Min Kim,
Min-Jae Hwang
Abstract:
This paper proposes an effective emotional text-to-speech (TTS) system with a pre-trained language model (LM)-based emotion prediction method. Unlike conventional systems that require auxiliary inputs such as manually defined emotion classes, our system directly estimates emotion-related attributes from the input text. Specifically, we utilize generative pre-trained transformer (GPT)-3 to jointly predict both an emotion class and its strength, which represent the emotion's coarse and fine properties, respectively. Then, these attributes are combined in the emotional embedding space and used as conditional features of the TTS model for generating output speech signals. Consequently, the proposed system can produce emotional speech from text alone, without any auxiliary inputs. Furthermore, because GPT-3 can capture the emotional context across consecutive sentences, the proposed method can effectively handle paragraph-level generation of emotional speech.
Submitted 30 June, 2022; v1 submitted 30 June, 2022;
originally announced June 2022.
-
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder
Authors:
Eunwoo Song,
Ryuichi Yamamoto,
Ohsung Kwon,
Chan-Ho Song,
Min-Jae Hwang,
Suhyeon Oh,
Hyun-Wook Yoon,
Jin-Seob Kim,
Jae-Min Kim
Abstract:
Recent advances in synthetic speech quality have enabled us to train text-to-speech (TTS) systems by using synthetic corpora. However, merely increasing the amount of synthetic data is not always advantageous for improving training efficiency. Our aim in this study is to selectively choose synthetic data that are beneficial to the training process. In the proposed method, we first adopt a variational autoencoder whose posterior distribution is utilized to extract latent features representing acoustic similarity between the recorded and synthetic corpora. By using those learned features, we then train a ranking support vector machine (RankSVM) that is well known for effectively ranking relative attributes among binary classes. By setting the recorded and synthetic ones as two opposite classes, RankSVM is used to determine how the synthesized speech is acoustically similar to the recorded data. Then, synthetic TTS data, whose distribution is close to the recorded data, are selected from large-scale synthetic corpora. By using these data for retraining the TTS model, the synthetic quality can be significantly improved. Objective and subjective evaluation results show the superiority of the proposed method over the conventional methods.
Submitted 29 June, 2022;
originally announced June 2022.
-
An Empirical Study on the Relationship Between the Number of Coordinated Views and Visual Analysis
Authors:
Juyoung Oh,
Chunggi Lee,
Hwiyeon Kim,
Kihwan Kim,
Osang Kwon,
Eric D. Ragan,
Bum Chul Kwon,
Sungahn Ko
Abstract:
Coordinated multiple views (CMVs) are a visualization technique that simultaneously presents multiple visualizations in separate but linked views. Many studies report the advantages (e.g., usefulness for finding hidden relationships) and disadvantages (e.g., cognitive load) of CMVs. However, little empirical work exists on the impact of the number of views on visual analysis results and processes, which leaves the relationship between the number of views and visual analysis uncertain. In this work, we investigate the relationship between the number of coordinated views and users' analytic processes and results. To this end, we implemented a CMV tool for visual analysis. We also provided visualization duplication in the tool to help users easily create a desired number of visualization views on the fly. We conducted a between-subjects study with 44 participants, in which we asked participants to solve five analytic problems using the visual tool. Through quantitative and qualitative analysis, we discovered a positive correlation between the number of views and analytic results. We also found that visualization duplication encourages users to create more views and to adopt various analysis strategies. Based on the results, we discuss the implications and limitations of our study.
Submitted 20 April, 2022;
originally announced April 2022.
-
Reduced bandwidth: a qualitative strengthening of twin-width in minor-closed classes (and beyond)
Authors:
Édouard Bonnet,
O-joung Kwon,
David R. Wood
Abstract:
In a reduction sequence of a graph, vertices are successively identified until the graph has one vertex. At each step, when identifying $u$ and $v$, each edge incident to exactly one of $u$ and $v$ is coloured red. Bonnet, Kim, Thomassé and Watrigant [J. ACM 2022] defined the twin-width of a graph $G$ to be the minimum integer $k$ such that there is a reduction sequence of $G$ in which every red graph has maximum degree at most $k$. For any graph parameter $f$, we define the reduced $f$ of a graph $G$ to be the minimum integer $k$ such that there is a reduction sequence of $G$ in which every red graph has $f$ at most $k$. Our focus is on graph classes with bounded reduced bandwidth, which implies and is stronger than bounded twin-width (reduced maximum degree). We show that every proper minor-closed class has bounded reduced bandwidth, which is qualitatively stronger than an analogous result of Bonnet et al.\ for bounded twin-width. In many instances, we also make quantitative improvements. For example, all previous upper bounds on the twin-width of planar graphs were at least $2^{1000}$. We show that planar graphs have reduced bandwidth at most $466$ and twin-width at most $583$. Our bounds for graphs of Euler genus $γ$ are $O(γ)$. Lastly, we show that fixed powers of graphs in a proper minor-closed class have bounded reduced bandwidth (irrespective of the degree of the vertices). In particular, we show that map graphs of Euler genus $γ$ have reduced bandwidth $O(γ^4)$. Lastly, we separate twin-width and reduced bandwidth by showing that any infinite class of expanders excluding a fixed complete bipartite subgraph has unbounded reduced bandwidth, while there are bounded-degree expanders with twin-width at most 6.
Submitted 23 February, 2022;
originally announced February 2022.
-
Image-to-Graph Transformers for Chemical Structure Recognition
Authors:
Sanghyun Yoo,
Ohyun Kwon,
Hoshik Lee
Abstract:
For several decades, chemical knowledge has been published in written text, and there have been many attempts to make it accessible, for example, by transforming such natural language text to a structured format. Although the discovered chemical itself commonly represented in an image is the most important part, the correct recognition of the molecular structure from the image in literature still remains a hard problem since they are often abbreviated to reduce the complexity and drawn in many different styles. In this paper, we present a deep learning model to extract molecular structures from images. The proposed model is designed to transform the molecular image directly into the corresponding graph, which makes it capable of handling non-atomic symbols for abbreviations. Also, by end-to-end learning approach it can fully utilize many open image-molecule pair data from various sources, and hence it is more robust to image style variation than other tools. The experimental results show that the proposed model outperforms the existing models with 17.1 % and 12.8 % relative improvement for well-known benchmark datasets and large molecular images that we collected from literature, respectively.
Submitted 19 February, 2022;
originally announced February 2022.
-
Raw Produce Quality Detection with Shifted Window Self-Attention
Authors:
Oh Joon Kwon,
Byungsoo Kim,
Youngduck Choi
Abstract:
Global food insecurity is expected to worsen in the coming decades with the accelerated rate of climate change and the rapidly increasing population. In this vein, it is important to remove inefficiencies at every level of food production. The recent advances in deep learning can help reduce such inefficiencies, yet their application has not yet become mainstream throughout the industry, inducing economic costs at a massive scale. To this point, modern techniques such as CNNs (Convolutional Neural Networks) have been applied to RPQD (Raw Produce Quality Detection) tasks. On the other hand, the Transformer's successful debut in vision, among other modalities, led us to expect better performance from Transformer-based models in RPQD. In this work, we exclusively investigate the recent state-of-the-art Swin (Shifted Windows) Transformer, which computes self-attention in both intra- and inter-window fashion. We compare the Swin Transformer against CNN models on four RPQD image datasets, each containing different kinds of raw produce: fruits and vegetables, fish, pork, and beef. We observe that the Swin Transformer not only achieves better or competitive performance but also is data- and compute-efficient, making it ideal for actual deployment in real-world settings. To the best of our knowledge, this is the first large-scale empirical study on the RPQD task, which we hope will gain more attention in future works.
Submitted 24 December, 2021;
originally announced December 2021.
-
A Multi-Layout Design for Immersive Visualization of Network Data
Authors:
David Bauer,
Chengbo Zheng,
Oh-Hyun Kwon,
Kwan-Liu Ma
Abstract:
Visualization plays a vital role in making sense of complex network data. Recent studies have shown the potential of using extended reality (XR) for the immersive exploration of networks. The additional depth cues offered by XR help users perform better in certain tasks when compared to using traditional desktop setups. However, prior works on immersive network visualization rely on mostly static graph layouts to present the data to the user. This poses a problem since there is no optimal layout for all possible tasks. The choice of layout heavily depends on the type of network and the task at hand. We introduce a multi-layout approach that allows users to effectively explore hierarchical network data in immersive space. The resulting system leverages different layout techniques and interactions to efficiently use the available space in VR and provide an optimal view of the data depending on the task and the level of detail required to solve it. To evaluate our approach, we have conducted a user study comparing it against the state of the art for immersive network visualization. Participants performed tasks at varying spatial scopes. The results show that our approach outperforms the baseline in spatially focused scenarios as well as when the whole network needs to be considered.
Submitted 26 January, 2023; v1 submitted 19 December, 2021;
originally announced December 2021.
-
Augment & Valuate : A Data Enhancement Pipeline for Data-Centric AI
Authors:
Youngjune Lee,
Oh Joon Kwon,
Haeju Lee,
Joonyoung Kim,
Kangwook Lee,
Kee-Eung Kim
Abstract:
Data scarcity and noise are important issues in industrial applications of machine learning. However, it is often challenging to devise a scalable and generalized approach to address the fundamental distributional and semantic properties of a dataset with black box models. For this reason, data-centric approaches are crucial for the automation of the machine learning operation pipeline. To serve as the basis for this automation, we suggest a domain-agnostic pipeline for refining the quality of data in image classification problems. This pipeline contains data valuation, cleansing, and augmentation. With an appropriate combination of these methods, we achieved 84.711% test accuracy (ranked #6, Honorable Mention in the Most Innovative) in the Data-Centric AI competition using only the provided dataset.
Submitted 7 December, 2021;
originally announced December 2021.
-
VAC-CNN: A Visual Analytics System for Comparative Studies of Deep Convolutional Neural Networks
Authors:
Xiwei Xuan,
Xiaoyu Zhang,
Oh-Hyun Kwon,
Kwan-Liu Ma
Abstract:
The rapid development of Convolutional Neural Networks (CNNs) in recent years has triggered significant breakthroughs in many machine learning (ML) applications. The ability to understand and compare various CNN models available is thus essential. The conventional approach with visualizing each model's quantitative features, such as classification accuracy and computational complexity, is not sufficient for a deeper understanding and comparison of the behaviors of different models. Moreover, most of the existing tools for assessing CNN behaviors only support comparison between two models and lack the flexibility of customizing the analysis tasks according to user needs. This paper presents a visual analytics system, VAC-CNN (Visual Analytics for Comparing CNNs), that supports the in-depth inspection of a single CNN model as well as comparative studies of two or more models. The ability to compare a larger number of (e.g., tens of) models especially distinguishes our system from previous ones. With a carefully designed model visualization and explaining support, VAC-CNN facilitates a highly interactive workflow that promptly presents both quantitative and qualitative information at each analysis stage. We demonstrate VAC-CNN's effectiveness for assisting novice ML practitioners in evaluating and comparing multiple CNN models through two use cases and one preliminary evaluation study using the image classification tasks on the ImageNet dataset.
Submitted 14 January, 2022; v1 submitted 25 October, 2021;
originally announced October 2021.
-
A Deep Generative Model for Reordering Adjacency Matrices
Authors:
Oh-Hyun Kwon,
Chiun-How Kao,
Chun-houh Chen,
Kwan-Liu Ma
Abstract:
Depending on the node ordering, an adjacency matrix can highlight distinct characteristics of a graph. Deriving a "proper" node ordering is thus a critical step in visualizing a graph as an adjacency matrix. Users often try multiple matrix reorderings using different methods until they find one that meets the analysis goal. However, this trial-and-error approach is laborious and disorganized, which is especially challenging for novices. This paper presents a technique that enables users to effortlessly find a matrix reordering they want. Specifically, we design a generative model that learns a latent space of diverse matrix reorderings of the given graph. We also construct an intuitive user interface from the learned latent space by creating a map of various matrix reorderings. We demonstrate our approach through quantitative and qualitative evaluations of the generated reorderings and learned latent spaces. The results show that our model is capable of learning a latent space of diverse matrix reorderings. Most existing research in this area has focused on developing algorithms that can compute "better" matrix reorderings for particular circumstances. This paper introduces a fundamentally new approach to matrix visualization of a graph, where a machine learning model learns to generate diverse matrix reorderings of a graph.
Submitted 7 March, 2022; v1 submitted 10 October, 2021;
originally announced October 2021.
-
A Unifying Framework for Characterizing and Computing Width Measures
Authors:
Eduard Eiben,
Robert Ganian,
Thekla Hamm,
Lars Jaffke,
O-Joung Kwon
Abstract:
Algorithms for computing or approximating optimal decompositions for decompositional parameters such as treewidth or clique-width have so far traditionally been tailored to specific width parameters. Moreover, for mim-width, no efficient algorithms for computing good decompositions were known, even under highly restrictive parameterizations. In this work we identify F-branchwidth as a class of generic decompositional parameters that can capture mim-width, treewidth, clique-width as well as other measures. We show that while there is an infinite number of F-branchwidth parameters, only a handful of these are asymptotically distinct. We then develop fixed-parameter and kernelization algorithms (under several structural parameterizations) that can compute every possible F-branchwidth, providing a unifying framework that can efficiently obtain near-optimal tree-decompositions, k-expressions, as well as optimal mim-width decompositions.
Submitted 28 September, 2021;
originally announced September 2021.
-
HisVA: A Visual Analytics System for Studying History
Authors:
Dongyun Han,
Gorakh Parsad,
Hwiyeon Kim,
Jaekyom Shim,
Oh-Sang Kwon,
Kyung A Son,
Jooyoung Lee,
Isaac Cho,
Sungahn Ko
Abstract:
Studying history involves many difficult tasks. Examples include searching for proper data in a large event space, understanding stories of historical events by time and space, and finding relationships among events that may not be apparent. Instructors who extensively use well-organized and well-argued materials (e.g., textbooks and online resources) can lead students to a narrow perspective in understanding history and prevent spontaneous investigation of historical events, with the students asking their own questions. In this work, we propose HisVA, a visual analytics system that allows the efficient exploration of historical events from Wikipedia using three views: event, map, and resource. HisVA provides an effective event exploration space, where users can investigate relationships among historical events by reviewing and linking them in terms of space and time. To evaluate our system, we present two usage scenarios, a user study with a qualitative analysis of user exploration strategies, and expert feedback with in-class deployment results.
Submitted 2 June, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
On the Erdős-Pósa property for long holes in $C_4$-free graphs
Authors:
Tony Huynh,
O-joung Kwon
Abstract:
We prove that there exists a function $f(k)=\mathcal{O}(k^2 \log k)$ such that for every $C_4$-free graph $G$ and every $k \in \mathbb{N}$, $G$ either contains $k$ vertex-disjoint holes of length at least $6$, or a set $X$ of at most $f(k)$ vertices such that $G-X$ has no hole of length at least $6$. This answers a question of Kim and Kwon [Erdős-Pósa property of chordless cycles and its applications. JCTB 2020].
Submitted 25 May, 2021;
originally announced May 2021.
-
Classes of intersection digraphs with good algorithmic properties
Authors:
Lars Jaffke,
O-joung Kwon,
Jan Arne Telle
Abstract:
An intersection digraph is a digraph where every vertex $v$ is represented by an ordered pair $(S_v, T_v)$ of sets such that there is an edge from $v$ to $w$ if and only if $S_v$ and $T_w$ intersect. An intersection digraph is reflexive if $S_v\cap T_v\neq \emptyset$ for every vertex $v$. Compared to well-known undirected intersection graphs like interval graphs and permutation graphs, not many algorithmic applications on intersection digraphs have been developed. Motivated by the successful story on algorithmic applications of intersection graphs using a graph width parameter called mim-width, we introduce its directed analogue called `bi-mim-width' and prove that various classes of reflexive intersection digraphs have bounded bi-mim-width. In particular, we show that as a natural extension of $H$-graphs, reflexive $H$-digraphs have linear bi-mim-width at most $12|E(H)|$, which extends a bound on the linear mim-width of $H$-graphs [On the Tractability of Optimization Problems on $H$-Graphs. Algorithmica 2020]. For applications, we introduce a novel framework of directed versions of locally checkable problems, that streamlines the definitions and the study of many problems in the literature and facilitates their common algorithmic treatment. We obtain unified polynomial-time algorithms for these problems on digraphs of bounded bi-mim-width, when a branch decomposition is given. Locally checkable problems include Kernel, Dominating Set, and Directed $H$-Homomorphism.
Submitted 4 May, 2021;
originally announced May 2021.
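As an illustration of the definition in the abstract above, the following minimal sketch builds the edge set of an intersection digraph from a family of ordered pairs $(S_v, T_v)$ and checks reflexivity. The function names and the example family are hypothetical and are not taken from the paper; loops are kept, since the definition as stated does not exclude $v = w$.

```python
from itertools import product


def intersection_digraph(pairs):
    """Return the directed edges (v, w) such that S_v and T_w intersect.

    `pairs` maps each vertex v to an ordered pair (S_v, T_v) of sets.
    Loops (v, v) are included whenever S_v and T_v intersect.
    """
    edges = set()
    for v, w in product(pairs, repeat=2):
        s_v = pairs[v][0]
        t_w = pairs[w][1]
        if s_v & t_w:
            edges.add((v, w))
    return edges


def is_reflexive(pairs):
    """True if S_v and T_v intersect for every vertex v."""
    return all(bool(s & t) for s, t in pairs.values())


# Hypothetical example family, chosen only for illustration.
example = {
    "a": ({1, 2}, {2}),
    "b": ({3}, {1, 3}),
    "c": ({4}, {5}),
}
edges = intersection_digraph(example)
print(sorted(edges))
print(is_reflexive(example))
```

In this example the vertex `c` has disjoint sets, so the digraph is not reflexive.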
-
Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss
Authors:
Eunwoo Song,
Ryuichi Yamamoto,
Min-Jae Hwang,
Jin-Seob Kim,
Ohsung Kwon,
Jae-Min Kim
Abstract:
This paper proposes a spectral-domain perceptual weighting technique for Parallel WaveGAN-based text-to-speech (TTS) systems. The recently proposed Parallel WaveGAN vocoder successfully generates waveform sequences using a fast non-autoregressive WaveNet model. By employing multi-resolution short-time Fourier transform (MR-STFT) criteria with a generative adversarial network, the light-weight convolutional networks can be effectively trained without any distillation process. To further improve the vocoding performance, we propose the application of frequency-dependent weighting to the MR-STFT loss function. The proposed method penalizes perceptually-sensitive errors in the frequency domain; thus, the model is optimized toward reducing auditory noise in the synthesized speech. Subjective listening test results demonstrate that our proposed method achieves 4.21 and 4.26 TTS mean opinion scores for female and male Korean speakers, respectively.
Submitted 18 January, 2021;
originally announced January 2021.
-
Crossover-SGD: A gossip-based communication in distributed deep learning for alleviating large mini-batch problem and enhancing scalability
Authors:
Sangho Yeo,
Minho Bae,
Minjoong Jeong,
Oh-kyoung Kwon,
Sangyoon Oh
Abstract:
Distributed deep learning is an effective way to reduce the training time of deep learning for large datasets as well as complex models. However, the limited scalability caused by network overheads makes it difficult to synchronize the parameters of all workers. To resolve this problem, gossip-based methods that demonstrate stable scalability regardless of the number of workers have been proposed. However, to use gossip-based methods in general cases, the validation accuracy for a large mini-batch needs to be verified. To verify this, we first empirically study the characteristics of gossip methods in a large mini-batch problem and observe that the gossip methods preserve higher validation accuracy than AllReduce-SGD (Stochastic Gradient Descent) when the batch size is increased and the number of workers is fixed. However, the delayed parameter propagation of the gossip-based models decreases validation accuracy in large node scales. To cope with this problem, we propose Crossover-SGD, which alleviates the delayed propagation of weight parameters via segment-wise communication and a load-balancing random network topology. We also adapt hierarchical communication to limit the number of workers in gossip-based communication methods. To validate the effectiveness of our proposed method, we conduct empirical experiments and observe that our Crossover-SGD shows higher node scalability than SGP (Stochastic Gradient Push).
Submitted 17 October, 2022; v1 submitted 30 December, 2020;
originally announced December 2020.
-
Visually Grounding Language Instruction for History-Dependent Manipulation
Authors:
Hyemin Ahn,
Obin Kwon,
Kyoungdo Kim,
Jaeyeon Jeong,
Howoong Jun,
Hongjung Lee,
Dongheui Lee,
Songhwai Oh
Abstract:
This paper emphasizes the importance of a robot's ability to refer to its task history, especially when it executes a series of pick-and-place manipulations by following language instructions given one by one. The advantages of referring to the manipulation history are twofold: (1) language instructions that omit details but use expressions referring to the past can be interpreted, and (2) the visual information of objects occluded by previous manipulations can be inferred. For this, we introduce a history-dependent manipulation task whose objective is to visually ground a series of language instructions for proper pick-and-place manipulations by referring to the past. We also suggest a relevant dataset and a model which can serve as a baseline, and show that our model trained with the proposed dataset can also be applied to the real world based on CycleGAN. Our dataset and code are publicly available on the project website: https://sites.google.com/view/history-dependent-manipulation.
Submitted 14 March, 2022; v1 submitted 16 December, 2020;
originally announced December 2020.
-
The canonical directed tree decomposition and its applications to the directed disjoint paths problem
Authors:
Archontia C. Giannopoulou,
Ken-ichi Kawarabayashi,
Stephan Kreutzer,
O-joung Kwon
Abstract:
The canonical tree-decomposition theorem, given by Robertson and Seymour in their seminal graph minors series, has turned out to be one of the most important tools in structural and algorithmic graph theory. In this paper, we provide the canonical tree-decomposition theorem for digraphs. More precisely, we construct directed tree-decompositions of digraphs that distinguish all their tangles of order $k$, for any fixed integer $k$, in polynomial time. As an application of this canonical tree-decomposition theorem, we provide the following result for the directed disjoint paths problem:
For every fixed $k$ there is a polynomial-time algorithm which, on input $G$, and source and terminal vertices $(s_1, t_1), \dots, (s_k, t_k)$, either
1. determines that there is no set of pairwise vertex-disjoint paths connecting each source $s_i$ to its terminal $t_i$, or
2. finds a half-integral solution, i.e., outputs paths $P_1, \dots, P_k$ such that $P_i$ links $s_i$ to $t_i$, so that every vertex of the graph is contained in at most two paths.
Given known hardness results for the directed disjoint paths problem, our result cannot be improved for general digraphs, neither to fixed-parameter tractability nor to fully vertex-disjoint directed paths. As far as we are aware, this is the first tractable result for the $k$-disjoint paths problem on general digraphs. We expect more applications of our canonical tree-decomposition for directed results.
Submitted 28 September, 2020;
originally announced September 2020.
-
Tree pivot-minors and linear rank-width
Authors:
Konrad K. Dabrowski,
François Dross,
Jisu Jeong,
Mamadou Moustapha Kanté,
O-joung Kwon,
Sang-il Oum,
Daniël Paulusma
Abstract:
Tree-width and its linear variant path-width play a central role for the graph minor relation. In particular, Robertson and Seymour (1983) proved that for every tree $T$, the class of graphs that do not contain $T$ as a minor has bounded path-width. For the pivot-minor relation, rank-width and linear rank-width take over the role from tree-width and path-width. As such, it is natural to examine if for every tree $T$, the class of graphs that do not contain $T$ as a pivot-minor has bounded linear rank-width. We first prove that this statement is false whenever $T$ is a tree that is not a caterpillar. We conjecture that the statement is true if $T$ is a caterpillar. We are also able to give partial confirmation of this conjecture by proving: (1) for every tree $T$, the class of $T$-pivot-minor-free distance-hereditary graphs has bounded linear rank-width if and only if $T$ is a caterpillar; (2) for every caterpillar $T$ on at most four vertices, the class of $T$-pivot-minor-free graphs has bounded linear rank-width. To prove our second result, we only need to consider $T=P_4$ and $T=K_{1,3}$, but we follow a general strategy: first we show that the class of $T$-pivot-minor-free graphs is contained in some class of $(H_1,H_2)$-free graphs, which we then show to have bounded linear rank-width. In particular, we prove that the class of $(K_3,S_{1,2,2})$-free graphs has bounded linear rank-width, which strengthens a known result that this graph class has bounded rank-width.
Submitted 11 August, 2021; v1 submitted 2 August, 2020;
originally announced August 2020.
-
Close relatives of Feedback Vertex Set without single-exponential algorithms parameterized by treewidth
Authors:
Benjamin Bergougnoux,
Édouard Bonnet,
Nick Brettell,
O-joung Kwon
Abstract:
The Cut & Count technique and the rank-based approach have led to single-exponential FPT algorithms parameterized by treewidth, that is, running in time $2^{O(tw)}n^{O(1)}$, for Feedback Vertex Set and connected versions of the classical graph problems (such as Vertex Cover and Dominating Set). We show that Subset Feedback Vertex Set, Subset Odd Cycle Transversal, Restricted Edge-Subset Feedback Edge Set, Node Multiway Cut, and Multiway Cut are unlikely to have such running times. More precisely, we match algorithms running in time $2^{O(tw \log tw)}n^{O(1)}$ with tight lower bounds under the Exponential-Time Hypothesis (ETH), ruling out $2^{o(tw \log tw)}n^{O(1)}$, where $n$ is the number of vertices and $tw$ is the treewidth of the input graph. Our algorithms extend to the weighted case, while our lower bounds also hold for the larger parameter pathwidth and do not require weights. We also show that, in contrast to Odd Cycle Transversal, there is no $2^{o(tw \log tw)}n^{O(1)}$-time algorithm for Even Cycle Transversal under the ETH.
Submitted 28 July, 2020;
originally announced July 2020.
-
Very simple statistical evidence that AlphaGo has exceeded human limits in playing GO game
Authors:
Okyu Kwon
Abstract:
Deep learning technology is making great progress in solving the challenging problems of artificial intelligence; hence, machine learning based on artificial neural networks is in the spotlight again. In some areas, artificial intelligence based on deep learning is beyond human capabilities. It seemed extremely difficult for a machine to beat a human at Go, but AlphaGo has been shown to beat a professional player in the game. By looking at the statistical distribution of the distance between successively laid Go stones, we find a clear trace that AlphaGo has surpassed human abilities. Stones are laid at a distance more frequently by AlphaGo than by professional players, and more frequently by professional players than by ordinary players. In addition, the difference between AlphaGo and professional players is much more pronounced than that between professional players and ordinary players.
Submitted 24 February, 2020;
originally announced February 2020.
-
Well-partitioned chordal graphs: obstruction set and disjoint paths
Authors:
Jungho Ahn,
Lars Jaffke,
O-joung Kwon,
Paloma T. Lima
Abstract:
We introduce a new subclass of chordal graphs that generalizes split graphs, which we call well-partitioned chordal graphs. Split graphs are graphs that admit a partition of the vertex set into cliques that can be arranged in a star structure, the leaves of which are of size one. Well-partitioned chordal graphs are a generalization of this concept in the following two ways. First, the cliques in the partition can be arranged in a tree structure, and second, each clique is of arbitrary size. We provide a characterization of well-partitioned chordal graphs by forbidden induced subgraphs, and give a polynomial-time algorithm that, given any graph, either finds an obstruction or outputs a partition of its vertex set that asserts that the graph is well-partitioned chordal. We demonstrate the algorithmic use of this graph class by showing that two variants of the problem of finding pairwise disjoint paths between $k$ given pairs of vertices are in FPT parameterized by $k$ on well-partitioned chordal graphs, while on chordal graphs, these problems are only known to be in XP. From the other end, we observe that there are problems that are polynomial-time solvable on split graphs, but become NP-complete on well-partitioned chordal graphs.
Submitted 25 February, 2020;
originally announced February 2020.
-
A Study of Mental Maps in Immersive Network Visualization
Authors:
Joseph Kotlarek,
Oh-Hyun Kwon,
Kwan-Liu Ma,
Peter Eades,
Andreas Kerren,
Karsten Klein,
Falk Schreiber
Abstract:
The visualization of a network influences the quality of the mental map that the viewer develops to understand the network. In this study, we investigate the effects of a 3D immersive visualization environment compared to a traditional 2D desktop environment on the comprehension of a network's structure. We compare the two visualization environments using three tasks--interpreting network structure, memorizing a set of nodes, and identifying the structural changes--commonly used for evaluating the quality of a mental map in network visualization. The results show that participants were able to interpret network structure more accurately when viewing the network in an immersive environment, particularly for larger networks. However, we found that 2D visualizations performed better than immersive visualization for tasks that required spatial memory.
Submitted 17 January, 2020;
originally announced January 2020.
-
Spinneret: Aiding Creative Ideation through Non-Obvious Concept Associations
Authors:
Suyun "Sandra" Bae,
Oh-Hyun Kwon,
Senthil Chandrasegaran,
Kwan-Liu Ma
Abstract:
Mind mapping is a popular way to explore a design space in creative thinking exercises, allowing users to form associations between concepts. Yet, most existing digital tools for mind mapping focus on authoring and organization, with little support for addressing the challenges of mind mapping such as stagnation and design fixation. We present Spinneret, a functional approach to aid mind mapping by providing suggestions based on a knowledge graph. Spinneret uses biased random walks to explore the knowledge graph in the neighborhood of an existing concept node in the mind map, and provides "suggestions" for the user to add to the mind map. A comparative study with a baseline mind-mapping tool reveals that participants created more diverse and distinct concepts with Spinneret, and reported that the suggestions inspired them to think of ideas they would otherwise not have explored.
Submitted 8 January, 2020;
originally announced January 2020.
-
Bonn Activity Maps: Dataset Description
Authors:
Julian Tanke,
Oh-Hun Kwon,
Patrick Stotko,
Radu Alexandru Rosu,
Michael Weinmann,
Hassan Errami,
Sven Behnke,
Maren Bennewitz,
Reinhard Klein,
Andreas Weber,
Angela Yao,
Juergen Gall
Abstract:
The key prerequisite for accessing the huge potential of current machine learning techniques is the availability of large databases that capture the complex relations of interest. Previous datasets are focused on either 3D scene representations with semantic information, tracking of multiple persons and recognition of their actions, or activity recognition of a single person in captured 3D environments. We present Bonn Activity Maps, a large-scale dataset for human tracking, activity recognition and anticipation of multiple persons. Our dataset comprises four different scenes that have been recorded by time-synchronized cameras each only capturing the scene partially, the reconstructed 3D models with semantic annotations, motion trajectories for individual people including 3D human poses as well as human activity annotations. We utilize the annotations to generate activity likelihoods on the 3D models called activity maps.
Submitted 13 December, 2019;
originally announced December 2019.
-
A polynomial kernel for $3$-leaf power deletion
Authors:
Jungho Ahn,
Eduard Eiben,
O-joung Kwon,
Sang-il Oum
Abstract:
For a non-negative integer $\ell$, the $\ell$-leaf power of a tree $T$ is a simple graph $G$ on the leaves of $T$ such that two vertices are adjacent in $G$ if and only if their distance in $T$ is at most $\ell$. We provide a polynomial kernel for the problem of deciding whether we can delete at most $k$ vertices to make an input graph a $3$-leaf power of some tree. More specifically, we present a polynomial-time algorithm for an input instance $(G,k)$ for the problem to output an equivalent instance $(G',k')$ such that $k'\leq k$ and $G'$ has at most $O(k^{14})$ vertices.
Submitted 23 October, 2023; v1 submitted 11 November, 2019;
originally announced November 2019.
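The definition of the $\ell$-leaf power in the abstract above can be sketched in a few lines. This only illustrates the definition and is not the kernelization algorithm of the paper; the function name, the adjacency-dictionary representation, and the example tree are assumptions made for the sketch.

```python
from collections import deque


def leaf_power(tree, ell):
    """Return (leaves, edges) of the ell-leaf power of `tree`.

    `tree` is an adjacency dict of an undirected tree. Two distinct leaves
    are adjacent in the result iff their distance in the tree is at most ell.
    """
    leaves = [v for v, nbrs in tree.items() if len(nbrs) == 1]
    edges = set()
    for src in leaves:
        # Breadth-first search from src, truncated at depth ell.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] == ell:
                continue
            for x in tree[u]:
                if x not in dist:
                    dist[x] = dist[u] + 1
                    queue.append(x)
        for v in leaves:
            if v != src and v in dist:
                edges.add(frozenset((src, v)))
    return leaves, edges


# Hypothetical tree: leaves a and b hang off x, leaf c hangs off y, x-y is an edge.
tree = {
    "x": ["a", "b", "y"],
    "y": ["x", "c"],
    "a": ["x"],
    "b": ["x"],
    "c": ["y"],
}
leaves, edges_2 = leaf_power(tree, 2)
_, edges_3 = leaf_power(tree, 3)
print(leaves, edges_2, edges_3)
```

Here `a` and `b` are at distance 2 and each is at distance 3 from `c`, so the 2-leaf power has a single edge and the 3-leaf power is a triangle.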
-
Obstructions for bounded shrub-depth and rank-depth
Authors:
O-joung Kwon,
Rose McCarty,
Sang-il Oum,
Paul Wollan
Abstract:
Shrub-depth and rank-depth are dense analogues of the tree-depth of a graph. It is well known that a graph has large tree-depth if and only if it has a long path as a subgraph. We prove an analogous statement for shrub-depth and rank-depth, which was conjectured by Hliněný, Kwon, Obdržálek, and Ordyniak [Tree-depth and vertex-minors, European J. Combin. 2016]. Namely, we prove that a graph has large rank-depth if and only if it has a vertex-minor isomorphic to a long path. This implies that for every integer $t$, the class of graphs with no vertex-minor isomorphic to the path on $t$ vertices has bounded shrub-depth.
Submitted 18 January, 2021; v1 submitted 1 November, 2019;
originally announced November 2019.
-
Measuring what Matters: A Hybrid Approach to Dynamic Programming with Treewidth
Authors:
Eduard Eiben,
Robert Ganian,
Thekla Hamm,
O-joung Kwon
Abstract:
We develop a framework for applying treewidth-based dynamic programming on graphs with "hybrid structure", i.e., with parts that may not have small treewidth but instead possess other structural properties. Informally, this is achieved by defining a refinement of treewidth which only considers parts of the graph that do not belong to a pre-specified tractable graph class. Our approach allows us to not only generalize existing fixed-parameter algorithms exploiting treewidth, but also fixed-parameter algorithms which use the size of a modulator as their parameter. As the flagship application of our framework, we obtain a parameter that combines treewidth and rank-width to obtain fixed-parameter algorithms for Chromatic Number, Hamiltonian Cycle, and Max-Cut.
Submitted 27 August, 2019;
originally announced August 2019.
-
Improving Neural Networks by Adopting Amplifying and Attenuating Neurons
Authors:
Seongmun Jung,
Oh Joon Kwon
Abstract:
In the present study, an amplifying neuron and attenuating neuron, which can be easily implemented into neural networks without any significant additional computational effort, are proposed. The activated output value is squared for the amplifying neuron, while the value becomes its reciprocal for the attenuating one. Theoretically, the order of neural networks increases when the amplifying neuron is placed in the hidden layer. The performance assessments of neural networks were conducted to verify that the amplifying and attenuating neurons enhance the performance of neural networks. From the numerical experiments, it was revealed that the neural networks that contain the amplifying and attenuating neurons yield more accurate results, compared to those without them.
Submitted 27 May, 2019; v1 submitted 23 May, 2019;
originally announced May 2019.
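The two neuron types described in the abstract above (activated output squared, or replaced by its reciprocal) can be illustrated roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the sigmoid activation is an assumption chosen so that the reciprocal is always defined, and the function names and the `kind` parameter are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation; assumed here, the abstract does not fix one."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(weights, bias, inputs, kind="plain"):
    """Evaluate a single neuron on `inputs`.

    kind="amplifying"  squares the activated output;
    kind="attenuating" returns the reciprocal of the activated output;
    any other value returns the activated output unchanged.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    a = sigmoid(z)
    if kind == "amplifying":
        return a * a
    if kind == "attenuating":
        return 1.0 / a
    return a


# With these hypothetical weights the pre-activation is 0, so sigmoid gives 0.5.
w, b, x = [1.0, -1.0], 0.0, [2.0, 2.0]
plain = neuron(w, b, x)
amplified = neuron(w, b, x, kind="amplifying")
attenuated = neuron(w, b, x, kind="attenuating")
print(plain, amplified, attenuated)
```

Both variants add only one multiplication or division per neuron, which is consistent with the abstract's claim of no significant additional computational effort.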
-
Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems
Authors:
Ohsung Kwon,
Eunwoo Song,
Jae-Min Kim,
Hong-Goo Kang
Abstract:
In this paper, we propose a high-quality generative text-to-speech (TTS) system using an effective spectrum and excitation estimation method. Our previous research verified the effectiveness of the ExcitNet-based speech generation model in a parametric TTS framework. However, the challenge remains to build a high-quality speech synthesis system because auxiliary conditional features estimated by a simple deep neural network often contain large prediction errors, and the errors are inevitably propagated throughout the autoregressive generation process of the ExcitNet vocoder. To generate more natural speech signals, we exploited a sequence-to-sequence (seq2seq) acoustic model with an attention-based generative network (e.g., Tacotron 2) to estimate the condition parameters of the ExcitNet vocoder. Because the seq2seq acoustic model accurately estimates spectral parameters, and because the ExcitNet model effectively generates the corresponding time-domain excitation signals, combining these two models can synthesize natural speech signals. Furthermore, we verified the merit of the proposed method in producing expressive speech segments by adopting a global style token-based emotion embedding method. The experimental results confirmed that the proposed system significantly outperforms the systems with a similarly configured conventional WaveNet vocoder and our best prior parametric TTS counterpart.
Submitted 21 May, 2019;
originally announced May 2019.