Showing 1–50 of 195 results for author: Kang, M

Searching in archive cs.
  1. arXiv:2406.13502  [pdf, other]

    cs.CL cs.SD eess.AS

    ManWav: The First Manchu ASR Model

    Authors: Jean Seo, Minha Kang, Sungjoo Byun, Sangah Lee

    Abstract: This study addresses the widening gap in Automatic Speech Recognition (ASR) research between high resource and extremely low resource languages, with a particular focus on Manchu, a critically endangered language. Manchu exemplifies the challenges faced by marginalized linguistic communities in accessing state-of-the-art technologies. In a pioneering effort, we introduce the first-ever Manchu ASR…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: ACL2024/Field Matters

  2. arXiv:2406.06004  [pdf, other]

    cs.CV cs.AI cs.CL

    FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model

    Authors: Yebin Lee, Imseong Park, Myungjoo Kang

    Abstract: Most existing image captioning evaluation metrics focus on assigning a single numerical score to a caption by comparing it with reference captions. However, these methods do not provide an explanation for the assigned score. Moreover, reference captions are expensive to acquire. In this paper, we propose FLEUR, an explainable reference-free metric to introduce explainability into image captioning…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL (Main) 2024

  3. arXiv:2406.01960  [pdf, other]

    cs.LG cs.AI

    Certifiably Byzantine-Robust Federated Conformal Prediction

    Authors: Mintong Kang, Zhen Lin, Jimeng Sun, Cao Xiao, Bo Li

    Abstract: Conformal prediction has shown impressive capacity in constructing statistically rigorous prediction sets for machine learning models with exchangeable data samples. The siloed datasets, coupled with the escalating privacy concerns related to local data sharing, have inspired recent innovations extending conformal prediction into federated environments with distributed data samples. However, this…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  4. arXiv:2405.19346  [pdf, other]

    eess.SP cs.AI cs.LG

    Subject-Adaptive Transfer Learning Using Resting State EEG Signals for Cross-Subject EEG Motor Imagery Classification

    Authors: Sion An, Myeongkyun Kang, Soopil Kim, Philip Chikontwe, Li Shen, Sang Hyun Park

    Abstract: Electroencephalography (EEG) motor imagery (MI) classification is a fundamental, yet challenging task due to the variation of signals between individuals, i.e., inter-subject variability. Previous approaches try to mitigate this using task-specific (TS) EEG signals from the target subject in training. However, recording TS EEG signals requires time and limits its applicability in various fields. In…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Early Accepted at MICCAI 2024

  5. arXiv:2405.16803  [pdf, other]

    cs.CV

    TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing

    Authors: Xinyu Zhang, Mengxue Kang, Fei Wei, Shuang Xu, Yuhe Liu, Lin Ma

    Abstract: As the field of image generation rapidly advances, traditional diffusion models and those integrated with multimodal large language models (LLMs) still encounter limitations in interpreting complex prompts and preserving image consistency pre- and post-editing. To tackle these challenges, we present an innovative image editing framework that employs the robust Chain-of-Thought (CoT) reasoning and l…

    Submitted 26 May, 2024; originally announced May 2024.

  6. arXiv:2405.13954  [pdf, other]

    cs.LG cs.AI cs.CL

    What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions

    Authors: Sang Keun Choe, Hwijeen Ahn, Juhan Bae, Kewen Zhao, Minsoo Kang, Youngseog Chung, Adithya Pratapa, Willie Neiswanger, Emma Strubell, Teruko Mitamura, Jeff Schneider, Eduard Hovy, Roger Grosse, Eric Xing

    Abstract: Large language models (LLMs) are trained on a vast amount of human-written data, but data providers often remain uncredited. In response to this issue, data valuation (or data attribution), which quantifies the contribution or value of each data to the model output, has been discussed as a potential solution. Nevertheless, applying existing data valuation methods to recent LLMs and their vast trai…

    Submitted 22 May, 2024; originally announced May 2024.

  7. arXiv:2405.05967  [pdf, other]

    cs.CV cs.GR cs.LG

    Distilling Diffusion Models into Conditional GANs

    Authors: Minguk Kang, Richard Zhang, Connelly Barnes, Sylvain Paris, Suha Kwak, Jaesik Park, Eli Shechtman, Jun-Yan Zhu, Taesung Park

    Abstract: We propose a method to distill a complex multistep diffusion model into a single-step conditional GAN student model, dramatically accelerating inference, while preserving image quality. Our approach interprets diffusion distillation as a paired image-to-image translation task, using noise-to-image pairs of the diffusion model's ODE trajectory. For efficient regression loss computation, we propose…

    Submitted 13 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: Project page: https://mingukkang.github.io/Diffusion2GAN/

  8. Optimizing Hardware Resource Partitioning and Job Allocations on Modern GPUs under Power Caps

    Authors: Eishi Arima, Minjoon Kang, Issa Saba, Josef Weidendorfer, Carsten Trinitis, Martin Schulz

    Abstract: CPU-GPU heterogeneous systems are now commonly used in HPC (High-Performance Computing). However, improving the utilization and energy-efficiency of such systems is still one of the most critical issues. As one single program typically cannot fully utilize all resources within a node/chip, co-scheduling (or co-locating) multiple programs with complementary resource requirements is a promising solu…

    Submitted 6 May, 2024; originally announced May 2024.

    Journal ref: ICPP Workshops '22: Workshop Proceedings of the 51st International Conference on Parallel Processing, August 2022, Article No.: 9

  9. arXiv:2404.17598  [pdf, other]

    cs.IR cs.AI cs.LG cs.SI

    Revealing and Utilizing In-group Favoritism for Graph-based Collaborative Filtering

    Authors: Hoin Jung, Hyunsoo Cho, Myungje Choi, Joowon Lee, Jung Ho Park, Myungjoo Kang

    Abstract: When it comes to a personalized item recommendation system, it is essential to extract users' preferences and purchasing patterns. Assuming that users in the real world form a cluster and there is common favoritism in each cluster, in this work, we introduce Co-Clustering Wrapper (CCW). We compute co-clusters of users and items with co-clustering algorithms and add CF subnetworks for each cluster…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 7 pages, 6 figures

  10. arXiv:2404.16015  [pdf, other]

    physics.comp-ph cs.AI math.NA

    Neural Operators Learn the Local Physics of Magnetohydrodynamics

    Authors: Taeyoung Kim, Youngsoo Ha, Myungjoo Kang

    Abstract: Magnetohydrodynamics (MHD) plays a pivotal role in describing the dynamics of plasma and conductive fluids, essential for understanding phenomena such as the structure and evolution of stars and galaxies, and in nuclear fusion for plasma motion through ideal MHD equations. Solving these hyperbolic PDEs requires sophisticated numerical methods, presenting computational challenges due to complex str…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 47 pages, 24 figures

  11. arXiv:2404.14019  [pdf]

    cs.CV eess.SP stat.AP

    A Multimodal Feature Distillation with CNN-Transformer Network for Brain Tumor Segmentation with Incomplete Modalities

    Authors: Ming Kang, Fung Fung Ting, Raphaël C. -W. Phan, Zongyuan Ge, Chee-Ming Ting

    Abstract: Existing brain tumor segmentation methods usually utilize multiple Magnetic Resonance Imaging (MRI) modalities in brain tumor images for segmentation, which can achieve better segmentation performance. However, in clinical applications, some modalities are missing due to resource constraints, leading to severe degradation in the performance of methods applying complete modality segmentation. In th…

    Submitted 22 April, 2024; originally announced April 2024.

    MSC Class: 68U10 (Primary) 68T10; 68T07; 62P10 (Secondary)

    ACM Class: I.4.6; I.5.1; J.3

  12. arXiv:2404.13388  [pdf]

    eess.IV cs.CV cs.LG

    Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning

    Authors: Yong Liu, Mengtian Kang, Shuo Gao, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Arokia Nathan, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Luigi Occhipinti

    Abstract: Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To addres…

    Submitted 23 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  13. arXiv:2404.13386  [pdf]

    eess.IV cs.CV cs.LG

    SSVT: Self-Supervised Vision Transformer For Eye Disease Diagnosis Based On Fundus Images

    Authors: Jiaqi Wang, Mengtian Kang, Yong Liu, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Shuo Gao, Luigi G. Occhipinti

    Abstract: Machine learning-based fundus image diagnosis technologies trigger worldwide interest owing to their benefits such as reducing medical resource power and providing objective evaluation results. However, current methods are commonly based on supervised methods, bringing in a heavy workload to biomedical staff and hence suffering in expanding effective databases. To address this issue, in this artic…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: ISBI 2024

  14. arXiv:2403.19985  [pdf, other]

    cs.CV

    Stable Surface Regularization for Fast Few-Shot NeRF

    Authors: Byeongin Joung, Byeong-Uk Lee, Jaesung Choe, Ukcheol Shin, Minjun Kang, Taeyeop Lee, In So Kweon, Kuk-Jin Yoon

    Abstract: This paper proposes an algorithm for synthesizing novel views under few-shot setup. The main concept is to develop a stable surface regularization technique called Annealing Signed Distance Function (ASDF), which anneals the surface in a coarse-to-fine manner to accelerate convergence speed. We observe that the Eikonal loss - which is a widely known geometric regularization - requires dense traini…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 3DV 2024

  15. arXiv:2403.14963  [pdf, other]

    cs.CR

    Enabling Physical Localization of Uncooperative Cellular Devices

    Authors: Taekkyung Oh, Sangwook Bae, Junho Ahn, Yonghwa Lee, Dinh-Tuan Hoang, Min Suk Kang, Nils Ole Tippenhauer, Yongdae Kim

    Abstract: In cellular networks, it can become necessary for authorities to physically locate user devices for tracking criminals or illegal devices. While cellular operators can provide authorities with cell information the device is camping on, fine-grained localization is still required. Therefore, the authorized agents trace the device by monitoring its uplink signals. However, tracking the uplink signal…

    Submitted 25 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

  16. arXiv:2403.11570  [pdf, other]

    cs.CV

    LogicalDefender: Discovering, Extracting, and Utilizing Common-Sense Knowledge

    Authors: Yuhe Liu, Mengxue Kang, Zengchang Qin, Xiangxiang Chu

    Abstract: Large text-to-image models have achieved astonishing performance in synthesizing diverse and high-quality images guided by texts. With detail-oriented conditioning control, even finer-grained spatial control can be achieved. However, some generated images still appear unreasonable, even with plentiful object features and a harmonious style. In this paper, we delve into the underlying causes and fi…

    Submitted 18 March, 2024; originally announced March 2024.

  17. arXiv:2403.11348  [pdf, other]

    cs.LG cs.AI stat.ML

    COLEP: Certifiably Robust Learning-Reasoning Conformal Prediction via Probabilistic Circuits

    Authors: Mintong Kang, Nezihe Merve Gürel, Linyi Li, Bo Li

    Abstract: Conformal prediction has shown spurring performance in constructing statistically rigorous prediction sets for arbitrary black-box machine learning models, assuming the data is exchangeable. However, even small adversarial perturbations during the inference can violate the exchangeability assumption, challenge the coverage guarantees, and result in a subsequent decline in empirical coverage. In th…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted to ICLR 2024

  18. arXiv:2403.02680  [pdf, other]

    cs.CR

    A Dual-Level Cancelable Framework for Palmprint Verification and Hack-Proof Data Storage

    Authors: Ziyuan Yang, Ming Kang, Andrew Beng Jin Teoh, Chengrui Gao, Wen Chen, Bob Zhang, Yi Zhang

    Abstract: In recent years, palmprints have been widely used for individual verification. The rich privacy information in palmprint data necessitates its protection to ensure security and privacy without sacrificing system performance. Existing systems often use cancelable technologies to protect templates, but these technologies ignore the potential risk of data leakage. Upon breaching the system and gainin…

    Submitted 5 March, 2024; originally announced March 2024.

  19. arXiv:2403.01137  [pdf, other]

    cs.CV cs.GR eess.IV

    Neural radiance fields-based holography [Invited]

    Authors: Minsung Kang, Fan Wang, Kai Kumano, Tomoyoshi Ito, Tomoyoshi Shimobaba

    Abstract: This study presents a novel approach for generating holograms based on the neural radiance fields (NeRF) technique. Generating three-dimensional (3D) data is difficult in hologram computation. NeRF is a state-of-the-art technique for 3D light-field reconstruction from 2D images based on volume rendering. The NeRF can rapidly predict new-view images that do not include a training dataset. In this s…

    Submitted 9 May, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  20. arXiv:2402.05443  [pdf, other]

    cs.LG cs.CV

    Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport

    Authors: Jaemoo Choi, Jaewoong Choi, Myungjoo Kang

    Abstract: Wasserstein Gradient Flow (WGF) describes the gradient dynamics of probability density within the Wasserstein space. WGF provides a promising approach for conducting optimization over the probability distributions. Numerically approximating the continuous WGF requires the time discretization method. The most well-known method for this is the JKO scheme. In this regard, previous WGF models employ t…

    Submitted 3 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: 22 pages, 11 figures

  21. arXiv:2402.03181  [pdf, other]

    cs.AI cs.CL cs.IR

    C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models

    Authors: Mintong Kang, Nezihe Merve Gürel, Ning Yu, Dawn Song, Bo Li

    Abstract: Despite the impressive capabilities of large language models (LLMs) across diverse applications, they still suffer from trustworthiness issues, such as hallucinations and misalignments. Retrieval-augmented language models (RAG) have been proposed to enhance the credibility of generations by grounding external knowledge, but the theoretical understanding of their generation risks remains unexplore…

    Submitted 4 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024

  22. arXiv:2401.17629  [pdf, other]

    cs.CV cs.LG

    Spatial-and-Frequency-aware Restoration method for Images based on Diffusion Models

    Authors: Kyungsung Lee, Donggyu Lee, Myungjoo Kang

    Abstract: Diffusion models have recently emerged as a promising framework for Image Restoration (IR), owing to their ability to produce high-quality reconstructions and their compatibility with established methods. Existing methods for solving noisy inverse problems in IR consider the pixel-wise data-fidelity. In this paper, we propose SaFaRI, a spatial-and-frequency-aware diffusion model for IR with Gaus…

    Submitted 31 January, 2024; originally announced January 2024.

  23. arXiv:2401.16886  [pdf]

    cs.CV eess.SP stat.AP

    CAFCT: Contextual and Attentional Feature Fusions of Convolutional Neural Networks and Transformer for Liver Tumor Segmentation

    Authors: Ming Kang, Chee-Ming Ting, Fung Fung Ting, Raphaël Phan

    Abstract: Medical image semantic segmentation techniques can help identify tumors automatically from computed tomography (CT) scans. In this paper, we propose a Contextual and Attentional feature Fusions enhanced Convolutional Neural Network (CNN) and Transformer hybrid network (CAFCT) model for liver tumor segmentation. In the proposed model, three other modules are introduced in the network architecture:…

    Submitted 30 January, 2024; originally announced January 2024.

    MSC Class: 68T07; 68T10; 68U10; 62P10

    ACM Class: I.4.6; I.5.1; J.3

  24. Catch-Up Mix: Catch-Up Class for Struggling Filters in CNN

    Authors: Minsoo Kang, Minkoo Kang, Suhyun Kim

    Abstract: Deep learning has made significant advances in computer vision, particularly in image classification tasks. Despite their high accuracy on training data, deep learning models often face challenges related to complexity and overfitting. One notable concern is that the model often relies heavily on a limited subset of filters for making predictions. This dependency can result in compromised generali…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Published at AAAI2024, Equal contribution of first two authors

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 38(3), 2024, 2705-2713

  25. arXiv:2401.11228  [pdf, other]

    cs.CV

    Unifying Visual and Vision-Language Tracking via Contrastive Learning

    Authors: Yinchao Ma, Yuyang Tang, Wenfei Yang, Tianzhu Zhang, Jinpeng Zhang, Mengxue Kang

    Abstract: Single object tracking aims to locate the target object in a video sequence according to the state specified by different modal references, including the initial bounding box (BBOX), natural language (NL), or both (NL+BBOX). Due to the gap between different modalities, most existing trackers are designed for single or partial of these reference settings and overspecialize on the specific modality.…

    Submitted 20 January, 2024; originally announced January 2024.

  26. arXiv:2401.01783  [pdf, other]

    math.NA cs.LG

    Approximating Numerical Fluxes Using Fourier Neural Operators for Hyperbolic Conservation Laws

    Authors: Taeyoung Kim, Myungjoo Kang

    Abstract: Traditionally, classical numerical schemes have been employed to solve partial differential equations (PDEs) using computational methods. Recently, neural network-based methods have emerged. Despite these advancements, neural network-based methods, such as physics-informed neural networks (PINNs) and neural operators, exhibit deficiencies in robustness and generalization. To address these issues,…

    Submitted 13 May, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: 39 pages, 16 figures

  27. arXiv:2312.13783  [pdf, other]

    cs.CV cs.AI cs.LG

    Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection

    Authors: Soopil Kim, Sion An, Philip Chikontwe, Myeongkyun Kang, Ehsan Adeli, Kilian M. Pohl, Sang Hyun Park

    Abstract: Logical anomalies (LA) refer to data violating underlying logical constraints, e.g., the quantity, arrangement, or composition of components within an image. Accurately detecting such anomalies requires models to reason about various component types through segmentation. However, curation of pixel-level annotations for semantic segmentation is both time-consuming and expensive. Although there are s…

    Submitted 15 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted in AAAI2024

  28. arXiv:2312.06458  [pdf]

    cs.CV eess.SP stat.AP

    ASF-YOLO: A Novel YOLO Model with Attentional Scale Sequence Fusion for Cell Instance Segmentation

    Authors: Ming Kang, Chee-Ming Ting, Fung Fung Ting, Raphaël C. -W. Phan

    Abstract: We propose a novel Attentional Scale Sequence Fusion based You Only Look Once (YOLO) framework (ASF-YOLO) which combines spatial and scale features for accurate and fast cell instance segmentation. Built on the YOLO segmentation framework, we employ the Scale Sequence Feature Fusion (SSFF) module to enhance the multi-scale information extraction capability of the network, and the Triple Feature En…

    Submitted 10 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    MSC Class: 68U10 (Primary) 68T10; 68T07; 62P10 (Secondary)

    ACM Class: I.4.6; I.5.1; J.3

    Journal ref: Image Vis. Comput. 147 (2024) 105057

  29. arXiv:2312.04906  [pdf, other]

    cs.CL

    Ophtha-LLaMA2: A Large Language Model for Ophthalmology

    Authors: Huan Zhao, Qian Ling, Yi Pan, Tianyang Zhong, Jin-Yu Hu, Junjie Yao, Fengqian Xiao, Zhenxiang Xiao, Yutong Zhang, San-Hua Xu, Shi-Nan Wu, Min Kang, Zihao Wu, Zhengliang Liu, Xi Jiang, Tianming Liu, Yi Shao

    Abstract: In recent years, pre-trained large language models (LLMs) have achieved tremendous success in the field of Natural Language Processing (NLP). Prior studies have primarily focused on general and generic domains, with relatively less research on specialized LLMs in the medical field. The specialization and high accuracy requirements for diagnosis in the medical field, as well as the challenges in co…

    Submitted 8 December, 2023; originally announced December 2023.

  30. arXiv:2312.03395  [pdf, other]

    cs.RO cs.AI cs.LG

    Diffused Task-Agnostic Milestone Planner

    Authors: Mineui Hong, Minjae Kang, Songhwai Oh

    Abstract: Addressing decision-making problems using sequence modeling to predict future trajectories has shown promising results in recent years. In this paper, we take a step further to leverage the sequence predictive method in wider areas such as long-term planning, vision-based control, and multi-task decision-making. To this end, we propose a method to utilize a diffusion-based generative sequence model to…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 37th Conference on Neural Information Processing Systems

  31. arXiv:2311.17492  [pdf, other]

    cs.CL

    Mergen: The First Manchu-Korean Machine Translation Model Trained on Augmented Data

    Authors: Jean Seo, Sungjoo Byun, Minha Kang, Sangah Lee

    Abstract: The Manchu language, with its roots in the historical Manchurian region of Northeast China, is now facing a critical threat of extinction, as there are very few speakers left. In our efforts to safeguard the Manchu language, we introduce Mergen, the first-ever attempt at a Manchu-Korean Machine Translation (MT) model. To develop this model, we utilize valuable resources such as the Manwen Laodang(…

    Submitted 12 January, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: emnlp2023/mrl2023

  32. arXiv:2311.16538  [pdf, other]

    cs.LG cs.CR

    Federated Learning with Diffusion Models for Privacy-Sensitive Vision Tasks

    Authors: Ye Lin Tun, Chu Myaet Thwal, Ji Su Yoon, Sun Moo Kang, Chaoning Zhang, Choong Seon Hong

    Abstract: Diffusion models have shown great potential for vision-related tasks, particularly for image generation. However, their training is typically conducted in a centralized manner, relying on data collected from publicly available sources. This approach may not be feasible or practical in many domains, such as the medical field, which involves privacy concerns over data collection. Despite the challen…

    Submitted 28 November, 2023; originally announced November 2023.

  33. arXiv:2311.16124  [pdf, other]

    cs.CR cs.AI

    DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification

    Authors: Mintong Kang, Dawn Song, Bo Li

    Abstract: Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples and achieve state-of-the-art robustness. Recent studies show that even advanced attacks cannot break such defenses effectively, since the purification process induces an extremely deep computational graph which poses the potential problem of gradient obfuscation, high memory cost…

    Submitted 3 January, 2024; v1 submitted 27 October, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023

  34. arXiv:2311.05844  [pdf, other]

    cs.CV cs.AI cs.CL cs.MM cs.SD eess.AS

    Face-StyleSpeech: Improved Face-to-Voice latent mapping for Natural Zero-shot Speech Synthesis from a Face Image

    Authors: Minki Kang, Wooseok Han, Eunho Yang

    Abstract: Generating a voice from a face image is crucial for developing virtual humans capable of interacting using their unique voices, without relying on pre-recorded human speech. In this paper, we propose Face-StyleSpeech, a zero-shot Text-To-Speech (TTS) synthesis model that generates natural speech conditioned on a face image rather than reference speech. We hypothesize that learning both speaker ide…

    Submitted 25 September, 2023; originally announced November 2023.

    Comments: Submitted to ICASSP 2024

  35. arXiv:2311.04512  [pdf, other]

    cs.CV cs.AI

    FFINet: Future Feedback Interaction Network for Motion Forecasting

    Authors: Miao Kang, Shengqi Wang, Sanping Zhou, Ke Ye, Jingjing Jiang, Nanning Zheng

    Abstract: Motion forecasting plays a crucial role in autonomous driving, with the aim of predicting the future reasonable motions of traffic agents. Most existing methods mainly model the historical interactions between agents and the environment, and predict multi-modal trajectories in a feedforward process, ignoring potential trajectory changes caused by future interactions between agents. In this paper,…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 11 pages, 8 figures, 12 tables

  36. arXiv:2311.04287  [pdf, other]

    cs.CV cs.LG

    Holistic Evaluation of Text-To-Image Models

    Authors: Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang

    Abstract: The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023. First three authors contributed equally

  37. arXiv:2311.03029  [pdf, other]

    cs.RO

    Obstacle- and Occlusion-Responsive Visual Tracking Control for Redundant Manipulators using Reachability Measure

    Authors: Mincheul Kang, Junhyoung Ha

    Abstract: A vision system attached to a manipulator excels at tracing a moving target object while effectively handling obstacles, overcoming limitations arising from the camera's confined field of view and occluded line of sight. Meanwhile, the manipulator may encounter certain challenges, including restricted motion due to kinematic constraints and the risk of colliding with external obstacles. These chal…

    Submitted 6 November, 2023; originally announced November 2023.

  38. arXiv:2310.20095  [pdf, other]

    cs.CV cs.CG math-ph

    $p$-Poisson surface reconstruction in curl-free flow from point clouds

    Authors: Yesom Park, Taekyung Lee, Jooyoung Hahn, Myungjoo Kang

    Abstract: The aim of this paper is the reconstruction of a smooth surface from an unorganized point cloud sampled by a closed surface, with the preservation of geometric shapes, without any further information other than the point cloud. Implicit neural representations (INRs) have recently emerged as a promising approach to surface reconstruction. However, the reconstruction quality of existing methods reli…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: 21 pages, accepted for Advances in Neural Information Processing Systems, 2023

  39. arXiv:2310.16112  [pdf, other]

    cs.CV

    Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge

    Authors: Gregory Holste, Yiliang Zhou, Song Wang, Ajay Jaiswal, Mingquan Lin, Sherry Zhuge, Yuzhe Yang, Dongkyun Kim, Trong-Hieu Nguyen-Mau, Minh-Triet Tran, Jaehyup Jeong, Wongi Park, Jongbin Ryu, Feng Hong, Arsh Verma, Yosuke Yamagishi, Changhyun Kim, Hyeryeong Seo, Myungjoo Kang, Leo Anthony Celi, Zhiyong Lu, Ronald M. Summers, George Shih, Zhangyang Wang, Yifan Peng

    Abstract: Many real-world image recognition problems, such as diagnostic medical imaging exams, are "long-tailed" – there are a few common findings followed by many more relatively rare conditions. In chest radiography, diagnosis is both a long-tailed and multi-label problem, as patients often present with multiple findings simultaneously. While researchers have begun to study the problem of…

    Submitted 1 April, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: Update after major revision

  40. arXiv:2310.12836  [pdf, other]

    cs.CL cs.LG

    Knowledge-Augmented Language Model Verification

    Authors: Jinheon Baek, Soyeong Jeong, Minki Kang, Jong C. Park, Sung Ju Hwang

    Abstract: Recent Language Models (LMs) have shown impressive capabilities in generating texts with the knowledge internalized in parameters. Yet, LMs often generate the factually incorrect responses to the given queries, since their knowledge may be inaccurate, incomplete, and outdated. To address this problem, previous works propose to augment LMs with the knowledge retrieved from an external knowledge sou…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  41. arXiv:2310.02611  [pdf, other]

    cs.LG cs.CV

    Analyzing and Improving Optimal-Transport-based Adversarial Networks

    Authors: Jaemoo Choi, Jaewoong Choi, Myungjoo Kang

    Abstract: The Optimal Transport (OT) problem aims to find a transport plan that bridges two distributions while minimizing a given cost function. OT theory has been widely utilized in generative modeling. In the beginning, OT distance has been used as a measure for assessing the distance between data and generated distributions. Recently, OT transport map between data and prior distributions has been utilized a…

    Submitted 7 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: 27 pages, 17 figures

  42. arXiv:2309.12585  [pdf]

    cs.CV eess.SP stat.AP

    BGF-YOLO: Enhanced YOLOv8 with Multiscale Attentional Feature Fusion for Brain Tumor Detection

    Authors: Ming Kang, Chee-Ming Ting, Fung Fung Ting, Raphaël C. -W. Phan

    Abstract: You Only Look Once (YOLO)-based object detectors have shown remarkable accuracy for automated brain tumor detection. In this paper, we develop a novel BGF-YOLO architecture by incorporating Bi-level Routing Attention (BRA), Generalized feature pyramid networks (GFPN), and Fourth detecting head into YOLOv8. BGF-YOLO contains an attention mechanism to focus more on important features, and feature py…

    Submitted 25 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    MSC Class: 68U10 (Primary) 68T10; 68T07; 62P10 (Secondary)

    ACM Class: I.4.6; I.5.1; J.3

  43. arXiv:2309.10006  [pdf]

    physics.soc-ph cs.AI

    The Optimized path for the public transportation of Incheon in South Korea

    Authors: Soroor Malekmohammadi faradunbeh, Hongle Li, Mangkyu Kang, Choongjae Iim

    Abstract: Path-finding is one of the most popular subjects in the field of computer science. Pathfinding strategies determine a path from a given coordinate to another. The focus of this paper is on finding the optimal path for the bus transportation system based on passenger demand. This study is based on bus stations in Incheon, South Korea, and we show that our modified A* algorithm performs better than…

    Submitted 17 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures, 2 tables, presented at Proc. of the Interdisciplinary Conference on Mechanics, Computers and Electrics (ICMECE 2022) 6-7 October 2022, Barcelona, Spain

  44. arXiv:2307.16412  [pdf]

    cs.CV eess.SP stat.AP stat.ML

    RCS-YOLO: A Fast and High-Accuracy Object Detector for Brain Tumor Detection

    Authors: Ming Kang, Chee-Ming Ting, Fung Fung Ting, Raphaël C. -W. Phan

    Abstract: With an excellent balance between speed and accuracy, cutting-edge YOLO frameworks have become one of the most efficient algorithms for object detection. However, the performance of using YOLO networks is scarcely investigated in brain tumor detection. We propose a novel YOLO architecture with Reparameterized Convolution based on channel Shuffle (RCS-YOLO). We present RCS and a One-Shot Aggregatio…

    Submitted 3 October, 2023; v1 submitted 31 July, 2023; originally announced July 2023.

    MSC Class: 68U10 (Primary) 68T10; 68T07; 62P10 (Secondary)

    ACM Class: I.4.6; I.5.1; J.3

    Journal ref: In MICCAI 2023 LNCS vol. 14223 600-610 (2023)

  45. arXiv:2307.05469  [pdf, other]

    cs.IR cs.LG

    AdaptiveRec: Adaptively Construct Pairs for Contrastive Learning in Sequential Recommendation

    Authors: Jaeheyoung Jeon, Jung Hyun Ryu, Jewoong Cho, Myungjoo Kang

    Abstract: This paper presents a solution to the challenges faced by contrastive learning in sequential recommendation systems. In particular, it addresses the issue of false negatives, which limits the effectiveness of recommendation algorithms. By introducing an advanced approach to contrastive learning, the proposed method improves the quality of item embeddings and mitigates the problem of falsely categor…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: 8 pages, 8 figures

  46. arXiv:2307.04377  [pdf, other]

    cs.SD eess.AS

    HCLAS-X: Hierarchical and Cascaded Lyrics Alignment System Using Multimodal Cross-Correlation

    Authors: Minsung Kang, Soochul Park, Keunwoo Choi

    Abstract: In this work, we address the challenge of lyrics alignment, which involves aligning the lyrics and vocal components of songs. This problem requires the alignment of two distinct modalities, namely text and audio. To overcome this challenge, we propose a model that is trained in a supervised manner, utilizing the cross-correlation matrix of latent representations between vocals and lyrics. Our syst…

    Submitted 10 July, 2023; originally announced July 2023.

  47. arXiv:2307.04292  [pdf, other]

    eess.AS cs.AI

    A Demand-Driven Perspective on Generative Audio AI

    Authors: Sangshin Oh, Minsung Kang, Hyeongi Moon, Keunwoo Choi, Ben Sangbae Chon

    Abstract: To achieve successful deployment of AI research, it is crucial to understand the demands of the industry. In this paper, we present the results of a survey conducted with professional audio engineers, in order to determine research priorities and define various research tasks. We also summarize the current challenges in audio quality and controllability based on the survey. Our analysis emphasizes…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: 10 pages, 7 figures

  48. arXiv:2306.16612  [pdf, other]

    cs.CV cs.AI cs.LG

    GuidedMixup: An Efficient Mixup Strategy Guided by Saliency Maps

    Authors: Minsoo Kang, Suhyun Kim

    Abstract: Data augmentation is now an essential part of the image training process, as it effectively prevents overfitting and makes the model more robust against noisy datasets. Recent mixing augmentation strategies have advanced to generate the mixup mask that can enrich the saliency information, which is a supervisory signal. However, these methods incur a significant computational burden to optimize the…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Published at AAAI2023 (Oral)

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, 37(1), 2023, 1096-1104

  49. arXiv:2306.14590  [pdf]

    cs.CV eess.SP stat.AP stat.ML

    CST-YOLO: A Novel Method for Blood Cell Detection Based on Improved YOLOv7 and CNN-Swin Transformer

    Authors: Ming Kang, Chee-Ming Ting, Fung Fung Ting, Raphaël Phan

    Abstract: Blood cell detection is a typical small-scale object detection problem in computer vision. In this paper, we propose a CST-YOLO model for blood cell detection based on YOLOv7 architecture and enhance it with the CNN-Swin Transformer (CST), which is a new attempt at CNN-Transformer fusion. We also introduce three other useful modules: Weighted Efficient Layer Aggregation Networks (W-ELAN), Multisca…

    Submitted 26 June, 2023; originally announced June 2023.

    MSC Class: 68T07; 68T10; 68U10; 62P10

    ACM Class: I.4.6; I.5.1; J.3

  50. arXiv:2306.13020  [pdf]

    eess.IV cs.AI cs.CV

    Toward Automated Detection of Microbleeds with Anatomical Scale Localization: A Complete Clinical Diagnosis Support Using Deep Learning

    Authors: Jun-Ho Kim, Young Noh, Haejoon Lee, Seul Lee, Woo-Ram Kim, Koung Mi Kang, Eung Yeop Kim, Mohammed A. Al-masni, Dong-Hyun Kim

    Abstract: Cerebral Microbleeds (CMBs) are chronic deposits of small blood products in the brain tissues, which have explicit relation to various cerebrovascular diseases depending on their anatomical location, including cognitive decline, intracerebral hemorrhage, and cerebral infarction. However, manual detection of CMBs is a time-consuming and error-prone process because of their sparse and tiny structura…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 16 pages, 10 figures, 3 tables