Showing 1–50 of 200 results for author: Cho, M

  1. arXiv:2407.01158  [pdf, other]

    cs.CL

    Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation

    Authors: Takyoung Kim, Kyungjae Lee, Young Rok Jang, Ji Yong Cho, Gangwoo Kim, Minseok Cho, Moontae Lee

    Abstract: Interactions with billion-scale large language models typically yield long-form responses due to their extensive parametric capacities, along with retrieval-augmented features. While detailed responses provide insightful viewpoint of a specific subject, they frequently generate redundant and less engaging content that does not meet user interests. In this work, we focus on the role of query outlin…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Work in progress. Resources are available at https://github.com/youngerous/qtree

  2. arXiv:2406.17869  [pdf, other]

    cs.CV

    Burst Image Super-Resolution with Base Frame Selection

    Authors: Sanghyun Kim, Min Jung Lee, Woohyeok Kim, Deunsol Jung, Jaesung Rim, Sunghyun Cho, Minsu Cho

    Abstract: Burst image super-resolution has been a topic of active research in recent years due to its ability to obtain a high-resolution image by using complementary information between multiple frames in the burst. In this work, we explore using burst shots with non-uniform exposures to confront real-world practical scenarios by introducing a new benchmark dataset, dubbed Non-uniformly Exposed Burst Image…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: CVPR2024W NTIRE accepted

  3. arXiv:2405.05329  [pdf, other]

    cs.DC cs.AI cs.CL

    KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation

    Authors: Minsik Cho, Mohammad Rastegari, Devang Naik

    Abstract: Large Language Model or LLM inference has two phases, the prompt (or prefill) phase to output the first token and the extension (or decoding) phase to generate the subsequent tokens. In this work, we propose an efficient parallelization scheme, KV-Runahead, to accelerate the prompt phase. The key observation is that the extension phase generates tokens faster than the prompt phase because of key-va…

    Submitted 13 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

    Comments: preprint for ICML 2024

  4. arXiv:2405.03892  [pdf, other]

    cs.LG cs.AI

    Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows

    Authors: Minjae Cho, Jonathan P. How, Chuangchuang Sun

    Abstract: Despite notable successes of Reinforcement Learning (RL), the prevalent use of an online learning paradigm prevents its widespread adoption, especially in hazardous or costly scenarios. Offline RL has emerged as an alternative solution, learning from pre-collected static datasets. However, this offline learning introduces a new challenge known as distributional shift, degrading the performance whe…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Submitted for review at IEEE: Neural Networks and Learning Systems

  5. arXiv:2404.17419  [pdf, other]

    cs.CV

    Multi-view Image Prompted Multi-view Diffusion for Improved 3D Generation

    Authors: Seungwook Kim, Yichun Shi, Kejie Li, Minsu Cho, Peng Wang

    Abstract: Using images as prompts for 3D generation demonstrates particularly strong performance compared to using text prompts alone, for images provide more intuitive guidance for the 3D generation process. In this work, we delve into the potential of using multiple image prompts, instead of a single image prompt, for 3D generation. Specifically, we build on ImageDream, a novel image-prompt multi-view di…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 5 pages including references, 2 figures, 2 tables

  6. arXiv:2404.11156  [pdf, ps, other]

    cs.CV

    Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform

    Authors: Chunghyun Park, Seungwook Kim, Jaesik Park, Minsu Cho

    Abstract: Establishing accurate 3D correspondences between shapes stands as a pivotal challenge with profound implications for computer vision and robotics. However, existing self-supervised methods for this problem assume perfect input shape alignment, restricting their real-world applicability. In this work, we introduce a novel self-supervised Rotation-Invariant 3D correspondence learner with Local Shape…

    Submitted 20 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  7. arXiv:2404.10603  [pdf, other]

    cs.CV

    Enhancing 3D Fidelity of Text-to-3D using Cross-View Correspondences

    Authors: Seungwook Kim, Kejie Li, Xueqing Deng, Yichun Shi, Minsu Cho, Peng Wang

    Abstract: Leveraging multi-view diffusion models as priors for 3D optimization has alleviated the problem of 3D consistency, e.g., the Janus face problem or the content drift problem, in zero-shot text-to-3D models. However, the 3D geometric fidelity of the output remains an unresolved issue; although the rendered 2D views are realistic, the underlying geometry may contain errors such as unreasonable concavi…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 25 pages, 22 figures, accepted to CVPR 2024

  8. arXiv:2404.09451  [pdf, other]

    cs.CV

    Contrastive Mean-Shift Learning for Generalized Category Discovery

    Authors: Sua Choi, Dahyun Kang, Minsu Cho

    Abstract: We address the problem of generalized category discovery (GCD) that aims to partition a partially labeled collection of images; only a small part of the collection is labeled and the total number of target classes is unknown. To address this generalized image clustering problem, we revisit the mean-shift algorithm, i.e., a classic, powerful technique for mode seeking, and incorporate it into a con…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024

  9. arXiv:2404.06511  [pdf, other]

    cs.CV cs.AI cs.LG

    MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

    Authors: Juhong Min, Shyamal Buch, Arsha Nagrani, Minsu Cho, Cordelia Schmid

    Abstract: This paper addresses the task of video question answering (videoQA) via a decomposed multi-stage, modular reasoning framework. Previous modular methods have shown promise with a single planning stage ungrounded in visual content. However, through a simple and effective baseline, we find that such systems can lead to brittle behavior in practice for challenging videoQA settings. Thus, unlike tradit…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  10. arXiv:2404.03924  [pdf, other]

    cs.CV

    Learning Correlation Structures for Vision Transformers

    Authors: Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid, Minsu Cho

    Abstract: We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention. StructSA generates attention maps by recognizing space-time structures of key-query correlations via convolution and uses them to dynamically aggregate local contexts of value features. This effectively leverages ri…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  11. arXiv:2404.01842  [pdf, other]

    cs.CV

    Semi-Supervised Domain Adaptation for Wildfire Detection

    Authors: JooYoung Jang, Youngseo Cha, Jisu Kim, SooHyung Lee, Geonu Lee, Minkook Cho, Young Hwang, Nojun Kwak

    Abstract: Recently, both the frequency and intensity of wildfires have increased worldwide, primarily due to climate change. In this paper, we propose a novel protocol for wildfire detection, leveraging semi-supervised Domain Adaptation for object detection, accompanied by a corresponding dataset designed for use by both academics and industries. Our dataset encompasses 30 times more diverse labeled scenes…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 16 pages, 5 figures, 22 tables

  12. arXiv:2404.00060  [pdf, other]

    q-fin.ST cs.AI cs.LG

    Temporal Graph Networks for Graph Anomaly Detection in Financial Networks

    Authors: Yejin Kim, Youngbin Lee, Minyoung Choe, Sungju Oh, Yongjae Lee

    Abstract: This paper explores the utilization of Temporal Graph Networks (TGN) for financial anomaly detection, a pressing need in the era of fintech and digitized financial transactions. We present a comprehensive framework that leverages TGN, capable of capturing dynamic changes in edges within financial networks, for fraud detection. Our study compares TGN's performance against static Graph Neural Networ…

    Submitted 27 March, 2024; originally announced April 2024.

    Comments: Presented at the AAAI 2024 Workshop on AI in Finance for Social Impact (https://sites.google.com/view/aifin-aaai2024)

  13. arXiv:2402.11477  [pdf, other]

    cs.CY

    Studying Differential Mental Health Expressions in India

    Authors: Khushi Shelat, Sunny Rai, Devansh R Jain, Kishen Sivabalan, Young Min Cho, Maitreyi Redkar, Samindara Sawant, Sharath Chandra Guntuku

    Abstract: Psychosocial stressors and the symptomatology of mental disorders vary across cultures. However, current understandings of mental health expressions on social media are predominantly derived from studies in WEIRD (Western, Educated, Industrialized, Rich, and Democratic) contexts. In this paper, we analyze mental health posts on Reddit made by individuals in India, to identify variations in online…

    Submitted 16 June, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  14. arXiv:2401.05659  [pdf, other]

    cs.HC cs.SE

    Engineering Adaptive Information Graphics for Disabled Communities: A Case Study with Public Space Indoor Maps

    Authors: Anuradha Madugalla, Yutan Huang, John Grundy, Min Hee Cho, Lasith Koswatta Gamage, Tristan Leao, Sam Thiele

    Abstract: Most software applications contain graphics such as charts, diagrams and maps. Currently, these graphics are designed with a "one size fits all" approach and do not cater to the needs of people with disabilities. Therefore, when using software with graphics, a colour-impaired user may struggle to interpret graphics with certain colours, and a person with dyslexia may struggle to read the text lab…

    Submitted 10 January, 2024; originally announced January 2024.

  15. arXiv:2312.11514  [pdf, other]

    cs.CL cs.AI cs.LG

    LLM in a flash: Efficient Large Language Model Inference with Limited Memory

    Authors: Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar

    Abstract: Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their substantial computational and memory requirements present challenges, especially for devices with limited DRAM capacity. This paper tackles the challenge of efficiently running LLMs that exceed the available DRAM capacity by storing the model parameter…

    Submitted 4 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: preprint

  16. arXiv:2312.10230  [pdf, other]

    cs.AI cs.LG

    Constrained Meta-Reinforcement Learning for Adaptable Safety Guarantee with Differentiable Convex Programming

    Authors: Minjae Cho, Chuangchuang Sun

    Abstract: Despite remarkable achievements in artificial intelligence, the deployability of learning-enabled systems in high-stakes real-world environments still faces persistent challenges. For example, in safety-critical domains like autonomous driving, robotic manipulation, and healthcare, it is crucial not only to achieve high performance but also to comply with given constraints. Furthermore, adaptabili…

    Submitted 15 December, 2023; originally announced December 2023.

  17. arXiv:2312.07315  [pdf, other]

    cs.CV

    NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image

    Authors: Yoonwoo Jeong, Jinwoo Lee, Chiheon Kim, Minsu Cho, Doyup Lee

    Abstract: Transfer learning of large-scale Text-to-Image (T2I) models has recently shown impressive potential for Novel View Synthesis (NVS) of diverse objects from a single image. While previous methods typically train large models on multi-view datasets for NVS, fine-tuning the whole parameters of T2I models not only demands a high cost but also reduces the generalization capacity of T2I models in generat…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Project Page: https://postech-cvlab.github.io/nvsadapter/

  18. arXiv:2312.04594  [pdf, other]

    cs.CR cs.AI cs.LG

    FedGeo: Privacy-Preserving User Next Location Prediction with Federated Learning

    Authors: Chung Park, Taekyoon Choi, Taesan Kim, Mincheol Cho, Junui Hong, Minsung Choi, Jaegul Choo

    Abstract: A User Next Location Prediction (UNLP) task, which predicts the next location that a user will move to given his/her trajectory, is an indispensable task for a wide range of applications. Previous studies using large-scale trajectory datasets in a single server have achieved remarkable performance in UNLP task. However, in real-world applications, legal and ethical issues have been raised regardin…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted at 31st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL 2023)

  19. arXiv:2312.04266  [pdf, other]

    cs.CV

    Activity Grammars for Temporal Action Segmentation

    Authors: Dayoung Gong, Joonseok Lee, Deunsol Jung, Suha Kwak, Minsu Cho

    Abstract: Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties. The task of temporal action segmentation, which aims at translating an untrimmed activity video into a sequence of action segments, remains challenging for this reason. This paper addresses the problem by introducing an effective act…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Accepted to NeurIPS 2023

  20. arXiv:2312.02878  [pdf, other]

    cs.CV

    Towards More Practical Group Activity Detection: A New Benchmark and Model

    Authors: Dongkeun Kim, Youngkil Song, Minsu Cho, Suha Kwak

    Abstract: Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video. While GAD has been studied recently, there is still much room for improvement in both dataset and methodology due to their limited capability to address practical GAD scenarios. To resolve these issues, we first present a new dataset, dubbed Café. U…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Project page: https://cvlab.postech.ac.kr/research/CAFE

  21. arXiv:2312.01133  [pdf, other]

    stat.ML cs.LG

    $t^3$-Variational Autoencoder: Learning Heavy-tailed Data with Student's t and Power Divergence

    Authors: Juno Kim, Jaehyuk Kwon, Mincheol Cho, Hyunjong Lee, Joong-Ho Won

    Abstract: The variational autoencoder (VAE) typically employs a standard normal prior as a regularizer for the probabilistic latent encoder. However, the Gaussian tail often decays too quickly to effectively accommodate the encoded points, failing to preserve crucial structures hidden in the data. In this paper, we explore the use of heavy-tailed models to combat over-regularization. Drawing upon insights f…

    Submitted 3 March, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: ICLR 2024; 27 pages, 7 figures, 8 tables

  22. arXiv:2311.13188  [pdf, other]

    cs.AI cs.LG

    Cracking the Code of Negative Transfer: A Cooperative Game Theoretic Approach for Cross-Domain Sequential Recommendation

    Authors: Chung Park, Taesan Kim, Taekyoon Choi, Junui Hong, Yelim Yu, Mincheol Cho, Kyunam Lee, Sungil Ryu, Hyungjun Yoon, Minsung Choi, Jaegul Choo

    Abstract: This paper investigates Cross-Domain Sequential Recommendation (CDSR), a promising method that uses information from multiple domains (more than three) to generate accurate and diverse recommendations, and takes into account the sequential nature of user interactions. The effectiveness of these systems often depends on the complex interplay among the multiple domains. In this dynamic landscape, th…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Accepted at 32nd ACM International Conference on Information and Knowledge Management (CIKM 2023)

  23. arXiv:2311.04336  [pdf, other]

    cs.CV

    Efficient Semantic Matching with Hypercolumn Correlation

    Authors: Seungwook Kim, Juhong Min, Minsu Cho

    Abstract: Recent studies show that leveraging the match-wise relationships within the 4D correlation map yields significant improvements in establishing semantic correspondences - but at the cost of increased computation and latency. In this work, we focus on the aspect that the performance improvements of recent methods can also largely be attributed to the usage of multi-scale correlation maps, which hold…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted to WACV 2024. 17 pages including references and supplementary

  24. arXiv:2310.19778  [pdf, other]

    cs.HC cs.AI

    Human-AI collaboration is not very collaborative yet: A taxonomy of interaction patterns in AI-assisted decision making from a systematic review

    Authors: Catalina Gomez, Sue Min Cho, Shichang Ke, Chien-Ming Huang, Mathias Unberath

    Abstract: Leveraging Artificial Intelligence (AI) in decision support systems has disproportionately focused on technological advancements, often overlooking the alignment between algorithmic outputs and human expectations. A human-centered perspective attempts to alleviate this concern by designing AI solutions for seamless integration with existing processes. Determining what information AI should provide…

    Submitted 18 March, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: 25 pages; 2 figures

  25. arXiv:2310.17017  [pdf, other]

    cs.CL cs.AI

    An Integrative Survey on Mental Health Conversational Agents to Bridge Computer Science and Medical Perspectives

    Authors: Young Min Cho, Sunny Rai, Lyle Ungar, João Sedoc, Sharath Chandra Guntuku

    Abstract: Mental health conversational agents (a.k.a. chatbots) are widely studied for their potential to offer accessible support to those experiencing mental health challenges. Previous surveys on the topic primarily consider papers published in either computer science or medicine, leading to a divide in understanding and hindering the sharing of beneficial knowledge between both domains. To bridge this g…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted in EMNLP 2023 Main Conference, camera ready

  26. arXiv:2310.14504  [pdf, other]

    cs.CV

    ADoPT: LiDAR Spoofing Attack Detection Based on Point-Level Temporal Consistency

    Authors: Minkyoung Cho, Yulong Cao, Zixiang Zhou, Z. Morley Mao

    Abstract: Deep neural networks (DNNs) are increasingly integrated into LiDAR (Light Detection and Ranging)-based perception systems for autonomous vehicles (AVs), requiring robust performance under adversarial conditions. We aim to address the challenge of LiDAR spoofing attacks, where attackers inject fake objects into LiDAR data and fool AVs to misinterpret their environment and make erroneous decisions.…

    Submitted 22 October, 2023; originally announced October 2023.

    Comments: BMVC 2023 (17 pages, 13 figures, and 1 table)

  27. arXiv:2310.07174  [pdf, other]

    cs.LG stat.ML

    Generalized Neural Sorting Networks with Error-Free Differentiable Swap Functions

    Authors: Jungtaek Kim, Jeongbeen Yoon, Minsu Cho

    Abstract: Sorting is a fundamental operation of all computer systems, having been a long-standing significant research topic. Beyond the problem formulation of traditional sorting algorithms, we consider sorting problems for more abstract yet expressive inputs, e.g., multi-digit images and image fragments, through a neural sorting network. To learn a mapping from a high-dimensional input to an ordinal varia…

    Submitted 13 March, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted at the 12th International Conference on Learning Representations (ICLR 2024)

  28. Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

    Authors: Utkarsh Oggy Sarawgi, John Berkowitz, Vineet Garg, Arnav Kundu, Minsik Cho, Sai Srujana Buddi, Saurabh Adya, Ahmed Tewfik

    Abstract: Streaming neural network models for fast frame-wise responses to various speech and sensory signals are widely adopted on resource-constrained platforms. Hence, increasing the learning capacity of such streaming models (i.e., by adding more parameters) to improve the predictive power may not be viable for real-world tasks. In this work, we propose a new loss, Streaming Anchor Loss (SAL), to better…

    Submitted 18 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Published at IEEE ICASSP 2024, please see https://ieeexplore.ieee.org/abstract/document/10447222

    ACM Class: I.2.6; I.5.1; I.5.4; I.6.5

    Journal ref: In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6110-6114). IEEE

  29. arXiv:2310.05624  [pdf, other]

    cs.LG cs.CV

    Locality-Aware Generalizable Implicit Neural Representation

    Authors: Doyup Lee, Chiheon Kim, Minsu Cho, Wook-Shin Han

    Abstract: Generalizable implicit neural representation (INR) enables a single continuous function, i.e., a coordinate-based neural network, to represent multiple data instances by modulating its weights or intermediate features using latent codes. However, the expressive power of the state-of-the-art modulation is limited due to its inability to localize and capture fine-grained details of data entities suc…

    Submitted 12 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: 19 pages, 12 figures

  30. arXiv:2310.04604  [pdf, other]

    cs.CR cs.LG

    PriViT: Vision Transformers for Fast Private Inference

    Authors: Naren Dhyani, Jianqiao Mo, Minsu Cho, Ameya Joshi, Siddharth Garg, Brandon Reagen, Chinmay Hegde

    Abstract: The Vision Transformer (ViT) architecture has emerged as the backbone of choice for state-of-the-art deep models for computer vision applications. However, ViTs are ill-suited for private inference using secure multi-party computation (MPC) protocols, due to the large number of non-polynomial operations (self-attention, feed-forward rectifiers, layer normalization). We propose PriViT, a gradient b…

    Submitted 6 October, 2023; originally announced October 2023.

    Comments: 18 pages, 14 figures

  31. arXiv:2310.00867  [pdf, other]

    cs.CL cs.AI

    Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications

    Authors: Duc N. M Hoang, Minsik Cho, Thomas Merth, Mohammad Rastegari, Zhangyang Wang

    Abstract: Compressing Large Language Models (LLMs) often leads to reduced performance, especially for knowledge-intensive tasks. In this work, we dive into how compression damages LLMs' inherent knowledge and the possible remedies. We start by proposing two conjectures on the nature of the damage: one is certain knowledge being forgotten (or erased) after LLM compression, hence necessitating the compressed…

    Submitted 16 February, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

  32. arXiv:2309.14786  [pdf, other]

    cs.CV

    Treating Motion as Option with Output Selection for Unsupervised Video Object Segmentation

    Authors: Suhwan Cho, Minhyeok Lee, Jungho Lee, MyeongAh Cho, Sangyoun Lee

    Abstract: Unsupervised video object segmentation (VOS) is a task that aims to detect the most salient object in a video without external guidance about the object. To leverage the property that salient objects usually have distinctive movements compared to the background, recent methods collaboratively use motion cues extracted from optical flow maps with appearance cues extracted from RGB images. However,…

    Submitted 26 September, 2023; originally announced September 2023.

  33. arXiv:2309.00964  [pdf, other]

    cs.LG cs.AI

    eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

    Authors: Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal

    Abstract: Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. However, the size of LLMs (i.e., billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, wei…

    Submitted 13 September, 2023; v1 submitted 2 September, 2023; originally announced September 2023.

    Comments: preprint

  34. arXiv:2308.14783  [pdf, other]

    cs.LG cs.DC cs.IT

    Distributed Dual Coordinate Ascent with Imbalanced Data on a General Tree Network

    Authors: Myung Cho, Lifeng Lai, Weiyu Xu

    Abstract: In this paper, we investigate the impact of imbalanced data on the convergence of distributed dual coordinate ascent in a tree network for solving an empirical loss minimization problem in distributed machine learning. To address this issue, we propose a method called delayed generalized distributed dual coordinate ascent that takes into account the information of the imbalanced data, and provide…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: To be published in IEEE 2023 Workshop on Machine Learning for Signal Processing (MLSP)

  35. arXiv:2308.06472  [pdf, other]

    cs.SD cs.LG eess.AS

    Flexible Keyword Spotting based on Homogeneous Audio-Text Embedding

    Authors: Kumari Nishu, Minsik Cho, Paul Dixon, Devang Naik

    Abstract: Spotting user-defined/flexible keywords represented in text frequently uses an expensive text encoder for joint analysis with an audio encoder in an embedding space, which can suffer from heterogeneous modality representation (i.e., large mismatch) and increased complexity. In this work, we propose a novel architecture to efficiently detect arbitrary keywords based on an audio-compliant text encod…

    Submitted 12 August, 2023; originally announced August 2023.

  36. arXiv:2307.03407  [pdf, other]

    cs.CV

    Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation

    Authors: Dahyun Kang, Piotr Koniusz, Minsu Cho, Naila Murray

    Abstract: We address the task of weakly-supervised few-shot image classification and segmentation, by leveraging a Vision Transformer (ViT) pretrained with self-supervision. Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions through separate task heads. Our model is able to…

    Submitted 7 July, 2023; originally announced July 2023.

    Comments: Accepted at CVPR 2023

    Journal ref: CVPR 2023

  37. arXiv:2306.11406  [pdf, other]

    cs.CV cs.LG

    Stable and Consistent Prediction of 3D Characteristic Orientation via Invariant Residual Learning

    Authors: Seungwook Kim, Chunghyun Park, Yoonwoo Jeong, Jaesik Park, Minsu Cho

    Abstract: Learning to predict reliable characteristic orientations of 3D point clouds is an important yet challenging problem, as different point clouds of the same class may have largely varying appearances. In this work, we introduce a novel method to decouple the shape geometry and semantics of the input point cloud to achieve both stability and consistency. The proposed method integrates shape-geometry-…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted to ICML 2023

  38. arXiv:2306.05245  [pdf, other]

    eess.AS cs.LG cs.SD

    Matching Latent Encoding for Audio-Text based Keyword Spotting

    Authors: Kumari Nishu, Minsik Cho, Devang Naik

    Abstract: Using audio and text embeddings jointly for Keyword Spotting (KWS) has shown high-quality results, but the key challenge of how to semantically align two embeddings for multi-word keywords of different sequence lengths remains largely unsolved. In this paper, we propose an audio-text-based end-to-end model architecture for flexible keyword spotting (KWS), which builds upon learned audio and text e…

    Submitted 8 June, 2023; originally announced June 2023.

  39. Classification of Edge-dependent Labels of Nodes in Hypergraphs

    Authors: Minyoung Choe, Sunwoo Kim, Jaemin Yoo, Kijung Shin

    Abstract: A hypergraph is a data structure composed of nodes and hyperedges, where each hyperedge is an any-sized subset of nodes. Due to the flexibility in hyperedge size, hypergraphs represent group interactions (e.g., co-authorship by more than two authors) more naturally and accurately than ordinary graphs. Interestingly, many real-world systems modeled as hypergraphs contain edge-dependent node labels,…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted to KDD 2023

  40. How Transitive Are Real-World Group Interactions? -- Measurement and Reproduction

    Authors: Sunwoo Kim, Fanchen Bu, Minyoung Choe, Jaemin Yoo, Kijung Shin

    Abstract: Many real-world interactions (e.g., researcher collaborations and email communication) occur among multiple entities. These group interactions are naturally modeled as hypergraphs. In graphs, transitivity is helpful to understand the connections between node pairs sharing a neighbor, and it has extensive applications in various domains. Hypergraphs, an extension of graphs, are designed to represen…

    Submitted 25 October, 2023; v1 submitted 4 June, 2023; originally announced June 2023.

    Comments: Published in KDD 2023. 12 pages, 7 figures, and 11 tables

  41. arXiv:2305.11203  [pdf, other]

    cs.LG cs.AI cs.CV

    PDP: Parameter-free Differentiable Pruning is All You Need

    Authors: Minsik Cho, Saurabh Adya, Devang Naik

    Abstract: DNN pruning is a popular way to reduce the size of a model, improve the inference latency, and minimize the power consumption on DNN accelerators. However, existing approaches might be too complex, expensive or ineffective to apply to a variety of vision/language tasks, DNN architectures and to honor structured pruning constraints. In this paper, we propose an efficient yet effective train-time pr…

    Submitted 17 November, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Journal ref: NeurIPS 2023

  42. arXiv:2304.04997  [pdf, other]

    cs.CV cs.AI

    Relational Context Learning for Human-Object Interaction Detection

    Authors: Sanghyun Kim, Deunsol Jung, Minsu Cho

    Abstract: Recent state-of-the-art methods for HOI detection typically build on transformer architectures with two decoder branches, one for human-object pair detection and the other for interaction classification. Such disentangled transformers, however, may suffer from insufficient context exchange between the branches and lead to a lack of context information for relational reasoning, which is critical in…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: accepted to CVPR2023

  43. arXiv:2304.03495  [pdf, other]

    cs.CV

    Devil's on the Edges: Selective Quad Attention for Scene Graph Generation

    Authors: Deunsol Jung, Sanghyun Kim, Won Hwa Kim, Minsu Cho

    Abstract: Scene graph generation aims to construct a semantic graph structure from an image such that its nodes and edges respectively represent objects and their relationships. One of the major challenges for the task lies in the presence of distracting objects and relationships in images; contextual reasoning is strongly distracted by irrelevant objects or backgrounds and, more importantly, a vast number…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: Accepted at CVPR 2023; Project page at https://cvlab.postech.ac.kr/research/SQUAT/

  44. arXiv:2303.15472  [pdf, other]

    cs.CV

    Learning Rotation-Equivariant Features for Visual Correspondence

    Authors: Jongmin Lee, Byungjin Kim, Seungwook Kim, Minsu Cho

    Abstract: Extracting discriminative local features that are invariant to imaging variations is an integral part of establishing correspondences between images. In this work, we introduce a self-supervised learning framework to extract discriminative rotation-invariant descriptors using group-equivariant CNNs. Thanks to employing group-equivariant CNNs, our method effectively learns to obtain rotation-equiva…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023, Project webpage at http://cvlab.postech.ac.kr/research/RELF

  45. arXiv:2303.08253  [pdf, other]

    cs.LG cs.CV cs.PF eess.IV

    R2 Loss: Range Restriction Loss for Model Compression and Quantization

    Authors: Arnav Kundu, Chungkuk Yoo, Srijan Mishra, Minsik Cho, Saurabh Adya

    Abstract: Model quantization and compression is widely used techniques to reduce usage of computing resource at inference time. While state-of-the-art works have been achieved reasonable accuracy with higher bit such as 4bit or 8bit, but still it is challenging to quantize/compress a model further, e.g., 1bit or 2bit. To overcome the challenge, we focus on outliers in weights of a pre-trained model which di…

    Submitted 11 February, 2024; v1 submitted 14 March, 2023; originally announced March 2023.

  46. arXiv:2303.00451  [pdf, other]

    cs.MA cs.AI

    A Variational Approach to Mutual Information-Based Coordination for Multi-Agent Reinforcement Learning

    Authors: Woojun Kim, Whiyoung Jung, Myungsik Cho, Youngchul Sung

    Abstract: In this paper, we propose a new mutual information framework for multi-agent reinforcement learning to enable multiple agents to learn coordinated behaviors by regularizing the accumulated return with the simultaneous mutual information between multi-agent actions. By introducing a latent variable to induce nonzero mutual information between multi-agent actions and applying a variational bound, we…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: arXiv admin note: text overlap with arXiv:2006.02732

  47. arXiv:2302.01568  [pdf, other]

    cs.LG cs.DC

    DynaMIX: Resource Optimization for DNN-Based Real-Time Applications on a Multi-Tasking System

    Authors: Minkyoung Cho, Kang G. Shin

    Abstract: As deep neural networks (DNNs) prove their importance and feasibility, more and more DNN-based apps, such as detection and classification of objects, have been developed and deployed on autonomous vehicles (AVs). To meet their growing expectations and requirements, AVs should "optimize" use of their limited onboard computing resources for multiple concurrent in-vehicle apps while satisfying their…

    Submitted 3 February, 2023; originally announced February 2023.

    Comments: 13 pages, 9 figures, 5 tables

  48. arXiv:2302.00319  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Development of deep biological ages aware of morbidity and mortality based on unsupervised and semi-supervised deep learning approaches

    Authors: Seong-Eun Moon, Ji Won Yoon, Shinyoung Joo, Yoohyung Kim, Jae Hyun Bae, Seokho Yoon, Haanju Yoo, Young Min Cho

    Abstract: Background: While deep learning technology, which has the capability of obtaining latent representations based on large-scale data, can be a potential solution for the discovery of a novel aging biomarker, existing deep learning methods for biological age estimation usually depend on chronological ages and lack of consideration of mortality and morbidity that are the most significant outcomes of a…

    Submitted 1 February, 2023; originally announced February 2023.

  49. arXiv:2301.13729  [pdf, other]

    cs.DC eess.SY

    Low-rank LQR Optimal Control Design over Wireless Communication Networks

    Authors: Myung Cho, Abdallah Abdallah, Mohammad Rasouli

    Abstract: This paper considers a LQR optimal control design problem for distributed control systems with multi-agents. To control large-scale distributed systems such as smart-grid and multi-agent robotic systems over wireless communication networks, it is desired to design a feedback controller by considering various constraints on communication such as limited power, limited energy, or limited communicati…

    Submitted 31 January, 2023; originally announced January 2023.

    Comments: 10 pages

  50. arXiv:2301.00310  [pdf, other]

    cs.SI cs.DB

    Graphlets over Time: A New Lens for Temporal Network Analysis

    Authors: Deukryeol Yoon, Dongjin Lee, Minyoung Choe, Kijung Shin

    Abstract: Graphs are widely used for modeling various types of interactions, such as email communications and online discussions. Many of such real-world graphs are temporal, and specifically, they grow over time with new nodes and edges. Counting the instances of each graphlet (i.e., an induced subgraph isomorphism class) has been successful in characterizing local structures of graphs, with many applica…

    Submitted 3 January, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

    Comments: 13 pages, 7 figures