Showing 1–50 of 66 results for author: Kang, G

  1. arXiv:2406.11252  [pdf, other]

    cs.CV

    Mining Open Semantics from CLIP: A Relation Transition Perspective for Few-Shot Learning

    Authors: Cilin Yan, Haochen Wang, Xiaolong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves

    Abstract: Contrastive Vision-Language Pre-training(CLIP) demonstrates impressive zero-shot capability. The key to improve the adaptation of CLIP to downstream task with few exemplars lies in how to effectively model and transfer the useful knowledge embedded in CLIP. Previous work mines the knowledge typically based on the limited visual samples and close-set semantics (i.e., within target category set of d… ▽ More

    Submitted 28 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2405.20233  [pdf, other]

    cs.LG cs.AI

    Grokfast: Accelerated Grokking by Amplifying Slow Gradients

    Authors: Jaerin Lee, Bong Gyun Kang, Kihoon Kim, Kyoung Mu Lee

    Abstract: One puzzling artifact in machine learning dubbed grokking is where delayed generalization is achieved tenfolds of iterations after near perfect overfitting to the training data. Focusing on the long delay itself on behalf of machine learning practitioners, our goal is to accelerate generalization of a model under grokking phenomenon. By regarding a series of gradients of a parameter over training… ▽ More

    Submitted 5 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 17 pages, 13 figures. Typo fixed. Project page: https://jaerinlee.com/research/grokfast

  3. arXiv:2404.16685  [pdf, other]

    cs.CV cs.AI

    Multi-scale HSV Color Feature Embedding for High-fidelity NIR-to-RGB Spectrum Translation

    Authors: Huiyu Zhai, Mo Chen, Xingxing Yang, Gusheng Kang

    Abstract: The NIR-to-RGB spectral domain translation is a formidable task due to the inherent spectral mapping ambiguities within NIR inputs and RGB outputs. Thus, existing methods fail to reconcile the tension between maintaining texture detail fidelity and achieving diverse color variations. In this paper, we propose a Multi-scale HSV Color Feature Embedding Network (MCFNet) that decomposes the mapping pr… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  4. arXiv:2404.15190  [pdf, other]

    cs.AI cs.CL cs.CV cs.RO

    Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following

    Authors: Suyeon Shin, Su** jeon, Junghyun Kim, Gi-Cheon Kang, Byoung-Tak Zhang

    Abstract: Embodied Instruction Following (EIF) is the task of executing natural language instructions by navigating and interacting with objects in 3D environments. One of the primary challenges in EIF is compositional task planning, which is often addressed with supervised or in-context learning with labeled data. To this end, we introduce the Socratic Planner, the first zero-shot planning method that infe… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 14 pages, 6 figures

    MSC Class: 68T01 (Primary) 68T40; 68T50; 68T45 (Secondary)

  5. arXiv:2404.04913  [pdf, other]

    cs.CV

    CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis

    Authors: Gyeong** Kang, Younggeun Lee, Seungjun Oh, Eunbyung Park

    Abstract: Neural Radiance Fields (NeRF) have achieved huge success in effectively capturing and representing 3D objects and scenes. However, several factors have impeded its further proliferation as next-generation 3D media. To establish a ubiquitous presence in everyday media formats, such as images and videos, it is imperative to devise a solution that effectively fulfills three key objectives: fast encod… ▽ More

    Submitted 28 May, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Project page: https://gynjn.github.io/Codec-NeRF/

  6. arXiv:2404.00021  [pdf, other]

    cs.HC cs.CE cs.CY cs.PF

    Evaluatology: The Science and Engineering of Evaluation

    Authors: Jianfeng Zhan, Lei Wang, Wanling Gao, Hongxiao Li, Chenxi Wang, Yunyou Huang, Yatao Li, Zhengxin Yang, Guoxin Kang, Chunjie Luo, Hainan Ye, Shaopeng Dai, Zhifei Zhang

    Abstract: Evaluation is a crucial aspect of human existence and plays a vital role in various fields. However, it is often approached in an empirical and ad-hoc manner, lacking consensus on universal concepts, terminologies, theories, and methodologies. This lack of agreement has significant repercussions. This article aims to formally introduce the discipline of evaluatology, which encompasses the science… ▽ More

    Submitted 19 March, 2024; originally announced April 2024.

    Comments: 29 pages, 16 figures, and 2 tables

  7. arXiv:2403.15049  [pdf, other]

    cs.CV cs.AI

    Continual Vision-and-Language Navigation

    Authors: Seongjun Jeong, Gi-Cheon Kang, Seongho Choi, Joochan Kim, Byoung-Tak Zhang

    Abstract: Vision-and-Language Navigation (VLN) agents navigate to a destination using natural language instructions and the visual information they observe. Existing methods for training VLN agents presuppose fixed datasets, leading to a significant limitation: the introduction of new environments necessitates retraining with previously encountered environments to preserve their knowledge. This makes it dif… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

  8. arXiv:2401.16808  [pdf, other]

    cs.LG cs.AI

    Encoding Temporal Statistical-space Priors via Augmented Representation

    Authors: Insu Choi, Woosung Koh, Gimin Kang, Yuntae Jang, Woo Chang Kim

    Abstract: Modeling time series data remains a pervasive issue as the temporal dimension is inherent to numerous domains. Despite significant strides in time series forecasting, high noise-to-signal ratio, non-normality, non-stationarity, and lack of data continue challenging practitioners. In response, we leverage a simple representation augmentation technique to overcome these challenges. Our augmented rep… ▽ More

    Submitted 3 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: pre-print

  9. arXiv:2312.14611  [pdf, other]

    cs.CV

    Tuning-Free Inversion-Enhanced Control for Consistent Image Editing

    Authors: Xiaoyue Duan, Shuhao Cui, Guoliang Kang, Baochang Zhang, Zhengcong Fei, Mingyuan Fan, Junshi Huang

    Abstract: Consistent editing of real images is a challenging task, as it requires performing non-rigid edits (e.g., changing postures) to the main objects in the input image without changing their identity or attributes. To guarantee consistent attributes, some existing methods fine-tune the entire model or the textual embedding for structural consistency, but they are time-consuming and fail to perform non… ▽ More

    Submitted 22 December, 2023; originally announced December 2023.

  10. arXiv:2311.13326  [pdf, other]

    cs.LG cs.AI q-fin.PM

    Curriculum Learning and Imitation Learning for Model-free Control on Financial Time-series

    Authors: Woosung Koh, Insu Choi, Yuntae Jang, Gimin Kang, Woo Chang Kim

    Abstract: Curriculum learning and imitation learning have been leveraged extensively in the robotics domain. However, minimal research has been done on leveraging these ideas on control tasks over highly stochastic time-series data. Here, we theoretically and empirically explore these approaches in a representative control task over complex time-series data. We implement the fundamental ideas of curriculum… ▽ More

    Submitted 12 January, 2024; v1 submitted 22 November, 2023; originally announced November 2023.

    Comments: AAAI 2024 AI4TS Workshop Oral

  11. arXiv:2311.00353  [pdf, other]

    cs.CV

    LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation

    Authors: Yuxiang Bao, Di Qiu, Guoliang Kang, Baochang Zhang, Bo **, Kaiye Wang, Pengfei Yan

    Abstract: Leveraging the generative ability of image diffusion models offers great potential for zero-shot video-to-video translation. The key lies in how to maintain temporal consistency across generated video frames by image diffusion models. Previous methods typically adopt cross-frame attention, i.e., sharing the key and value tokens across attentions of different frames, to enc… ▽ More

    Submitted 1 November, 2023; originally announced November 2023.

  12. arXiv:2310.19202  [pdf]

    q-bio.QM cs.LG eess.SP

    Improved Motor Imagery Classification Using Adaptive Spatial Filters Based on Particle Swarm Optimization Algorithm

    Authors: Xiong Xiong, Ying Wang, Tianyuan Song, **guo Huang, Guixia Kang

    Abstract: As a typical self-paced brain-computer interface (BCI) system, the motor imagery (MI) BCI has been widely applied in fields such as robot control, stroke rehabilitation, and assistance for patients with stroke or spinal cord injury. Many studies have focused on the traditional spatial filters obtained through the common spatial pattern (CSP) method. However, the CSP method can only obtain fixed sp… ▽ More

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: 25 pages, 8 figures

  13. arXiv:2310.19198  [pdf]

    q-bio.QM cs.LG eess.SP

    Enhancing Motor Imagery Decoding in Brain Computer Interfaces using Riemann Tangent Space Mapping and Cross Frequency Coupling

    Authors: Xiong Xiong, Li Su, **guo Huang, Guixia Kang

    Abstract: Objective: Motor Imagery (MI) serves as a crucial experimental paradigm within the realm of Brain Computer Interfaces (BCIs), aiming to decoding motor intentions from electroencephalogram (EEG) signals. Method: Drawing inspiration from Riemannian geometry and Cross-Frequency Coupling (CFC), this paper introduces a novel approach termed Riemann Tangent Space Mapping using Dichotomous Filter Bank wi… ▽ More

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: 22 pages, 7 figures

  14. arXiv:2310.12547  [pdf, other]

    cs.RO cs.CV cs.LG

    PGA: Personalizing Grasping Agents with Single Human-Robot Interaction

    Authors: Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Seoyun Yang, Minjoon Jung, Byoung-Tak Zhang

    Abstract: Language-Conditioned Robotic Grasping (LCRG) aims to develop robots that comprehend and grasp objects based on natural language instructions. While the ability to understand personal objects like my wallet facilitates more natural interaction with human users, current LCRG systems only allow generic language instructions, e.g., the black-colored wallet next to the laptop. To this end, we introduce… ▽ More

    Submitted 19 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: 8 pages, under review

  15. arXiv:2309.07759  [pdf, other]

    cs.CL cs.RO

    PROGrasp: Pragmatic Human-Robot Communication for Object Grasping

    Authors: Gi-Cheon Kang, Junghyun Kim, Jaein Kim, Byoung-Tak Zhang

    Abstract: Interactive Object Grasping (IOG) is the task of identifying and grasping the desired object via human-robot natural language interaction. Current IOG systems assume that a human user initially specifies the target object's category (e.g., bottle). Inspired by pragmatics, where humans often convey their intentions by relying on context to achieve goals, we introduce a new IOG task, Pragmatic-IOG,… ▽ More

    Submitted 5 April, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: ICRA 2024

  16. arXiv:2308.16529  [pdf]

    cs.RO cs.AI cs.HC

    Developing Social Robots with Empathetic Non-Verbal Cues Using Large Language Models

    Authors: Yoon Kyung Lee, Yoonwon Jung, Gyuyi Kang, Sowon Hahn

    Abstract: We propose augmenting the empathetic capacities of social robots by integrating non-verbal cues. Our primary contribution is the design and labeling of four types of empathetic non-verbal cues, abbreviated as SAFE: Speech, Action (gesture), Facial expression, and Emotion, in a social robot. These cues are generated using a Large Language Model (LLM). We developed an LLM-based conversational system… ▽ More

    Submitted 31 August, 2023; originally announced August 2023.

    Journal ref: In Proceedings of 2023 IEEE International Conference on Robot & Human Interactive Communication (RO-MAN)

  17. arXiv:2307.05963  [pdf, other]

    cs.RO cs.CV

    GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

    Authors: Junghyun Kim, Gi-Cheon Kang, Jaein Kim, Suyeon Shin, Byoung-Tak Zhang

    Abstract: Language-Guided Robotic Manipulation (LGRM) is a challenging task as it requires a robot to understand human instructions to manipulate everyday objects. Recent approaches in LGRM rely on pre-trained Visual Grounding (VG) models to detect objects without adapting to manipulation environments. This results in a performance drop due to a substantial domain gap between the pre-training and real-world… ▽ More

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: Accepted at IROS2023

  18. arXiv:2307.04422  [pdf, other]

    cs.RO eess.SY

    A Versatile Door Opening System with Mobile Manipulator through Adaptive Position-Force Control and Reinforcement Learning

    Authors: Gyuree Kang, Hyunki Seong, Daegyu Lee, D. Hyunchul Shim

    Abstract: The ability of robots to navigate through doors is crucial for their effective operation in indoor environments. Consequently, extensive research has been conducted to develop robots capable of opening specific doors. However, the diverse combinations of door handles and opening directions necessitate a more versatile door opening system for robots to successfully operate in real-world environment… ▽ More

    Submitted 10 July, 2023; originally announced July 2023.

  19. arXiv:2307.00965  [pdf, other]

    cs.LG cs.AI

    OpenClinicalAI: An Open and Dynamic Model for Alzheimer's Disease Diagnosis

    Authors: Yunyou Huang, Xiaoshuang Liang, Xiangjiang Lu, Xiuxia Miao, Jiyue Xie, Wen**g Liu, Fan Zhang, Guoxin Kang, Li Ma, Suqin Tang, Zhifei Zhang, Jianfeng Zhan

    Abstract: Although Alzheimer's disease (AD) cannot be reversed or cured, timely diagnosis can significantly reduce the burden of treatment and care. Current research on AD diagnosis models usually regards the diagnosis task as a typical classification task with two primary assumptions: 1) All target categories are known a priori; 2) The diagnostic strategy for each patient is consistent, that is, the number… ▽ More

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: Real-world clinical setting,Alzheimer's disease,diagnose,AI,deep learning. arXiv admin note: text overlap with arXiv:2109.04004

  20. Robust Imaging Sonar-based Place Recognition and Localization in Underwater Environments

    Authors: Hogyun Kim, Gilhwan Kang, Seokhwan Jeong, Seungjun Ma, Younggun Cho

    Abstract: Place recognition using SOund Navigation and Ranging (SONAR) images is an important task for simultaneous localization and mapping (SLAM) in underwater environments. This paper proposes a robust and efficient imaging SONAR based place recognition, SONAR context, and loop closure method. Unlike previous methods, our approach encodes geometric information based on the characteristics of raw SONAR mea… ▽ More

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: 7 pages, 8 figures

  21. arXiv:2305.11488  [pdf, other]

    cs.CV

    AttriCLIP: A Non-Incremental Learner for Incremental Knowledge Learning

    Authors: Runqi Wang, Xiaoyue Duan, Guoliang Kang, Jianzhuang Liu, Shaohui Lin, Songcen Xu, **hu Lv, Baochang Zhang

    Abstract: Continual learning aims to enable a model to incrementally learn knowledge from sequentially arrived data. Previous works adopt the conventional classification architecture, which consists of a feature extractor and a classifier. The feature extractor is shared across sequentially arrived tasks or classes, but one specific group of weights of the classifier corresponding to one new class should be… ▽ More

    Submitted 20 March, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

  22. arXiv:2305.07945  [pdf, other]

    cs.IT eess.SP

    Deep Learning-based Data-aided Activity Detection with Extraction Network in Grant-free Sparse Code Multiple Access Systems

    Authors: Minsig Han, Ameha T. Abebe, Chung G. Kang

    Abstract: This letter proposes a deep learning-based data-aided active user detection network (D-AUDN) for grant-free sparse code multiple access (SCMA) systems that leverages both SCMA codebook and Zadoff-Chu preamble for activity detection. Due to disparate data and preamble distribution as well as codebook collision, existing D-AUDNs experience performance degradation when multiple preambles are associat… ▽ More

    Submitted 19 May, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

  23. arXiv:2304.11609  [pdf]

    cs.CV

    PiClick: Picking the desired mask from multiple candidates in click-based interactive segmentation

    Authors: Cilin Yan, Haochen Wang, Jie Liu, Xiaolong Jiang, Yao Hu, Xu Tang, Guoliang Kang, Efstratios Gavves

    Abstract: Click-based interactive segmentation aims to generate target masks via human clicking, which facilitates efficient pixel-level annotation and image editing. In such a task, target ambiguity remains a problem hindering the accuracy and efficiency of segmentation. That is, in scenes with rich context, one click may correspond to multiple potential targets, while most previous interactive segmentors… ▽ More

    Submitted 17 June, 2024; v1 submitted 23 April, 2023; originally announced April 2023.

  24. arXiv:2303.05118  [pdf, other]

    cs.CV cs.AI cs.LG

    SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model

    Authors: Gengwei Zhang, Liyuan Wang, Guoliang Kang, Ling Chen, Yunchao Wei

    Abstract: The goal of continual learning is to improve the performance of recognition models in learning sequentially arrived data. Although most existing works are established on the premise of learning from scratch, growing efforts have been devoted to incorporating the benefits of pre-training. However, how to adaptively exploit the pre-trained knowledge for each incremental task while maintaining its ge… ▽ More

    Submitted 9 October, 2023; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: ICCV 2023, code released

  25. arXiv:2302.12954  [pdf, other]

    cs.PF

    WPC: Whole-picture Workload Characterization

    Authors: Lei Wang, Kaiyong Yang, Chenxi Wang, Wanling Gao, Chunjie Luo, Fan Zhang, Zhongxin Ge, Li Zhang, Guoxin Kang, Jianfeng Zhan

    Abstract: This article raises an important and challenging workload characterization issue: can we uncover each critical component across the stacks contributing what percentages to any specific bottleneck? The typical critical components include languages, programming frameworks, runtime environments, instruction set architectures (ISA), operating systems (OS), and microarchitecture. Tackling this issue co… ▽ More

    Submitted 24 February, 2023; originally announced February 2023.

  26. arXiv:2302.09927  [pdf, other]

    cs.DB

    NHtapDB: Native HTAP Databases

    Authors: Guoxin Kang, Lei Wang, Simin Chen, Jianfeng Zhan

    Abstract: Native database (1) provides a near-data machine learning framework to facilitate generating real-time business insight, and predefined change thresholds will trigger online training and deployment of new models, and (2) offers a mixed-format store to guarantee the performance of HTAP workloads, especially the hybrid workloads that consist of OLAP queries in-between online transactions. We make ri… ▽ More

    Submitted 20 February, 2023; originally announced February 2023.

  27. arXiv:2212.00721  [pdf, other]

    cs.DC cs.NI

    High fusion computers: The IoTs, edges, data centers, and humans-in-the-loop as a computer

    Authors: Wanling Gao, Lei Wang, Mingyu Chen, ** Xiong, Chunjie Luo, Wenli Zhang, Yunyou Huang, Wei** Li, Guoxin Kang, Chen Zheng, Biwei Xie, Shaopeng Dai, Qian He, Hainan Ye, Yungang Bao, Jianfeng Zhan

    Abstract: Emerging and future applications rely heavily upon systems consisting of Internet of Things (IoT), edges, data centers, and humans-in-the-loop. Significantly different from warehouse-scale computers that serve independent concurrent user requests, this new class of computer systems directly interacts with the physical world, considering humans an essential part and performing safety-critical and m… ▽ More

    Submitted 18 November, 2022; originally announced December 2022.

    Comments: This paper has been published in BenchCouncil Transactions on Benchmarks, Standards and Evaluations (TBench). Link: https://www.sciencedirect.com/science/article/pii/S277248592200062X

    Journal ref: BenchCouncil Transactions on Benchmarks, Standards and Evaluations (2022)

  28. arXiv:2211.15180  [pdf, other]

    cs.CV

    Rethinking the Number of Shots in Robust Model-Agnostic Meta-Learning

    Authors: Xiaoyue Duan, Guoliang Kang, Runqi Wang, Shumin Han, Song Xue, Tian Wang, Baochang Zhang

    Abstract: Robust Model-Agnostic Meta-Learning (MAML) is usually adopted to train a meta-model which may fast adapt to novel classes with only a few exemplars and meanwhile remain robust to adversarial attacks. The conventional solution for robust MAML is to introduce robustness-promoting regularization during meta-training stage. With such a regularization, previous robust MAML methods simply follow the typ… ▽ More

    Submitted 28 November, 2022; originally announced November 2022.

  29. arXiv:2210.17302  [pdf, other]

    cs.RO eess.SY

    Design, Field Evaluation, and Traffic Analysis of a Competitive Autonomous Driving Model in a Congested Environment

    Authors: Daegyu Lee, Hyunki Seong, Seungil Han, Gyuree Kang, D. Hyunchul Shim, Yoon** Yoon

    Abstract: Recently, numerous studies have investigated cooperative traffic systems using the communication among vehicle-to-everything (V2X). Unfortunately, when multiple autonomous vehicles are deployed while exposed to communication failure, there might be a conflict of ideal conditions between various autonomous vehicles leading to adversarial situation on the roads. In South Korea, virtual and real-worl… ▽ More

    Submitted 6 November, 2022; v1 submitted 31 October, 2022; originally announced October 2022.

  30. arXiv:2208.10725  [pdf, other]

    cs.NI

    DRL-based Distributed Resource Allocation for Edge Computing in Cell-Free Massive MIMO Network

    Authors: Fitsum Debebe Tilahun, Ameha Tsegaye Abebe, Chung G. Kang

    Abstract: In this paper, with the aim of addressing the stringent computing and quality-of-service (QoS) requirements of recently introduced advanced multimedia services, we consider a cell-free massive MIMO-enabled mobile edge network. In particular, benefited from the reliable cell-free links to offload intensive computation to the edge server, resource-constrained end-users can augment on-board (local) p… ▽ More

    Submitted 23 August, 2022; originally announced August 2022.

    Comments: 6 pages, 4 figures, conference. arXiv admin note: substantial text overlap with arXiv:2201.09057

  31. arXiv:2208.08128  [pdf, other]

    cs.IT eess.SY

    On the Performance of Deep Learning-based Data-aided Active User Detection for GF-SCMA System

    Authors: Minsig Han, Ameha Tsegaye Abebe, Chung G. Kang

    Abstract: The recent works on a deep learning (DL)-based joint design of preamble set for the transmitters and data-aided active user detection (AUD) in the receiver has demonstrated a significant performance improvement for grant-free sparse code multiple access (GF-SCMA) system. The autoencoder for the joint design can be trained only in a given environment, but in an actual situation where the operating… ▽ More

    Submitted 5 September, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

  32. arXiv:2205.12502  [pdf, other]

    cs.CV cs.CL cs.LG

    The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training

    Authors: Gi-Cheon Kang, Sungdong Kim, **-Hwa Kim, Donghyun Kwak, Byoung-Tak Zhang

    Abstract: Visual dialog (VisDial) is a task of answering a sequence of questions grounded in an image, using the dialog history as context. Prior work has trained the dialog agents solely on VisDial data via supervised learning or leveraged pre-training on related vision-and-language datasets. This paper presents a semi-supervised learning approach for visually-grounded dialog, called Generative Self-Traini… ▽ More

    Submitted 2 March, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: CVPR 2023

  33. arXiv:2205.10780  [pdf, other]

    eess.SY cs.IT cs.LG

    Data-aided Active User Detection with a User Activity Extraction Network for Grant-free SCMA Systems

    Authors: Minsig Han, Ameha T. Abebe, Chung G. Kang

    Abstract: In grant-free sparse code multiple access (GF-SCMA) system, active user detection (AUD) is a major performance bottleneck as it involves complex combinatorial problem, which makes joint design of contention resources for users and AUD at the receiver a crucial but a challenging problem. To this end, we propose autoencoder (AE)-based joint optimization of both preamble generation networks (PGNs) in… ▽ More

    Submitted 8 August, 2022; v1 submitted 22 May, 2022; originally announced May 2022.

  34. OLxPBench: Real-time, Semantically Consistent, and Domain-specific are Essential in Benchmarking, Designing, and Implementing HTAP Systems

    Authors: Guoxin Kang, Lei Wang, Wanling Gao, Fei Tang, Jianfeng Zhan

    Abstract: As real-time analysis of the new data become increasingly compelling, more organizations deploy Hybrid Transactional/Analytical Processing (HTAP) systems to support real-time queries on data recently generated by online transaction processing. This paper argues that real-time queries, semantically consistent schema, and domain-specific workloads are essential in benchmarking, designing, and implem… ▽ More

    Submitted 5 April, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

    Comments: Accepted to ICDE 2022. International Open Benchmark Council (BenchCouncil) sets up the OLxPBench homepage at https://www.benchcouncil.org/olxpbench/

  35. arXiv:2201.09057  [pdf, other]

    cs.NI eess.SY

    Multi-Agent Reinforcement Learning for Distributed Joint Communication and Computing Resource Allocation over Cell-Free Massive MIMO-enabled Mobile Edge Computing Network

    Authors: Fitsum Debebe Tilahun, Ameha Tsegaye Abebe, Chung G. Kang

    Abstract: To support the newly introduced multimedia services with ultra-low latency and extensive computation requirements, resource-constrained end user devices should utilize the ubiquitous computing resources available at network edge for augmenting on-board (local) processing with edge computing. In this regard, the capability of cell-free massive MIMO to provide reliable access links by guaranteeing u… ▽ More

    Submitted 1 July, 2023; v1 submitted 3 December, 2021; originally announced January 2022.

  36. arXiv:2201.06618  [pdf, other]

    cs.LG cs.CV

    VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer

    Authors: Mengshu Sun, Haoyu Ma, Guoliang Kang, Yifan Jiang, Tianlong Chen, Xiaolong Ma, Zhangyang Wang, Yanzhi Wang

    Abstract: The transformer architectures with attention mechanisms have obtained success in Nature Language Processing (NLP), and Vision Transformers (ViTs) have recently extended the application domains to various vision tasks. While achieving high performance, ViTs suffer from large model size and high computation complexity that hinders the deployment of them on edge devices. To achieve high throughput on… ▽ More

    Submitted 18 February, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

  37. arXiv:2109.04004  [pdf, ps, other]

    cs.AI

    OpenClinicalAI: enabling AI to diagnose diseases in real-world clinical settings

    Authors: Yunyou Huang, Nana Wang, Suqin Tang, Li Ma, Tianshu Hao, Zihan Jiang, Fan Zhang, Guoxin Kang, Xiuxia Miao, Xianglong Guan, Ruchang Zhang, Zhifei Zhang, Jianfeng Zhan

    Abstract: This paper quantitatively reveals the state-of-the-art and state-of-the-practice AI systems only achieve acceptable performance on the stringent conditions that all categories of subjects are known, which we call closed clinical settings, but fail to work in real-world clinical settings. Compared to the diagnosis task in the closed setting, real-world clinical settings pose severe challenges, and… ▽ More

    Submitted 8 September, 2021; originally announced September 2021.

  38. arXiv:2106.10446  [pdf, other]

    cs.CV cs.AI

    Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering

    Authors: Ahjeong Seo, Gi-Cheon Kang, Joonhan Park, Byoung-Tak Zhang

    Abstract: Video Question Answering is a task which requires an AI agent to answer questions grounded in video. This task entails three key challenges: (1) understand the intention of various questions, (2) capturing various elements of the input video (e.g., object, action, causality), and (3) cross-modal grounding between language and vision information. We propose Motion-Appearance Synergistic Networks (M… ▽ More

    Submitted 19 June, 2021; originally announced June 2021.

    Comments: ACL 2021

  39. arXiv:2106.02320  [pdf, other]

    cs.CV

    Few-Shot Segmentation via Cycle-Consistent Transformer

    Authors: Gengwei Zhang, Guoliang Kang, Yi Yang, Yunchao Wei

    Abstract: Few-shot segmentation aims to train a segmentation model that can fast adapt to novel classes with few exemplars. The conventional training paradigm is to learn to make predictions on query images conditioned on the features from support images. Previous methods only utilized the semantic-level prototypes of support images as conditional information. These methods cannot utilize all pixel-wise sup… ▽ More

    Submitted 7 March, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Advances in Neural Information Processing Systems (NeurIPS), 2021. Project: https://github.com/GengDavid/CyCTR

  40. arXiv:2104.00818  [pdf, other]

    cs.IT eess.SY

    Deep Learning-based Codebook Design for Code-domain Non-Orthogonal Multiple Access Approaching Single-User Bit Error Rate Performance

    Authors: Minsig Han, Hanchang Seo, Ameha Tsegaye Abebe, Chung G. Kang

    Abstract: A general form of codebook design for code-domain non-orthogonal multiple access (CD-NOMA) can be considered equivalent to an autoencoder (AE)-based constellation design for multi-user multidimensional modulation (MU-MDM). Due to a constrained design space for optimal constellation, e.g., fixed resource mapping and equal power allocation to all codebooks, however, existing AE architectures produce… ▽ More

    Submitted 10 October, 2021; v1 submitted 1 April, 2021; originally announced April 2021.

  41. arXiv:2103.11167  [pdf, other]

    cs.IT

    Multi-sequence Spreading Random Access (MSRA) for Compressive Sensing-based Grant-free Communication

    Authors: Ameha Tsegaye Abebe, Chung G. Kang

    Abstract: The performance of grant-free random access (GF-RA) is limited by the number of accessible random access resources (RRs) due to the absence of collision resolution. Compressive sensing (CS)-based RA schemes scale up the RRs at the expense of increased non-orthogonality among transmitted signals. This paper presents the design of multi-sequence spreading random access (MSRA) which employs multiple… ▽ More

    Submitted 20 March, 2021; originally announced March 2021.

  42. arXiv:2011.00147  [pdf, other]

    cs.CV

    Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation

    Authors: Guoliang Kang, Yunchao Wei, Yi Yang, Yueting Zhuang, Alexander G. Hauptmann

    Abstract: Domain adaptive semantic segmentation aims to train a model performing satisfactory pixel-level predictions on the target with only out-of-domain (source) annotations. The conventional solution to this task is to minimize the discrepancy between source and target to enable effective knowledge transfer. Previous domain discrepancy minimization methods are mainly based on the adversarial training. T…

    Submitted 30 October, 2020; originally announced November 2020.

    Comments: Accepted by NeurIPS 2020 (oral). Code: https://github.com/kgl-prml/Pixel-Level-Cycle-Association

  43. arXiv:2008.06208  [pdf]

    eess.AS cs.CL cs.SD

    Adaptable Multi-Domain Language Model for Transformer ASR

    Authors: Taewoo Lee, Min-Joong Lee, Tae Gyoon Kang, Seokyeoung Jung, Minseok Kwon, Yeona Hong, Jungin Lee, Kyoung-Gu Woo, Ho-Gyeong Kim, Jiseung Jeong, Jihyun Lee, Hosik Lee, Young Sang Choi

    Abstract: We propose an adapter based multi-domain Transformer based language model (LM) for Transformer ASR. The model consists of a big size common LM and small size adapters. The model can perform multi-domain adaptation with only the small size adapters and its related layers. The proposed model can reuse the full fine-tuned LM which is fine-tuned using all layers of an original model. The proposed LM c…

    Submitted 10 February, 2021; v1 submitted 14 August, 2020; originally announced August 2020.

    Comments: This paper is accepted for presentation at IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE ICASSP), 2021

  44. Label Propagation Adaptive Resonance Theory for Semi-supervised Continuous Learning

    Authors: Taehyeong Kim, Injune Hwang, Gi-Cheon Kang, Won-Seok Choi, Hyunseo Kim, Byoung-Tak Zhang

    Abstract: Semi-supervised learning and continuous learning are fundamental paradigms for human-level intelligence. To deal with real-world problems where labels are rarely given and the opportunity to access the same data is limited, it is necessary to apply these two paradigms in a joined fashion. In this paper, we propose Label Propagation Adaptive Resonance Theory (LPART) for semi-supervised continuous l…

    Submitted 16 April, 2020; originally announced May 2020.

    Comments: 5 pages, 2 figures, 1 table, accepted in ICASSP 2020

  45. Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision

    Authors: Soo-Whan Chung, Hong Goo Kang, Joon Son Chung

    Abstract: The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representations can be learnt from natural cross-modal synchrony. We build on earlier work to train embeddings that are more discriminative for uni-modal downstream tasks. To this end, we propose a novel training st…

    Submitted 6 May, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: Under submission as a conference paper

  46. arXiv:2004.06698  [pdf, other]

    cs.CV cs.CL cs.LG

    Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer

    Authors: Gi-Cheon Kang, Junseok Park, Hwaran Lee, Byoung-Tak Zhang, Jin-Hwa Kim

    Abstract: Visual dialog is a task of answering a sequence of questions grounded in an image using the previous dialog history as context. In this paper, we study how to address two fundamental challenges for this task: (1) reasoning over underlying semantic structures among dialog rounds and (2) identifying several appropriate answers to the given question. To address these challenges, we propose a Sparse G…

    Submitted 30 August, 2021; v1 submitted 14 April, 2020; originally announced April 2020.

    Comments: EMNLP 2021 Findings

  47. arXiv:2002.00137  [pdf, other]

    cs.CV

    Training-free Monocular 3D Event Detection System for Traffic Surveillance

    Authors: Lijun Yu, Peng Chen, Wenhe Liu, Guoliang Kang, Alexander G. Hauptmann

    Abstract: We focus on the problem of detecting traffic events in a surveillance scenario, including the detection of both vehicle actions and traffic collisions. Existing event detection systems are mostly learning-based and have achieved convincing performance when a large amount of training data is available. However, in real-world scenarios, collecting sufficient labeled training data is expensive and so…

    Submitted 31 January, 2020; originally announced February 2020.

    Comments: To be published in 2019 IEEE International Conference on Big Data (Big Data), IEEE

  48. arXiv:1909.08929  [pdf, other]

    cs.LG stat.ML

    Automobile Theft Detection by Clustering Owner Driver Data

    Authors: Yong Goo Kang, Kyung Ho Park, Huy Kang Kim

    Abstract: As automobiles become intelligent, automobile theft methods are evolving intelligently. Therefore automobile theft detection has become a major research challenge. Data-mining, biometrics, and additional authentication methods have been proposed to address automobile theft, in previous studies. Among these methods, data-mining can be used to analyze driving characteristics and identify a driver co…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: 15 pages, 7 figures, 3 tables, In Proceedings of the 17th escar Europe 2019

  49. arXiv:1908.01925  [pdf, other]

    cs.CV

    Attract or Distract: Exploit the Margin of Open Set

    Authors: Qianyu Feng, Guoliang Kang, Hehe Fan, Yi Yang

    Abstract: Open set domain adaptation aims to diminish the domain shift across domains, with partially shared classes. There exist unknown target samples out of the knowledge of source domain. Compared to the close set setting, how to separate the unknown (unshared) class from the known (shared) ones plays a key role. Whereas, previous methods did not emphasize the semantic structure of the open set data, wh…

    Submitted 10 August, 2019; v1 submitted 5 August, 2019; originally announced August 2019.

    Comments: Presented at ICCV 2019

  50. arXiv:1908.00248  [pdf]

    eess.SP cs.IT

    Achievable Degrees of Freedom for Closed-form Solution to Interference Alignment and Cancellation in Gaussian Interference Multiple Access Channel

    Authors: Qu Xin, Chung G. Kang

    Abstract: A combined technique of interference alignment (IA) and interference cancellation (IC), known as interference alignment and cancellation (IAC) scheme, has been proposed to improve the total achievable degrees of freedom (DoFs) over IA. Since it is NP-hard to solve the transceiver under a given tuple of DoFs or to maximize the total achievable DoFs in the general system configuration by IA (or IAC)…

    Submitted 1 August, 2019; originally announced August 2019.

    Comments: 10 pages