Showing 1–50 of 275 results for author: Yi, J

  1. arXiv:2407.00888  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Papez: Resource-Efficient Speech Separation with Auditory Working Memory

    Authors: Hyunseok Oh, Juheon Yi, Youngki Lee

    Abstract: Transformer-based models recently reached state-of-the-art single-channel speech separation accuracy; however, their extreme computational load makes it difficult to deploy them in resource-constrained mobile or IoT devices. We thus present Papez, a lightweight and computation-efficient single-channel speech separation model. Papez is based on three key techniques. We first replace the inter-chunk…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 5 pages. Accepted by ICASSP 2023

  2. arXiv:2406.16200  [pdf, other]

    cs.LG cs.CR cs.IT eess.SP

    Towards unlocking the mystery of adversarial fragility of neural networks

    Authors: Jingchao Gao, Raghu Mudumbai, Xiaodong Wu, Jirong Yi, Catherine Xu, Hui Xie, Weiyu Xu

    Abstract: In this paper, we study the adversarial robustness of deep neural networks for classification tasks. We look at the smallest magnitude of possible additive perturbations that can change the output of a classification algorithm. We provide a matrix-theoretic explanation of the adversarial fragility of deep neural networks for classification. In particular, our theoretical results show that neural ne…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 21 pages

  3. arXiv:2406.12260  [pdf, other]

    cs.LG cs.AI cs.CL

    Self-Supervised Time-Series Anomaly Detection Using Learnable Data Augmentation

    Authors: Kukjin Choi, Jihun Yi, Jisoo Mok, Sungroh Yoon

    Abstract: Continuous efforts are being made to advance anomaly detection in various manufacturing processes to increase the productivity and safety of industrial sites. Deep learning replaced rule-based methods and recently emerged as a promising method for anomaly detection in diverse industries. However, in the real world, the scarcity of abnormal data and difficulties in obtaining labeled data create lim…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 11 pages, 4 figures, IEEE Transactions on Emerging Topics in Computational Intelligence

  4. arXiv:2406.09664  [pdf, other]

    cs.SD eess.AS

    Frequency-mix Knowledge Distillation for Fake Speech Detection

    Authors: Cunhang Fan, Shunbo Dong, Jun Xue, Yujie Chen, Jiangyan Yi, Zhao Lv

    Abstract: In telephony scenarios, the fake speech detection (FSD) task to combat speech spoofing attacks is challenging. Data augmentation (DA) methods are considered effective means to address the FSD task in telephony scenarios, typically divided into time domain and frequency domain stages. While each has its advantages, both can result in information loss. To tackle this issue, we propose a novel DA…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  5. arXiv:2406.09133  [pdf]

    cs.CL

    RH-SQL: Refined Schema and Hardness Prompt for Text-to-SQL

    Authors: Jiawen Yi, Guo Chen, Zixiang Shen

    Abstract: Text-to-SQL is a technology that converts natural language queries into the structured query language SQL. A novel research approach that has recently gained attention focuses on methods based on the complexity of SQL queries, achieving notable performance improvements. However, existing methods entail significant storage and training costs, which hampers their practical application. To address th…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 4 pages, 2 figures, 2024 6th International Conference on Electronic Engineering and Informatics (EEI 2024)

  6. arXiv:2406.08863  [pdf, other]

    cs.IR cs.AI cs.CV

    Self-supervised Graph Neural Network for Mechanical CAD Retrieval

    Authors: Yuhan Quan, Huan Zhao, Jinfeng Yi, Yuqiang Chen

    Abstract: CAD (Computer-Aided Design) plays a crucial role in the mechanical industry, where large numbers of similar-shaped CAD parts are often created. Efficiently reusing these parts is key to reducing design and production costs for enterprises. Retrieval systems are vital for achieving CAD reuse, but the complex shapes of CAD models are difficult to accurately describe using text or keywords, making tradit…

    Submitted 17 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2406.06999  [pdf, other]

    cs.CV

    Teaching with Uncertainty: Unleashing the Potential of Knowledge Distillation in Object Detection

    Authors: Junfei Yi, Jianxu Mao, Tengfei Liu, Mingjie Li, Hanyu Gu, Hui Zhang, Xiaojun Chang, Yaonan Wang

    Abstract: Knowledge distillation (KD) is a widely adopted and effective method for compressing models in object detection tasks. Particularly, feature-based distillation methods have shown remarkable performance. Existing approaches often ignore the uncertainty in the teacher model's knowledge, which stems from data noise and imperfect training. This limits the student model's ability to learn latent knowle…

    Submitted 11 June, 2024; originally announced June 2024.

  8. arXiv:2406.06086  [pdf, other]

    cs.SD eess.AS

    RawBMamba: End-to-End Bidirectional State Space Model for Audio Deepfake Detection

    Authors: Yujie Chen, Jiangyan Yi, Jun Xue, Chenglong Wang, Xiaohui Zhang, Shunbo Dong, Siding Zeng, Jianhua Tao, Lv Zhao, Cunhang Fan

    Abstract: Fake artefacts for discriminating between bonafide and fake audio can exist in both short- and long-range segments. Therefore, combining local and global feature information can effectively discriminate between bonafide and fake audio. This paper proposes an end-to-end bidirectional state space model, named RawBMamba, to capture both short- and long-range discriminative information for audio deepf…

    Submitted 18 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  9. arXiv:2406.04840  [pdf, other]

    cs.SD eess.AS

    TraceableSpeech: Towards Proactively Traceable Text-to-Speech with Watermarking

    Authors: Junzuo Zhou, Jiangyan Yi, Tao Wang, Jianhua Tao, Ye Bai, Chu Yuan Zhang, Yong Ren, Zhengqi Wen

    Abstract: Various threats posed by the progress in text-to-speech (TTS) have prompted the need to reliably trace synthesized speech. However, contemporary approaches to this task involve adding watermarks to the audio separately after generation, a process that hurts both speech quality and watermark imperceptibility. In addition, these approaches are limited in robustness and flexibility. To address these…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  10. arXiv:2406.03411  [pdf, other]

    cs.CV

    Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach

    Authors: Saehyung Lee, Sangwon Yu, Junsung Park, Jihun Yi, Sungroh Yoon

    Abstract: In this paper, we primarily address the issue of dialogue-form context query within the interactive text-to-image retrieval task. Our methodology, PlugIR, actively utilizes the general instruction-following capability of LLMs in two ways. First, by reformulating the dialogue-form context, we eliminate the necessity of fine-tuning a retrieval model on existing visual dialogue data, thereby enabling…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: To appear in ACL 2024 Main

  11. arXiv:2405.08596   

    cs.SD eess.AS

    EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark

    Authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao

    Abstract: The rise of advanced large language models such as GPT-4, GPT-4o, and the Claude family has made fake audio detection increasingly challenging. Traditional fine-tuning methods struggle to keep pace with the evolving landscape of synthetic speech, necessitating continual learning approaches that can adapt to new audio while retaining the ability to detect older types. Continual learning, which acts…

    Submitted 15 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: This paper needs more modification

  12. arXiv:2405.03917  [pdf, other]

    cs.LG

    KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization

    Authors: Tianyi Zhang, Jonah Yi, Zhaozhuo Xu, Anshumali Shrivastava

    Abstract: Efficient deployment of Large Language Models (LLMs) requires batching multiple requests together to improve throughput. As the batch size, context length, or model size increases, the size of the key and value (KV) cache can quickly become the main contributor to GPU memory usage and the bottleneck of inference latency. Quantization has emerged as an effective technique for KV cache compression,…

    Submitted 6 May, 2024; originally announced May 2024.

  13. arXiv:2404.17113  [pdf, other]

    cs.LG cs.HC

    MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao

    Abstract: Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing dataset size and building more effective architectures. However, due to various reasons (such as complex environments and inaccurate annotations), current systems struggle to meet the demands of practical applications. Therefor…

    Submitted 23 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  14. arXiv:2404.16346  [pdf, other]

    eess.IV cs.AI cs.CV

    Light-weight Retinal Layer Segmentation with Global Reasoning

    Authors: Xiang He, Weiye Song, Yiming Wang, Fabio Poiesi, Ji Yi, Manishi Desai, Quanqing Xu, Kongzheng Yang, Yi Wan

    Abstract: Automatic retinal layer segmentation with medical images, such as optical coherence tomography (OCT) images, serves as an important tool for diagnosing ophthalmic diseases. However, it is challenging to achieve accurate segmentation due to low contrast and blood flow noise present in the images. In addition, the algorithm should be light-weight to be deployed for practical clinical applications…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: IEEE Transactions on Instrumentation & Measurement

  15. arXiv:2404.14687  [pdf, other]

    cs.MM cs.AI cs.CL cs.CV

    Pegasus-v1 Technical Report

    Authors: Raehyuk Jung, Hyojun Go, Jaehyuk Yi, Jiho Jang, Daniel Kim, Jay Suh, Aiden Lee, Cooper Han, Jae Lee, Jeff Kim, Jin-Young Kim, Junwan Kim, Kyle Park, Lucas Lee, Mars Ha, Minjoon Seo, Abraham Jo, Ed Park, Hassan Kianinejad, SJ Kim, Tony Moon, Wade Jeong, Andrei Popescu, Esther Kim, EK Yoon, et al. (19 additional authors not shown)

    Abstract: This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report overviews Pegasus-1's archi…

    Submitted 22 April, 2024; originally announced April 2024.

  16. arXiv:2404.12455  [pdf, ps, other]

    cs.RO

    Contingency Model Predictive Control for Bipedal Locomotion on Moving Surfaces with a Linear Inverted Pendulum Model

    Authors: Kuo Chen, Xinyan Huang, Xunjie Chen, Jingang Yi

    Abstract: Gait control of legged robotic walkers on dynamically moving surfaces (e.g., ships and vehicles) is challenging due to the limited balance control actuation and unknown surface motion. We present a contingent model predictive control (CMPC) for bipedal walker locomotion on moving surfaces with a linear inverted pendulum (LIP) model. The CMPC is a robust design that is built on regular model predic…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 2024 American Control Conference (ACC 2024)

  17. arXiv:2404.05950  [pdf, other]

    cs.LG cs.AI cs.RO

    Efficient Multi-Task Reinforcement Learning via Task-Specific Action Correction

    Authors: Jinyuan Feng, Min Chen, Zhiqiang Pu, Tenghai Qiu, Jianqiang Yi

    Abstract: Multi-task reinforcement learning (MTRL) demonstrates potential for enhancing the generalization of a robot, enabling it to perform multiple tasks concurrently. However, the performance of MTRL may still be susceptible to conflicts between tasks and negative interference. To facilitate efficient MTRL, we propose Task-Specific Action Correction (TSAC), a general and complementary approach designed f…

    Submitted 8 April, 2024; originally announced April 2024.

  18. arXiv:2404.00667  [pdf, other]

    cs.CV

    Weakly-Supervised Cross-Domain Segmentation of Electron Microscopy with Sparse Point Annotation

    Authors: Dafei Qiu, Shan Xiong, Jiajin Yi, Jialin Peng

    Abstract: Accurate segmentation of organelle instances from electron microscopy (EM) images plays an essential role in much neuroscience research. However, practical scenarios usually suffer from high annotation costs, label scarcity, and large domain diversity. While unsupervised domain adaptation (UDA) that assumes no annotation effort on the target data is promising to alleviate these challenges, its p…

    Submitted 31 March, 2024; originally announced April 2024.

  19. arXiv:2403.18057  [pdf, other]

    cs.AI

    Prioritized League Reinforcement Learning for Large-Scale Heterogeneous Multiagent Systems

    Authors: Qingxu Fu, Zhiqiang Pu, Min Chen, Tenghai Qiu, Jianqiang Yi

    Abstract: Large-scale heterogeneous multiagent systems feature various realistic factors in the real world, such as agents with diverse abilities and overall system cost. In comparison to homogeneous systems, heterogeneous systems offer significant practical advantages. Nonetheless, they also present challenges for multiagent reinforcement learning, including addressing the non-stationary problem and managi…

    Submitted 26 March, 2024; originally announced March 2024.

  20. arXiv:2403.18056  [pdf, other]

    cs.AI

    Self-Clustering Hierarchical Multi-Agent Reinforcement Learning with Extensible Cooperation Graph

    Authors: Qingxu Fu, Tenghai Qiu, Jianqiang Yi, Zhiqiang Pu, Xiaolin Ai

    Abstract: Multi-Agent Reinforcement Learning (MARL) has been successful in solving many cooperative challenges. However, classic non-hierarchical MARL algorithms still cannot address various complex multi-agent problems that require hierarchical cooperative behaviors. The cooperative knowledge and policies learned in non-hierarchical algorithms are implicit and not interpretable, thereby restricting the int…

    Submitted 26 March, 2024; originally announced March 2024.

  21. arXiv:2403.17863  [pdf, other]

    cs.DC

    An AI-Native Runtime for Multi-Wearable Environments

    Authors: Chulhong Min, Utku Günay Acer, SiYoung Jang, Sangwon Choi, Diana A. Vasile, Taesik Gong, Juheon Yi, Fahim Kawsar

    Abstract: The miniaturization of AI accelerators is paving the way for next-generation wearable applications within wearable technologies. We introduce Mojito, an AI-native runtime with advanced MLOps designed to facilitate the development and deployment of these applications on wearable devices. It emphasizes the necessity of dynamic orchestration of distributed resources equipped with ultra-low-power AI a…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 7 pages, 4 figures

  22. arXiv:2403.07598  [pdf, other]

    cs.CV

    Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference

    Authors: Changmin Jeon, Seonjun Kim, Juheon Yi, Youngki Lee

    Abstract: In this paper, we present Mondrian, an edge system that enables high-performance object detection on high-resolution video streams. Many lightweight models and system optimization techniques have been proposed for resource-constrained devices, but they do not fully utilize the potential of the accelerators over dynamic, high-resolution videos. To enable such capability, we devise a novel Compressi…

    Submitted 12 March, 2024; originally announced March 2024.

  23. arXiv:2403.06072   

    cs.IT eess.SP

    Channel Estimation Considerate Precoder Design for Multi-user Massive MIMO-OFDM Systems: The Concept and Fast Algorithms

    Authors: Liu Junkai, Jiang Yi

    Abstract: The sixth-generation (6G) communication networks target peak data rates exceeding 1Tbps, necessitating base stations (BS) to support up to 100 simultaneous data streams. However, sparse pilot allocation to accommodate such streams poses challenges for users' channel estimation. This paper presents Channel Estimation Considerate Precoding (CECP), where BS precoders prioritize facilitating channel e…

    Submitted 7 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: The work is supported by HUAWEI cooperation and is related to a current HUAWEI project. HUAWEI cooperation requires the paper to be withdrawn

  24. arXiv:2403.05125  [pdf, other]

    cs.CV cs.AI

    Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis

    Authors: Muxi Chen, Yi Liu, Jian Yi, Changran Xu, Qiuxia Lai, Hongliang Wang, Tsung-Yi Ho, Qiang Xu

    Abstract: In this paper, we present an empirical study introducing a nuanced evaluation framework for text-to-image (T2I) generative models, applied to human image synthesis. Our framework categorizes evaluations into two distinct groups: first, focusing on image qualities such as aesthetics and realism, and second, examining text conditions through concept coverage and fairness. We introduce an innovative…

    Submitted 8 March, 2024; originally announced March 2024.

  25. arXiv:2403.03460  [pdf, ps, other]

    cs.RO

    Foot Shape-Dependent Resistive Force Model for Bipedal Walkers on Granular Terrains

    Authors: Xunjie Chen, Aditya Anikode, Jingang Yi, Tao Liu

    Abstract: Legged robots have demonstrated high efficiency and effectiveness in unstructured and dynamic environments. However, it is still challenging for legged robots to achieve rapid and efficient locomotion on deformable, yielding substrates, such as granular terrains. We present an enhanced resistive force model for bipedal walkers on soft granular terrains by introducing effective intrusion depth corr…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: ICRA 2024

  26. arXiv:2403.03408  [pdf, other]

    cs.CV

    Scene Depth Estimation from Traditional Oriental Landscape Paintings

    Authors: Sungho Kang, YeongHyeon Park, Hyunkyu Park, Juneho Yi

    Abstract: Scene depth estimation from paintings can streamline the process of 3D sculpture creation so that visually impaired people can appreciate the paintings with tactile sense. However, measuring depth of oriental landscape painting images is extremely challenging due to their unique method of depicting depth and poor preservation. To address the problem of scene depth estimation from oriental landscape pain…

    Submitted 6 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  27. arXiv:2403.03105  [pdf, ps, other]

    cs.RO

    Biomechanical Comparison of Human Walking Locomotion on Solid Ground and Sand

    Authors: Chunchu Zhu, Xunjie Chen, Jingang Yi

    Abstract: Current studies on human locomotion focus mainly on solid ground walking conditions. In this paper, we present a biomechanical comparison of human walking locomotion on solid ground and sand. A novel dataset containing 3-dimensional motion and biomechanical data from 20 able-bodied adults for locomotion on solid ground and sand is collected. We present the data collection methods and report the sens…

    Submitted 28 June, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 10 pages, 10 figures, submitted to the Journal of Biomechanical Engineering

  28. arXiv:2403.02617  [pdf, ps, other]

    cs.RO

    A Reduced-Order Resistive Force Model for Robotic Foot-Mud Interactions

    Authors: Xunjie Chen, Jingang Yi, Jerry Shan

    Abstract: Legged robots are well-suited for broad exploration tasks in complex environments with yielding terrain. Understanding robotic foot-terrain interactions is critical for safe locomotion and walking efficiency for legged robots. This paper presents a reduced-order resistive-force model for robotic-foot/mud interactions. We focus on vertical robot locomotion on mud and propose a visco-elasto-plastic…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: AIM 2024

  29. arXiv:2403.01273  [pdf, other]

    cs.LG cs.AI cs.CL

    NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

    Authors: Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava

    Abstract: Large language model inference on Central Processing Units (CPU) is challenging due to the vast quantities of expensive Multiply-Add (MAD) matrix operations in the attention computations. In this paper, we argue that there is a rare gem in modern CPUs, Single-Instruction-Multiple-Data (SIMD) registers, which allow for ultra-low-latency lookups in batch. We leverage this unique capability of CPUs t…

    Submitted 2 March, 2024; originally announced March 2024.

  30. Complete and Near-Optimal Robotic Crack Coverage and Filling in Civil Infrastructure

    Authors: Vishnu Veeraraghavan, Kyle Hunte, Jingang Yi, Kaiyan Yu

    Abstract: We present a simultaneous sensor-based inspection and footprint coverage (SIFC) planning and control design with applications to autonomous robotic crack mapping and filling. The main challenge of the SIFC problem lies in the coupling of complete sensing (for mapping) and robotic footprint (for filling) coverage tasks. Initially, we assume known target information (e.g., crack) and employ classic…

    Submitted 1 March, 2024; originally announced March 2024.

    Journal ref: in IEEE Transactions on Robotics, vol. 40, pp. 2850-2867, 2024

  31. arXiv:2402.12765  [pdf, other]

    cs.CV

    GOOD: Towards Domain Generalized Orientated Object Detection

    Authors: Qi Bi, Beichen Zhou, Jingjun Yi, Wei Ji, Haolan Zhan, Gui-Song Xia

    Abstract: Oriented object detection has been rapidly developed in the past few years, but most of these methods assume the training and testing images are under the same statistical distribution, which is far from reality. In this paper, we propose the task of domain generalized oriented object detection, which intends to explore the generalization of oriented object detectors on arbitrary unseen target dom…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 8 pages, 6 figures

  32. arXiv:2402.10055  [pdf]

    eess.IV cs.AI cs.CV

    Robust semi-automatic vessel tracing in the human retinal image by an instance segmentation neural network

    Authors: Siyi Chen, Amir H. Kashani, Ji Yi

    Abstract: The morphology and hierarchy of the vascular systems are essential for perfusion in supporting metabolism. In the human retina, one of the most energy-demanding organs, retinal circulation nourishes the entire inner retina by an intricate vasculature emerging and remerging at the optic nerve head (ONH). Thus, tracing the vascular branching from ONH through the vascular tree can illustrate vascular hie…

    Submitted 15 February, 2024; originally announced February 2024.

  33. arXiv:2401.14132  [pdf, other]

    cs.CV cs.DC

    Enabling Cross-Camera Collaboration for Video Analytics on Distributed Smart Cameras

    Authors: Chulhong Min, Juheon Yi, Utku Gunay Acer, Fahim Kawsar

    Abstract: Overlapping cameras offer exciting opportunities to view a scene from different angles, allowing for more advanced, comprehensive and robust analysis. However, existing visual analytics systems for multi-camera streams are mostly limited to (i) per-camera processing and aggregation and (ii) workload-agnostic centralized processing architectures. In this paper, we present Argus, a distributed video…

    Submitted 26 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: 18 pages, under review

  34. arXiv:2401.11257  [pdf, other]

    cs.MA cs.AI

    Measuring Policy Distance for Multi-Agent Reinforcement Learning

    Authors: Tianyi Hu, Zhiqiang Pu, Xiaolin Ai, Tenghai Qiu, Jianqiang Yi

    Abstract: Diversity plays a crucial role in improving the performance of multi-agent reinforcement learning (MARL). Currently, many diversity-based methods have been developed to overcome the drawbacks of excessive parameter sharing in traditional MARL. However, there remains a lack of a general metric to quantify policy differences among agents. Such a metric would not only facilitate the evaluation of the…

    Submitted 28 January, 2024; v1 submitted 20 January, 2024; originally announced January 2024.

    Comments: 9 pages, 6 figures

  35. arXiv:2401.08860  [pdf, other]

    cs.CV

    Cross-Level Multi-Instance Distillation for Self-Supervised Fine-Grained Visual Categorization

    Authors: Qi Bi, Wei Ji, Jingjun Yi, Haolan Zhan, Gui-Song Xia

    Abstract: High-quality annotation of fine-grained visual categories demands great expert knowledge, which is taxing and time consuming. Alternatively, learning fine-grained visual representation from enormous unlabeled images (e.g., species, brands) by self-supervised learning becomes a feasible solution. However, recent research finds that existing self-supervised learning methods are less qualified to re…

    Submitted 26 February, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: work in progress

  36. arXiv:2401.03650  [pdf, other]

    eess.AS cs.SD eess.SP

    DDD: A Perceptually Superior Low-Response-Time DNN-based Declipper

    Authors: Jayeon Yi, Junghyun Koo, Kyogu Lee

    Abstract: Clipping is a common nonlinear distortion that occurs whenever the input or output of an audio system exceeds the supported range. This phenomenon undermines not only the perception of speech quality but also downstream processes utilizing the disrupted signal. Therefore, a real-time-capable, robust, and low-response-time method for speech declipping (SD) is desired. In this work, we introduce DDD…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: To appear, ICASSP 2024. Demo samples at https://stet-stet.github.io/DDD, repo at https://github.com/stet-stet/DDD

  37. arXiv:2401.03488  [pdf, other]

    cs.LG cs.CR eess.SP

    Data-Driven Subsampling in the Presence of an Adversarial Actor

    Authors: Abu Shafin Mohammad Mahdee Jameel, Ahmed P. Mohamed, Jinho Yi, Aly El Gamal, Akshay Malhotra

    Abstract: Deep learning based automatic modulation classification (AMC) has received significant attention owing to its potential applications in both military and civilian use cases. Recently, data-driven subsampling techniques have been utilized to overcome the challenges associated with computational complexity and training time for AMC. Beyond these direct advantages of data-driven subsampling, these me…

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: Accepted for publication at ICMLCN 2024

  38. arXiv:2312.14197  [pdf, other]

    cs.CL cs.AI

    Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models

    Authors: Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu

    Abstract: The integration of large language models (LLMs) with external content has enabled more up-to-date and wide-ranging applications of LLMs, such as Microsoft Copilot. However, this integration has also exposed LLMs to the risk of indirect prompt injection attacks, where an attacker can embed malicious instructions within external content, compromising LLM output and causing responses to deviate from…

    Submitted 8 March, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  39. arXiv:2312.10155  [pdf, ps, other]

    cs.RO eess.SY

    Gaussian Process-Based Learning Control of Underactuated Balance Robots with an External and Internal Convertible Modeling Structure

    Authors: Feng Han, Jingang Yi

    Abstract: External and internal convertible (EIC) form-based motion control is one of the effective designs of simultaneous trajectory tracking and balance for underactuated balance robots. Under certain conditions, however, the EIC-based control design leads to uncontrolled robot motion. We present a Gaussian process (GP)-based data-driven learning control for underactuated balance robots with the EIC mod…

    Submitted 15 December, 2023; originally announced December 2023.

  40. arXiv:2312.09651  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection

    Authors: Xiaohui Zhang, Jiangyan Yi, Chenglong Wang, Chuyuan Zhang, Siding Zeng, Jianhua Tao

    Abstract: The rapid evolution of speech synthesis and voice conversion has raised substantial concerns due to the potential misuse of such technology, prompting a pressing need for effective audio deepfake detection mechanisms. Existing detection models have shown remarkable success in discriminating known deepfake audio, but struggle when encountering new attack types. To address this challenge, one of the…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted by the main track The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

  41. arXiv:2312.06632  [pdf, other]

    cs.AI

    Control Risk for Potential Misuse of Artificial Intelligence in Science

    Authors: Jiyan He, Weitao Feng, Yaosen Min, Jingwei Yi, Kunsheng Tang, Shuai Li, Jie Zhang, Kejiang Chen, Wenbo Zhou, Xing Xie, Weiming Zhang, Nenghai Yu, Shuxin Zheng

    Abstract: The expanding application of Artificial Intelligence (AI) in scientific fields presents unprecedented opportunities for discovery and innovation. However, this growth is not without risks. AI models in science, if misused, can amplify risks like creation of harmful substances, or circumvention of established regulations. In this study, we aim to raise awareness of the dangers of AI misuse in scien…

    Submitted 11 December, 2023; originally announced December 2023.

  42. arXiv:2312.06172  [pdf, other]

    cs.CL

    Decoupling SQL Query Hardness Parsing for Text-to-SQL

    Authors: Jiawen Yi, Guo Chen

    Abstract: The fundamental goal of the Text-to-SQL task is to translate a natural language question into an SQL query. Current research primarily emphasizes the information coupling between natural language questions and schemas, and significant progress has been made in this area. The natural language question, as the primary source of task requirements, determines the hardness of the corresponding SQL queries, the correla…

    Submitted 29 December, 2023; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: 10 pages, 3 figures

  43. arXiv:2311.13687  [pdf, other]

    cs.LG cs.MM cs.SD eess.AS

    Beat-Aligned Spectrogram-to-Sequence Generation of Rhythm-Game Charts

    Authors: Jayeon Yi, Sungho Lee, Kyogu Lee

    Abstract: In the heart of "rhythm games" - games where players must perform actions in sync with a piece of music - are "charts", the directives to be given to players. We newly formulate chart generation as a sequence generation task and train a Transformer using a large dataset. We also introduce tempo-informed preprocessing and training procedures, some of which are suggested to be integral for a success…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: ISMIR 2023 LBD. Demo videos and code at stet-stet.github.io/goct

  44. arXiv:2311.07613  [pdf]

    eess.SY cs.LG math.DS

    A Physics-informed Machine Learning-based Control Method for Nonlinear Dynamic Systems with Highly Noisy Measurements

    Authors: Mason Ma, Jiajie Wu, Chase Post, Tony Shi, Jingang Yi, Tony Schmitz, Hong Wang

    Abstract: This study presents a physics-informed machine learning-based control method for nonlinear dynamic systems with highly noisy measurements. Existing data-driven control methods that use machine learning for system identification cannot effectively cope with highly noisy measurements, resulting in unstable control performance. To address this challenge, the present study extends current physics-info…

    Submitted 11 November, 2023; originally announced November 2023.

  45. arXiv:2311.03965  [pdf, other]

    cs.CV

    Fast Sun-aligned Outdoor Scene Relighting based on TensoRF

    Authors: Yeonjin Chang, Yearim Kim, Seunghyeon Seo, Jung Yi, Nojun Kwak

    Abstract: In this work, we introduce our method of outdoor scene relighting for Neural Radiance Fields (NeRF) named Sun-aligned Relighting TensoRF (SR-TensoRF). SR-TensoRF offers a lightweight and rapid pipeline aligned with the sun, thereby achieving a simplified workflow that eliminates the need for environment maps. Our sun-alignment strategy is motivated by the insight that shadows, unlike viewpoint-dep…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  46. arXiv:2311.02536  [pdf, other]

    cs.CV

    Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models

    Authors: Jingru Yi, Burak Uzkent, Oana Ignat, Zili Li, Amanmeet Garg, Xiang Yu, Linda Liu

    Abstract: Grounding-based vision and language models have been successfully applied to low-level vision tasks, aiming to precisely locate objects referred in captions. The effectiveness of grounding representation learning heavily relies on the scale of the training dataset. Despite being a useful data enrichment strategy, data augmentation has received minimal attention in existing vision and language task…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: Accepted to WACV 2024

  47. arXiv:2310.19468  [pdf, other]

    cs.LG cs.MA stat.ML

    Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems

    Authors: Jialin Yi

    Abstract: A Multi-Agent Cooperative Learning (MACL) system is an artificial intelligence (AI) system where multiple learning agents work together to complete a common task. Recent empirical success of MACL systems in various domains (e.g. traffic control, cloud computing, robotics) has sparked active research into the design and analysis of MACL systems for sequential decision making problems. One important…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Thesis submitted to London School of Economics and Political Science for PhD in Statistics

  48. arXiv:2310.18701  [pdf, other]

    cs.LG

    Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards

    Authors: Bo Xue, Yimu Wang, Yuanyu Wan, Jinfeng Yi, Lijun Zhang

    Abstract: This paper investigates the problem of generalized linear bandits with heavy-tailed rewards, whose $(1+ε)$-th moment is bounded for some $ε\in (0,1]$. Although there exist methods for generalized linear bandits, most of them focus on bounded or sub-Gaussian rewards and are not well-suited for many real-world scenarios, such as financial markets and web-advertising. To address this issue, we propos…

    Submitted 28 October, 2023; originally announced October 2023.

  49. arXiv:2310.18303  [pdf, other]

    cs.RO cs.AI

    Socially Cognizant Robotics for a Technology Enhanced Society

    Authors: Kristin J. Dana, Clinton Andrews, Kostas Bekris, Jacob Feldman, Matthew Stone, Pernille Hemmer, Aaron Mazzeo, Hal Salzman, Jingang Yi

    Abstract: Emerging applications of robotics, and concerns about their impact, require the research community to put human-centric objectives front-and-center. To meet this challenge, we advocate an interdisciplinary approach, socially cognizant robotics, which synthesizes technical and social science methods. We argue that this approach follows from the need to empower stakeholder participation (from synchr…

    Submitted 27 October, 2023; originally announced October 2023.

  50. arXiv:2310.11665  [pdf, other]

    cs.RO

    Forward Kinematics of Object Transporting by a Multi-Robot System with a Deformable Sheet

    Authors: Jiawei Hu, Wenhang Liu, Jingang Yi, Zhenhua Xiong

    Abstract: We present object handling and transporting by a multi-robot team with a deformable sheet as a carrier. Due to the deformability of the sheet and the high dimension of the whole system, it is challenging to clearly describe all the possible positions of the object on the sheet for a given formation of the multi-robot system. A complete forward kinematics (FK) method is proposed for object handling…

    Submitted 23 January, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: 8 pages, 6 figures, has been submitted to IEEE Robotics and Automation Letters