Showing 1–50 of 119 results for author: Ye, S

  1. arXiv:2406.18021  [pdf, other]

    cs.SD cs.LG eess.AS

    SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR

    Authors: Shuaishuai Ye, Shunfei Chen, Xinhui Hu, Xinkang Xu

    Abstract: In this work, we propose a Switch-Conformer-based MoE system named SC-MoE for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR), where we design a streaming MoE layer consisting of three language experts, which correspond to Mandarin, English, and blank, respectively, and equipped with a language identification (LID) network with a Connectionist Temporal Cl…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024; 5 pages, 2 figures

  2. arXiv:2406.11813  [pdf, other]

    cs.CL

    How Do Large Language Models Acquire Factual Knowledge During Pretraining?

    Authors: Hoyeon Chang, Jinho Park, Seonghyeon Ye, Sohee Yang, Youngkyung Seo, Du-Seong Chang, Minjoon Seo

    Abstract: Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, there is a limited understanding of the mechanisms of how they acquire factual knowledge through pretraining. This work addresses this gap by studying how LLMs acquire factual knowledge during pretraining. The findings reveal several important insights into the dynamics of factual knowledge ac…

    Submitted 17 June, 2024; originally announced June 2024.

    ACM Class: I.2.7

  3. arXiv:2406.08418  [pdf, other]

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an…

    Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  4. arXiv:2406.05761  [pdf, other]

    cs.CL

    The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

    Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

    Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on spec…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Work in Progress

  5. arXiv:2405.17245  [pdf, other]

    cs.DC cs.AI cs.LG cs.NI

    Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference

    Authors: Shengyuan Ye, Jiangsu Du, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen

    Abstract: Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistant in smart home. Traditional deployment approaches offload the inference workloads to the remote cloud server, which would induce substantial pressure on the backbone network as well as raise users' privacy concerns. To address that, in-situ inference has been recently recogniz…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE International Conference on Computer Communications 2024

  6. arXiv:2405.05498  [pdf, other]

    cs.SD eess.AS

    The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge

    Authors: Jingguang Tian, Shuaishuai Ye, Shunfei Chen, Yang Xiang, Zhaohui Yin, Xinhui Hu, Xinkang Xu

    Abstract: This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on t…

    Submitted 8 May, 2024; originally announced May 2024.

  7. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2405.03567  [pdf, other]

    cs.SD cs.AI eess.AS

    Deep Space Separable Distillation for Lightweight Acoustic Scene Classification

    Authors: ShuQi Ye, Yuan Tian

    Abstract: Acoustic scene classification (ASC) is highly important in the real world. Recently, deep learning-based methods have been widely employed for acoustic scene classification. However, these methods are currently not lightweight enough as well as their performance is not satisfactory. To solve these problems, we propose a deep space separable distillation network. Firstly, the network performs high-…

    Submitted 6 May, 2024; originally announced May 2024.

  9. arXiv:2404.17766  [pdf, other]

    cs.LG cs.AI cs.DC cs.NI

    Implementation of Big AI Models for Wireless Networks with Collaborative Edge Computing

    Authors: Liekang Zeng, Shengyuan Ye, Xu Chen, Yang Yang

    Abstract: Big Artificial Intelligence (AI) models have emerged as a crucial element in various intelligent applications at the edge, such as voice assistants in smart homes and autonomous robotics in smart factories. Training big AI models, e.g., for personalized fine-tuning and continual model refinement, poses significant challenges to edge devices due to the inherent conflict between limited computing re…

    Submitted 26 April, 2024; originally announced April 2024.

  10. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  11. arXiv:2404.16418  [pdf, other]

    cs.CL

    Instruction Matters, a Simple yet Effective Task Selection Approach in Instruction Tuning for Specific Tasks

    Authors: Changho Lee, Janghoon Han, Seonghyeon Ye, Stanley Jungkyu Choi, Honglak Lee, Kyunghoon Bae

    Abstract: Instruction tuning has shown its ability to not only enhance zero-shot generalization across various tasks but also its effectiveness in improving the performance of specific tasks. A crucial aspect in instruction tuning for a particular task is a strategic selection of related tasks that offer meaningful supervision, thereby enhancing efficiency and preventing performance degradation from irrelev…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 21 pages, 6 figures, 16 tables

  12. arXiv:2404.10346  [pdf, other]

    cs.CL

    Self-Explore to Avoid the Pit: Improving the Reasoning Capabilities of Language Models with Fine-grained Rewards

    Authors: Hyeonbin Hwang, Doyoung Kim, Seungone Kim, Seonghyeon Ye, Minjoon Seo

    Abstract: Training on large amounts of rationales (i.e., CoT Fine-tuning) is effective at improving the reasoning capabilities of large language models (LLMs). However, acquiring human-authored rationales or augmenting rationales from proprietary models is costly and not scalable. In this paper, we study the problem of whether LLMs could self-improve their reasoning capabilities. To this end, we propose Sel…

    Submitted 16 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Preprint Under Review

  13. arXiv:2403.12982  [pdf]

    cond-mat.mtrl-sci cs.LG physics.chem-ph

    Knowledge-Reuse Transfer Learning Methods in Molecular and Material Science

    Authors: An Chen, Zhilong Wang, Karl Luigi Loza Vidaurre, Yanqiang Han, Simin Ye, Kehao Tao, Shiwei Wang, Jing Gao, Jinjin Li

    Abstract: Molecules and materials are the foundation for the development of modern advanced industries such as energy storage systems and semiconductor devices. However, traditional trial-and-error methods or theoretical calculations are highly resource-intensive, and extremely long R&D (Research and Development) periods cannot meet the urgent need for molecules/materials in industrial development. Machine…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 42 pages, 10 figures

  14. arXiv:2403.10809  [pdf, other]

    cs.RO

    Efficient Trajectory Forecasting and Generation with Conditional Flow Matching

    Authors: Sean Ye, Matthew Gombolay

    Abstract: Trajectory prediction and generation are vital for autonomous robots navigating dynamic environments. While prior research has typically focused on either prediction or generation, our approach unifies these tasks to provide a versatile framework and achieve state-of-the-art performance. Diffusion models, which are currently state-of-the-art for learned trajectory generation in long-horizon planni…

    Submitted 16 March, 2024; originally announced March 2024.

  15. arXiv:2403.10794  [pdf, other]

    cs.RO cs.LG cs.MA

    Diffusion-Reinforcement Learning Hierarchical Motion Planning in Adversarial Multi-agent Games

    Authors: Zixuan Wu, Sean Ye, Manisha Natarajan, Matthew C. Gombolay

    Abstract: Reinforcement Learning- (RL-)based motion planning has recently shown the potential to outperform traditional approaches from autonomous navigation to robot manipulation. In this work, we focus on a motion planning task for an evasive target in a partially observable multi-agent adversarial pursuit-evasion games (PEG). These pursuit-evasion problems are relevant to various applications, such as se…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE Robotics and Automation Letters (RA-L) for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  16. arXiv:2403.05265  [pdf, other]

    cs.AI

    MMoE: Robust Spoiler Detection with Multi-modal Information and Domain-aware Mixture-of-Experts

    Authors: Zinan Zeng, Sen Ye, Zijian Cai, Heng Wang, Yuhan Liu, Haokai Zhang, Minnan Luo

    Abstract: Online movie review websites are valuable for information and discussion about movies. However, the massive spoiler reviews detract from the movie-watching experience, making spoiler detection an important task. Previous methods simply focus on reviews' text content, ignoring the heterogeneity of information in the platform. For instance, the metadata and the corresponding user's information of a…

    Submitted 13 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  17. arXiv:2402.17242  [pdf, other]

    cs.SI cs.DB

    Scalable Community Search with Accuracy Guarantee on Attributed Graphs

    Authors: Yuxiang Wang, Shuzhan Ye, Xiaoliang Xu, Yuxia Geng, Zhenghe Zhao, Xiangyu Ke, Tianxing Wu

    Abstract: Given an attributed graph $G$ and a query node $q$, Community Search over Attributed Graphs (CS-AG) aims to find a structure- and attribute-cohesive subgraph from $G$ that contains $q$. Although CS-AG has been widely studied, they still face three challenges. (1) Exact methods based on graph traversal are time-consuming, especially for large graphs.…

    Submitted 29 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  18. arXiv:2402.14334  [pdf, other]

    cs.CL

    INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models

    Authors: Hanseok Oh, Hyunji Lee, Seonghyeon Ye, Haebin Shin, Hansol Jang, Changwook Jun, Minjoon Seo

    Abstract: Despite the critical need to align search targets with users' intention, retrievers often only prioritize query information without delving into the users' intended search context. Enhancing the capability of retrievers to understand intentions and preferences of users, akin to language model instructions, has the potential to yield more aligned search targets. Prior studies restrict the applicati…

    Submitted 22 February, 2024; originally announced February 2024.

  19. arXiv:2401.13267  [pdf, other]

    cs.CV

    Dual-modal Dynamic Traceback Learning for Medical Report Generation

    Authors: Shuchang Ye, Mingyuan Meng, Mingjian Li, Dagan Feng, Jinman Kim

    Abstract: With increasing reliance on medical imaging in clinical practices, automated report generation from medical images is in great demand. Existing report generation methods typically adopt an encoder-decoder deep learning framework to build a uni-directional image-to-report mapping. However, such a framework ignores the bi-directional mutual associations between images and reports, thus incurring dif…

    Submitted 6 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  20. arXiv:2312.15244  [pdf, ps, other]

    cs.IT eess.SP

    Fluid Antenna Array Enhanced Over-the-Air Computation

    Authors: Deyou Zhang, Sicong Ye, Ming Xiao, Kezhi Wang, Marco Di Renzo, Mikael Skoglund

    Abstract: Over-the-air computation (AirComp) has emerged as a promising technology for fast wireless data aggregation by harnessing the superposition property of wireless multiple-access channels. This paper investigates a fluid antenna (FA) array-enhanced AirComp system, employing the new degrees of freedom achieved by antenna movements. Specifically, we jointly optimize the transceiver design and antenna…

    Submitted 23 December, 2023; originally announced December 2023.

  21. arXiv:2312.04921  [pdf, other]

    astro-ph.IM cs.DC

    Integrating the PanDA Workload Management System with the Vera C. Rubin Observatory

    Authors: Edward Karavakis, Wen Guan, Zhaoyu Yang, Tadashi Maeno, Torre Wenaus, Jennifer Adelman-McCarthy, Fernando Barreiro Megino, Kaushik De, Richard Dubois, Michelle Gower, Tim Jenness, Alexei Klimentov, Tatiana Korchuganova, Mikolaj Kowalik, Fa-Hui Lin, Paul Nilsson, Sergey Padolski, Wei Yang, Shuwei Ye

    Abstract: The Vera C. Rubin Observatory will produce an unprecedented astronomical data set for studies of the deep and dynamic universe. Its Legacy Survey of Space and Time (LSST) will image the entire southern sky every three to four days and produce tens of petabytes of raw image data and associated calibration data over the course of the experiment's run. More than 20 terabytes of data must be stored ev…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 8 pages, 3 figures, 26th International Conference on Computing in High Energy & Nuclear Physics

  22. arXiv:2311.08106  [pdf, other]

    cs.CL

    Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models

    Authors: Yujin Kim, Jaehong Yoon, Seonghyeon Ye, Sangmin Bae, Namgyu Ho, Sung Ju Hwang, Se-young Yun

    Abstract: The dynamic nature of knowledge in an ever-changing world presents challenges for language models trained on static data; the model in the real world often requires not only acquiring new knowledge but also overwriting outdated information into updated ones. To study the ability of language models for these time-dependent dynamics in human language, we introduce a novel task, EvolvingQA, a tempora…

    Submitted 20 April, 2024; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: 15 pages, 10 figures, 5 tables; accepted to NAACL 2024

  23. arXiv:2311.04933  [pdf]

    cs.CL cs.AI

    Evaluating Large Language Models in Ophthalmology

    Authors: Jason Holmes, Shuyuan Ye, Yiwei Li, Shi-Nan Wu, Zhengliang Liu, Zihao Wu, Jinyu Hu, Huan Zhao, Xi Jiang, Wei Liu, Hong Wei, Jie Zou, Tianming Liu, Yi Shao

    Abstract: Purpose: The performance of three different large language models (LLMs) (GPT-3.5, GPT-4, and PaLM2) in answering ophthalmology professional questions was evaluated and compared with that of three different professional populations (medical undergraduates, medical masters, and attending physicians). Methods: A 100-item ophthalmology single-choice test was administered to three different LLMs (GPT-…

    Submitted 7 November, 2023; originally announced November 2023.

  24. arXiv:2310.14049  [pdf, other]

    cs.AR

    Post-Layout Simulation Driven Analog Circuit Sizing

    Authors: Xiaohan Gao, Haoyi Zhang, Siyuan Ye, Mingjie Liu, David Z. Pan, Linxiao Shen, Runsheng Wang, Yibo Lin, Ru Huang

    Abstract: Post-layout simulation provides accurate guidance for analog circuit design, but post-layout performance is hard to be directly optimized at early design stages. Prior work on analog circuit sizing often utilizes pre-layout simulation results as the optimization objective. In this work, we propose a post-layout-simulation-driven (post-simulation-driven for short) analog circuit sizing framework th…

    Submitted 21 October, 2023; originally announced October 2023.

  25. arXiv:2310.04897  [pdf]

    cs.CY cs.AI

    Generative AI May Prefer to Present National-level Characteristics of Cities Based on Stereotypical Geographic Impressions at the Continental Level

    Authors: Shan Ye

    Abstract: A simple experiment was conducted to test the ability of the Chinese-based generative artificial intelligence (AI) platform, Wenxin Yige, to render images of urban street views of different countries. The study found that images generated by this AI platform may contain continental-level stereotypes in terms of showing the level of economic development and modernization. Street view images generat…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: 9 pages, 3 figures

  26. arXiv:2310.00434  [pdf, other]

    cs.CV cs.GR

    DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models

    Authors: Zhiyao Sun, Tian Lv, Sheng Ye, Matthieu Lin, Jenny Sheng, Yu-Hui Wen, Minjing Yu, Yong-Jin Liu

    Abstract: The generation of stylistic 3D facial animations driven by speech presents a significant challenge as it requires learning a many-to-many mapping between speech, style, and the corresponding natural facial motion. However, existing methods either employ a deterministic model for speech-to-motion mapping or encode the style using a one-hot encoding scheme. Notably, the one-hot encoding approach fai…

    Submitted 14 May, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

    Comments: SIGGRAPH 2024 (Journal Track). Project page: https://diffposetalk.github.io/

  27. arXiv:2309.11043  [pdf, other]

    cs.CV

    Score Mismatching for Generative Modeling

    Authors: Senmao Ye, Fei Liu

    Abstract: We propose a new score-based model with one-step sampling. Previously, score-based models were burdened with heavy computations due to iterative sampling. For substituting the iterative process, we train a standalone generator to compress all the time steps with the gradient backpropagated from the score network. In order to produce meaningful gradients for the generator, the score network is trai…

    Submitted 19 September, 2023; originally announced September 2023.

  28. arXiv:2309.08159  [pdf, other]

    cs.CV cs.IR cs.LG

    AdSEE: Investigating the Impact of Image Style Editing on Advertisement Attractiveness

    Authors: Liyao Jiang, Chenglin Li, Haolan Chen, Xiaodong Gao, Xinwang Zhong, Yang Qiu, Shani Ye, Di Niu

    Abstract: Online advertisements are important elements in e-commerce sites, social media platforms, and search engines. With the increasing popularity of mobile browsing, many online ads are displayed with visual information in the form of a cover image in addition to text descriptions to grab the attention of users. Various recent studies have focused on predicting the click rates of online advertisements…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: Accepted to KDD 2023 Applied Data Science Track

  29. arXiv:2309.08097  [pdf, other]

    cs.CV

    Detail Reinforcement Diffusion Model: Augmentation Fine-Grained Visual Categorization in Few-Shot Conditions

    Authors: Tianxu Wu, Shuo Ye, Shuhuang Chen, Qinmu Peng, Xinge You

    Abstract: The challenge in fine-grained visual categorization lies in how to explore the subtle differences between different subclasses and achieve accurate discrimination. Previous research has relied on large-scale annotated data and pre-trained deep models to achieve the objective. However, when only a limited amount of samples is available, similar methods may become less effective. Diffusion models ha…

    Submitted 15 May, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

    Comments: Accepted by TETCI

  30. arXiv:2309.07640  [pdf, other]

    cs.CV

    Indoor Scene Reconstruction with Fine-Grained Details Using Hybrid Representation and Normal Prior Enhancement

    Authors: Sheng Ye, Yubin Hu, Matthieu Lin, Yu-Hui Wen, Wang Zhao, Yong-Jin Liu, Wenping Wang

    Abstract: The reconstruction of indoor scenes from multi-view RGB images is challenging due to the coexistence of flat and texture-less regions alongside delicate and fine-grained regions. Recent methods leverage neural radiance fields aided by predicted surface normal priors to recover the scene geometry. These methods excel in producing complete and smooth results for floor and wall areas. However, they s…

    Submitted 25 December, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

  31. arXiv:2309.06877  [pdf, other]

    cs.CV cs.MM

    Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

    Authors: Zhenguang Liu, Xinyang Yu, Ruili Wang, Shuai Ye, Zhe Ma, Jianfeng Dong, Sifeng He, Feng Qian, Xiaobo Zhang, Roger Zimmermann, Lei Yang

    Abstract: The self-media era provides us tremendous high quality videos. Unfortunately, frequent video copyright infringements are now seriously damaging the interests and enthusiasm of video creators. Identifying infringing videos is therefore a compelling task. Current state-of-the-art methods tend to simply feed high-dimensional mixed video features into deep neural networks and count on the networks to…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: This paper is accepted by ACM MM 2023

  32. arXiv:2308.09591  [pdf, other]

    cs.CV

    O$^2$-Recon: Completing 3D Reconstruction of Occluded Objects in the Scene with a Pre-trained 2D Diffusion Model

    Authors: Yubin Hu, Sheng Ye, Wang Zhao, Matthieu Lin, Yuze He, Yu-Hui Wen, Ying He, Yong-Jin Liu

    Abstract: Occlusion is a common issue in 3D reconstruction from RGB-D videos, often blocking the complete reconstruction of objects and presenting an ongoing problem. In this paper, we propose a novel framework, empowered by a 2D diffusion-based in-painting model, to reconstruct complete surfaces for the hidden parts of objects. Specifically, we utilize a pre-trained diffusion model to fill in the hidden ar…

    Submitted 19 March, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: AAAI 2024

  33. arXiv:2307.10928  [pdf, other]

    cs.CL cs.AI

    FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

    Authors: Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim, Yongrae Jo, James Thorne, Juho Kim, Minjoon Seo

    Abstract: Evaluation of Large Language Models (LLMs) is challenging because instruction-following necessitates alignment with human values and the required set of skills varies depending on the instruction. However, previous studies have mainly focused on coarse-grained evaluation (i.e. overall preference-based evaluation), which limits interpretability since it does not consider the nature of user instruct…

    Submitted 14 April, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: ICLR 2024 Spotlight

  34. arXiv:2307.06244  [pdf, other]

    cs.RO cs.LG cs.MA

    Diffusion Models for Multi-target Adversarial Tracking

    Authors: Sean Ye, Manisha Natarajan, Zixuan Wu, Matthew Gombolay

    Abstract: Target tracking plays a crucial role in real-world scenarios, particularly in drug-trafficking interdiction, where the knowledge of an adversarial target's location is often limited. Improving autonomous tracking systems will enable unmanned aerial, surface, and underwater vehicles to better assist in interdicting smugglers that use manned surface, semi-submersible, and aerial vessels. As unmanned…

    Submitted 12 January, 2024; v1 submitted 12 July, 2023; originally announced July 2023.

  35. arXiv:2307.04858  [pdf, other]

    cs.HC cs.CV q-bio.NC

    AmadeusGPT: a natural language interface for interactive animal behavioral analysis

    Authors: Shaokai Ye, Jessy Lauer, Mu Zhou, Alexander Mathis, Mackenzie W. Mathis

    Abstract: The process of quantifying and analyzing animal behavior involves translating the naturally occurring descriptive language of their actions into machine-readable code. Yet, codifying behavior analysis is often challenging without deep understanding of animal behavior and technical machine learning knowledge. To limit this gap, we introduce AmadeusGPT: a natural language interface that turns natura…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: demo available https://github.com/AdaptiveMotorControlLab/AmadeusGPT

    Journal ref: Published in Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS) 2023

  36. Augmenting Sports Videos with VisCommentator

    Authors: Chen Zhu-Tian, Shuainan Ye, Xiangtong Chu, Haijun Xia, Hui Zhang, Huamin Qu, Yingcai Wu

    Abstract: Visualizing data in sports videos is gaining traction in sports analytics, given its ability to communicate insights and explicate player strategies engagingly. However, augmenting sports videos with such data visualizations is challenging, especially for sports analysts, as it requires considerable expertise in video editing. To ease the creation process, we present a design space that characteri…

    Submitted 10 May, 2024; v1 submitted 23 June, 2023; originally announced June 2023.

    Journal ref: IEEE Transactions on Visualization and Computer Graphics (Volume: 28, Issue: 1, January 2022)

  37. arXiv:2306.12870  [pdf, other]

    cs.SI

    HOFA: Twitter Bot Detection with Homophily-Oriented Augmentation and Frequency Adaptive Attention

    Authors: Sen Ye, Zhaoxuan Tan, Zhenyu Lei, Ruijie He, Hongrui Wang, Qinghua Zheng, Minnan Luo

    Abstract: Twitter bot detection has become an increasingly important and challenging task to combat online misinformation, facilitate social content moderation, and safeguard the integrity of social platforms. Though existing graph-based Twitter bot detection methods achieved state-of-the-art performance, they are all based on the homophily assumption, which assumes users with the same label are more likely…

    Submitted 22 June, 2023; originally announced June 2023.

    Comments: 11 pages, 7 figures

  38. arXiv:2306.11301  [pdf, other]

    cs.LG cs.AI cs.RO

    Adversarial Search and Tracking with Multiagent Reinforcement Learning in Sparsely Observable Environment

    Authors: Zixuan Wu, Sean Ye, Manisha Natarajan, Letian Chen, Rohan Paleja, Matthew C. Gombolay

    Abstract: We study a search and tracking (S&T) problem where a team of dynamic search agents must collaborate to track an adversarial, evasive agent. The heterogeneous search team may only have access to a limited number of past adversary trajectories within a large search space. This problem is challenging for both model-based searching and reinforcement learning (RL) methods since the adversary exhibits r…

    Submitted 20 October, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: Accepted by IEEE International Symposium on Multi-Robot & Multi-Agent Systems (MRS) 2023

  39. arXiv:2306.11168  [pdf, other]

    cs.LG cs.AI cs.MA

    Learning Models of Adversarial Agent Behavior under Partial Observability

    Authors: Sean Ye, Manisha Natarajan, Zixuan Wu, Rohan Paleja, Letian Chen, Matthew C. Gombolay

    Abstract: The need for opponent modeling and tracking arises in several real-world scenarios, such as professional sports, video game design, and drug-trafficking interdiction. In this work, we present Graph based Adversarial Modeling with Mutual Information (GrAMMI) for modeling the behavior of an adversarial opponent agent. GrAMMI is a novel graph neural network (GNN) based approach that uses mutual inform…

    Submitted 5 July, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: 8 pages, 3 figures, 2 tables

  40. arXiv:2306.04893  [pdf, other]

    cs.CV

    Coping with Change: Learning Invariant and Minimum Sufficient Representations for Fine-Grained Visual Categorization

    Authors: Shuo Ye, Shujian Yu, Wenjin Hou, Yu Wang, Xinge You

    Abstract: Fine-grained visual categorization (FGVC) is a challenging task due to similar visual appearances between various species. Previous studies always implicitly assume that the training and test data have the same underlying distributions, and that features extracted by modern backbone architectures remain discriminative and generalize well to unseen test data. However, we empirically justify that th…

    Submitted 9 October, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: Manuscript accepted by CVIU; code is available on GitHub

  41. arXiv:2306.02346  [pdf, other]

    cs.CV

    CDLT: A Dataset with Concept Drift and Long-Tailed Distribution for Fine-Grained Visual Categorization

    Authors: Shuo Ye, Yufeng Shi, Ruxin Wang, Yu Wang, Jiamiao Xu, Chuanwu Yang, Xinge You

    Abstract: Data is the foundation for the development of computer vision, and the establishment of datasets plays an important role in advancing the techniques of fine-grained visual categorization (FGVC). In the existing FGVC datasets used in computer vision, it is generally assumed that each collected instance has fixed characteristics and the distribution of different categories is relatively balanced. In…

    Submitted 4 June, 2023; originally announced June 2023.

  42. arXiv:2305.14877  [pdf, other]

    cs.CL

    Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis

    Authors: Sohee Yang, Jonghyeon Kim, Joel Jang, Seonghyeon Ye, Hyunji Lee, Minjoon Seo

    Abstract: Previous works in prompt engineering for large language models have introduced different gradient-free probability-based prompt selection methods that aim to choose the optimal prompt among the candidates for a given task but have failed to provide a comprehensive and fair comparison between each other. In this paper, we propose a unified framework to interpret and evaluate the existing probabilit…

    Submitted 8 March, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: TACL 2024 (Pre-MIT Press publication version)

  43. arXiv:2305.14405  [pdf, other]

    cs.LG cs.AI cs.AR

    NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference

    Authors: Ruiqi Sun, Siwei Ye, Jie Zhao, Xin He, Yiran Li, An Zou

    Abstract: The inherent diversity of computation types within individual Deep Neural Network (DNN) models imposes a corresponding need for a varied set of computation units within hardware processors. This diversity poses a significant constraint on computation efficiency during the execution of different neural networks. In this study, we present NeuralMatrix, a framework that transforms the computation of…

    Submitted 8 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 11 pages, 6 figures, submitted to the 41st International Conference on Machine Learning

  44. arXiv:2305.14045  [pdf, other]

    cs.CL cs.AI cs.LG

    The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning

    Authors: Seungone Kim, Se June Joo, Doyoung Kim, Joel Jang, Seonghyeon Ye, Jamin Shin, Minjoon Seo

    Abstract: Language models (LMs) with fewer than 100B parameters are known to perform poorly on chain-of-thought (CoT) reasoning in contrast to large LMs when solving unseen tasks. In this work, we aim to equip smaller LMs with the step-by-step reasoning capability by instruction tuning with CoT rationales. In order to achieve this goal, we first introduce a new instruction-tuning dataset called the CoT Colle…

    Submitted 14 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 (Main Conference)

  45. arXiv:2304.02745  [pdf, other]

    cs.CG

    Analysis of Dynamic Voronoi Diagrams in the Hilbert Metric

    Authors: Madeline Bumpus, Xufeng Caesar Dai, Auguste H. Gezalyan, Sam Munoz, Renita Santhoshkumar, Songyu Ye, David M. Mount

    Abstract: The Hilbert metric is a projective metric defined on a convex body which generalizes the Cayley-Klein model of hyperbolic geometry to any convex set. In this paper, we analyze Hilbert Voronoi diagrams in the dynamic setting. In addition, we introduce dynamic visualization software for Voronoi diagrams in the Hilbert metric on user-specified convex polygons.

    Submitted 20 July, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

  46. arXiv:2303.03645  [pdf, other]

    cs.CV cs.CC

    Filter Pruning based on Information Capacity and Independence

    Authors: Xiaolong Tang, Shuo Ye, Yufeng Shi, Tianheng Hu, Qinmu Peng, Xinge You

    Abstract: Filter pruning has gained widespread adoption for the purpose of compressing and speeding up convolutional neural networks (CNNs). However, existing approaches are still far from practical applications due to biased filter selection and heavy computation cost. This paper introduces a new filter pruning method that selects filters in an interpretable, multi-perspective, and lightweight manner. Spec…

    Submitted 12 June, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS). The code will be available at https://github.com/txl-hub/ICI

  47. arXiv:2303.03131  [pdf, other]

    cs.CV cs.AI cs.MM

    Video Question Answering Using CLIP-Guided Visual-Text Attention

    Authors: Shuhong Ye, Weikai Kong, Chenglin Yao, Jianfeng Ren, Xudong Jiang

    Abstract: Cross-modal learning of video and text plays a key role in Video Question Answering (VideoQA). In this paper, we propose a visual-text attention mechanism that uses Contrastive Language-Image Pre-training (CLIP), trained on many general-domain language-image pairs, to guide the cross-modal learning for VideoQA. Specifically, we first extract video features using a TimeSformer and text featur…

    Submitted 8 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Submitted to the 2023 IEEE International Conference on Image Processing (ICIP 2023)

    ACM Class: I.2.10

  48. arXiv:2303.03105  [pdf, other]

    cs.MM

    Confidence-based Event-centric Online Video Question Answering on a Newly Constructed ATBS Dataset

    Authors: Weikai Kong, Shuhong Ye, Chenglin Yao, Jianfeng Ren

    Abstract: Deep neural networks facilitate video question answering (VideoQA), but the real-world applications on video streams such as CCTV and live cast place higher demands on the solver. To address the challenges of VideoQA on long videos of unknown length, we define a new set of problems called Online Open-ended Video Question Answering (O^2VQA). It requires an online state-updating mechanism for the so…

    Submitted 7 March, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

  49. arXiv:2303.02455  [pdf, other]

    cs.CV

    DistilPose: Tokenized Pose Regression with Heatmap Distillation

    Authors: Suhang Ye, Yingyi Zhang, Jie Hu, Liujuan Cao, Shengchuan Zhang, Lei Shen, Jun Wang, Shouhong Ding, Rongrong Ji

    Abstract: In the field of human pose estimation, regression-based methods have dominated in terms of speed, while heatmap-based methods are far ahead in terms of performance. How to take advantage of both schemes remains a challenging problem. In this paper, we propose a novel human pose estimation framework termed DistilPose, which bridges the gaps between heatmap-based and regression-based methods. S…

    Submitted 16 March, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: Accepted by CVPR 2023

  50. arXiv:2302.14691  [pdf, other]

    cs.CL cs.AI

    Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following

    Authors: Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang, Hyeongu Yun, Yireun Kim, Minjoon Seo

    Abstract: In this paper, we present our finding that prepending a Task-Agnostic Prefix Prompt (TAPP) to the input improves the instruction-following ability of various Large Language Models (LLMs) during inference. TAPP is different from canonical prompts for LLMs in that it is a fixed prompt prepended to the beginning of every input regardless of the target task for zero-shot generalization. We observe tha…

    Submitted 24 December, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: AAAI 2024