Showing 1–50 of 269 results for author: Han, D

Searching in archive cs.
  1. arXiv:2406.01349  [pdf, other]

    cs.CV

    Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation

    Authors: Enhui Ma, Lijun Zhou, Tao Tang, Zhan Zhang, Dong Han, Junpeng Jiang, Kun Zhan, Peng Jia, Xianpeng Lang, Haiyang Sun, Di Lin, Kaicheng Yu

    Abstract: Using generative models to synthesize new data has become a de-facto standard in autonomous driving to address the data scarcity issue. Though existing approaches are able to boost perception models, we discover that these approaches fail to improve the performance of planning of end-to-end autonomous driving models as the generated videos are usually less than 8 frames and the spatial and tempora…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Project Page: https://westlake-autolab.github.io/delphi.github.io/, 8 figures

  2. arXiv:2406.00988  [pdf, other]

    cs.AR

    ADE-HGNN: Accelerating HGNNs through Attention Disparity Exploitation

    Authors: Dengke Han, Meng Wu, Runzhen Xue, Mingyu Yan, Xiaochun Ye, Dongrui Fan

    Abstract: Heterogeneous Graph Neural Networks (HGNNs) have recently demonstrated great power in handling heterogeneous graph data, rendering them widely applied in many critical real-world domains. Most HGNN models leverage attention mechanisms to significantly improve model accuracy, albeit at the cost of increased computational complexity and memory bandwidth requirements. Fortunately, the attention dispar…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 15 pages, 9 figures, accepted by Euro-PAR 2024

  3. arXiv:2405.16605  [pdf, other]

    cs.CV

    Demystify Mamba in Vision: A Linear Attention Perspective

    Authors: Dongchen Han, Ziyi Wang, Zhuofan Xia, Yizeng Han, Yifan Pu, Chunjiang Ge, Jun Song, Shiji Song, Bo Zheng, Gao Huang

    Abstract: Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares surprising similarities with linear attention Transformer, which typically underperform conventional Transformer in practice. By exploring the similar…

    Submitted 26 May, 2024; originally announced May 2024.

  4. arXiv:2405.14745  [pdf, other]

    cs.LG

    AnyLoss: Transforming Classification Metrics into Loss Functions

    Authors: Doheon Han, Nuno Moniz, Nitesh V Chawla

    Abstract: Many evaluation metrics can be used to assess the performance of models in binary classification tasks. However, most of them are derived from a confusion matrix in a non-differentiable form, making it very difficult to generate a differentiable loss function that could directly optimize them. The lack of solutions to bridge this challenge not only hinders our ability to solve difficult tasks, suc…

    Submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2405.14362  [pdf, other]

    cs.NE

    Advancing Spiking Neural Networks for Sequential Modeling with Central Pattern Generators

    Authors: Changze Lv, Dongqi Han, Yansen Wang, Xiaoqing Zheng, Xuanjing Huang, Dongsheng Li

    Abstract: Spiking neural networks (SNNs) represent a promising approach to developing artificial neural networks that are both energy-efficient and biologically plausible. However, applying SNNs to sequential tasks, such as text classification and time-series forecasting, has been hindered by the challenge of creating an effective and hardware-friendly spike-form positional encoding (PE) strategy. Drawing i…

    Submitted 23 May, 2024; originally announced May 2024.

  6. arXiv:2405.13549  [pdf, other]

    eess.SP cs.IT

    Multi-Objective Optimization-Based Waveform Design for Multi-User and Multi-Target MIMO-ISAC Systems

    Authors: Peng Wang, Dongsheng Han, Yashuai Cao, Wanli Ni, Dusit Niyato

    Abstract: Integrated sensing and communication (ISAC) opens up new service possibilities for sixth-generation (6G) systems, where both communication and sensing (C&S) functionalities co-exist by sharing the same hardware platform and radio resource. In this paper, we investigate the waveform design problem in a downlink multi-user and multi-target ISAC system under different C&S performance preferences. The…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 13 pages, submitted to IEEE TWC

  7. arXiv:2405.05947  [pdf, other]

    cs.HC

    A Survey on Visualization Approaches in Political Science for Social and Political Factors: Progress to Date and Future Opportunities

    Authors: Dongyun Han, Abdullah-Al-Raihan Nayeem, Jason Windett, Yaoyao Dai, Benjamin Radford, Isaac Cho

    Abstract: Politics is the set of activities related to strategic decision-making in groups. Political scientists study the strategic interactions between states, institutions, politicians, and citizens; they seek to understand the causes and consequences of those decisions and interactions. While some decisions might alleviate social problems, others might lead to disasters such as war and conflict. Data vi…

    Submitted 9 May, 2024; originally announced May 2024.

  8. arXiv:2405.02760  [pdf, other]

    cs.CE cs.SI

    GTFS2STN: Analyzing GTFS Transit Data by Generating Spatiotemporal Transit Network

    Authors: Diyi Liu, Jing Guo, Yangsong Gu, Meredith King, Lee D. Han, Candace Brakewood

    Abstract: GTFS, the General Transit Feed Specification, is an open standard format to record transit information used by thousands of transit agencies across the world. By converting a static GTFS transit network to a spatiotemporal network connecting bus stops over space and time, a preliminary tool named GTFS2STN is implemented to analyze the accessibility of the transit system. Furthermore, a simple app…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 8 pages, 8 figures

  9. arXiv:2405.01113  [pdf, other]

    cs.CV cs.AI eess.IV

    Domain-Transferred Synthetic Data Generation for Improving Monocular Depth Estimation

    Authors: Seungyeop Lee, Knut Peterson, Solmaz Arezoomandan, Bill Cai, Peihan Li, Lifeng Zhou, David Han

    Abstract: A major obstacle to the development of effective monocular depth estimation algorithms is the difficulty in obtaining high-quality depth data that corresponds to collected RGB images. Collecting this data is time-consuming and costly, and even data collected by modern sensors has limited range or resolution, and is subject to inconsistencies and noise. To combat this, we propose a method of data g…

    Submitted 2 May, 2024; originally announced May 2024.

  10. arXiv:2404.18560  [pdf, other]

    math.OC cs.RO

    Non-convex Pose Graph Optimization in SLAM via Proximal Linearized Riemannian ADMM

    Authors: Xin Chen, Chunfeng Cui, Deren Han, Liqun Qi

    Abstract: Pose graph optimization (PGO) is a well-known technique for solving the pose-based simultaneous localization and mapping (SLAM) problem. In this paper, we represent the rotation and translation by a unit quaternion and a three-dimensional vector, and propose a new PGO model based on the von Mises-Fisher distribution. The constraints derived from the unit quaternions are spherical manifolds, and th…

    Submitted 29 April, 2024; originally announced April 2024.

  11. arXiv:2404.17955  [pdf, other]

    cs.SE

    A Survey of Third-Party Library Security Research in Application Software

    Authors: Jia Zeng, Dan Han, Yaling Zhu, Yangzhong Wang, Fangchen Weng

    Abstract: In the current software development environment, third-party libraries play a crucial role. They provide developers with rich functionality and convenient solutions, speeding up the pace and efficiency of software development. However, with the widespread use of third-party libraries, associated security risks and potential vulnerabilities are increasingly apparent. Malicious attackers can exploit…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 21 pages, 3 figures, one table

  12. arXiv:2404.17507  [pdf, other]

    cs.CV

    HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts

    Authors: Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun

    Abstract: In an era where the volume of data drives the effectiveness of self-supervised learning, the specificity and clarity of data semantics play a crucial role in model training. Addressing this, we introduce HYPerbolic Entailment filtering (HYPE), a novel methodology designed to meticulously extract modality-wise meaningful and well-aligned data from extensive, noisy image-text pair datasets. Our appr…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 28pages, 4.5MB

  13. arXiv:2404.16659  [pdf, other]

    cs.CL cs.AI

    ProbGate at EHRSQL 2024: Enhancing SQL Query Generation Accuracy through Probabilistic Threshold Filtering and Error Handling

    Authors: Sangryul Kim, Donghee Han, Sehyun Kim

    Abstract: Recently, deep learning-based language models have significantly enhanced text-to-SQL tasks, with promising applications in retrieving patient records within the medical domain. One notable challenge in such applications is discerning unanswerable queries. Through fine-tuning model, we demonstrate the feasibility of converting medical record inquiries into SQL queries. Additionally, we introduce a…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: The 6th Clinical Natural Language Processing Workshop at NAACL 2024. Code is available at https://github.com/venzino-han/probgate_ehrsql

  14. arXiv:2404.14285  [pdf, other]

    cs.RO cs.AI

    LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots

    Authors: Dongge Han, Trevor McInroe, Adam Jelley, Stefano V. Albrecht, Peter Bell, Amos Storkey

    Abstract: Large language models (LLMs) have shown significant potential for robotics applications, particularly task planning, by harnessing their language comprehension and text generation capabilities. However, in applications such as household robotics, a critical gap remains in the personalization of these models to individual user preferences. We introduce LLM-Personalize, a novel framework with an opt…

    Submitted 22 April, 2024; originally announced April 2024.

  15. arXiv:2404.09490  [pdf, other]

    cs.CV

    Leveraging Temporal Contextualization for Video Action Recognition

    Authors: Minji Kim, Dongyoon Han, Taekyung Kim, Bohyung Han

    Abstract: Pretrained vision-language models have shown effectiveness in video understanding. However, recent studies have not sufficiently leveraged essential temporal information from videos, simply averaging frame-wise representations or referencing consecutive frames. We introduce Temporally Contextualized CLIP (TC-CLIP), a pioneering framework for video understanding that effectively and efficiently lev…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 24 pages, 10 figures, 12 tables

  16. arXiv:2404.08003  [pdf, other]

    cs.LG cs.DC cs.NI

    Asynchronous Federated Reinforcement Learning with Policy Gradient Updates: Algorithm Design and Convergence Analysis

    Authors: Guangchen Lan, Dong-Jun Han, Abolfazl Hashemi, Vaneet Aggarwal, Christopher G. Brinton

    Abstract: To improve the efficiency of reinforcement learning, we propose a novel asynchronous federated reinforcement learning framework termed AFedPG, which constructs a global model through collaboration among $N$ agents using policy gradient (PG) updates. To handle the challenge of lagged policies in asynchronous settings, we design delay-adaptive lookahead and normalized update techniques that can effe…

    Submitted 14 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    ACM Class: I.2.6; I.2.11

  17. GDR-HGNN: A Heterogeneous Graph Neural Networks Accelerator Frontend with Graph Decoupling and Recoupling

    Authors: Runzhen Xue, Mingyu Yan, Dengke Han, Yihan Teng, Zhimin Tang, Xiaochun Ye, Dongrui Fan

    Abstract: Heterogeneous Graph Neural Networks (HGNNs) have broadened the applicability of graph representation learning to heterogeneous graphs. However, the irregular memory access pattern of HGNNs leads to the buffer thrashing issue in HGNN accelerators. In this work, we identify an opportunity to address buffer thrashing in HGNN acceleration through an analysis of the topology of heterogeneous graphs. To…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 6 pages, 10 figures, accepted by DAC'61

  18. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  19. arXiv:2404.01745  [pdf, other]

    cs.CV cs.AI

    Unleash the Potential of CLIP for Video Highlight Detection

    Authors: Donghoon Han, Seunghyeon Seo, Eunhwan Park, Seong-Uk Nam, Nojun Kwak

    Abstract: Multimodal and large language models (LLMs) have revolutionized the utilization of open-world knowledge, unlocking novel potentials across various tasks and applications. Among these domains, the video domain has notably benefited from their capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method designed to excel in the video highlight detection task by leveraging the pre-train…

    Submitted 2 April, 2024; originally announced April 2024.

  20. arXiv:2404.01604  [pdf, other]

    cs.CV eess.IV

    WaveDH: Wavelet Sub-bands Guided ConvNet for Efficient Image Dehazing

    Authors: Seongmin Hwang, Daeyoung Han, Cheolkon Jung, Moongu Jeon

    Abstract: The surge in interest regarding image dehazing has led to notable advancements in deep learning-based single image dehazing approaches, exhibiting impressive performance in recent studies. Despite these strides, many existing methods fall short in meeting the efficiency demands of practical applications. In this paper, we introduce WaveDH, a novel and compact ConvNet designed to address this effic…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Submitted to TMM

    MSC Class: 68T07 ACM Class: I.4.4; I.4.9

  21. arXiv:2403.19588  [pdf, other]

    cs.CV cs.LG cs.NE

    DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs

    Authors: Donghyun Kim, Byeongho Heo, Dongyoon Han

    Abstract: This paper revives Densely Connected Convolutional Networks (DenseNets) and reveals the underrated effectiveness over predominant ResNet-style architectures. We believe DenseNets' potential was overlooked due to untouched training methods and traditional design elements not fully revealing their capabilities. Our pilot study shows dense connections through concatenation are strong, demonstrating t…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Code at https://github.com/naver-ai/rdnet

  22. arXiv:2403.19522  [pdf, other]

    cs.LG cs.CV

    Model Stock: All we need is just a few fine-tuned models

    Authors: Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han

    Abstract: This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance. Breaking away from traditional practices that need a multitude of fine-tuned models for averaging, our approach employs significantly fewer models to achieve final weights yet yield superior accuracy. Drawing from key insights in the we…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Code at https://github.com/naver-ai/model-stock

  23. arXiv:2403.15692  [pdf, other]

    cs.IT eess.SP

    Block Orthogonal Sparse Superposition Codes for $ \sf{L}^3 $ Communications: Low Error Rate, Low Latency, and Low Power Consumption

    Authors: Donghwa Han, Bowhyung Lee, Min Jang, Donghun Lee, Seho Myung, Namyoon Lee

    Abstract: Block orthogonal sparse superposition (BOSS) code is a class of joint coded modulation methods, which can closely achieve the finite-blocklength capacity with a low-complexity decoder at a few coding rates under Gaussian channels. However, for fading channels, the code performance degrades considerably because coded symbols experience different channel fading effects. In this paper, we put forth n…

    Submitted 22 March, 2024; originally announced March 2024.

  24. arXiv:2403.13298  [pdf, other]

    cs.CV cs.LG

    Rotary Position Embedding for Vision Transformer

    Authors: Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun

    Abstract: Rotary Position Embedding (RoPE) performs remarkably on language models, especially for length extrapolation of Transformers. However, the impacts of RoPE on computer vision domains have been underexplored, even though RoPE appears capable of enhancing Vision Transformer (ViT) performance in a way similar to the language domain. This study provides a comprehensive analysis of RoPE when applied to…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 20 pages, 5 figures

  25. arXiv:2403.12404  [pdf, other]

    cs.LG cs.CV

    Understanding and Improving Training-free Loss-based Diffusion Guidance

    Authors: Yifei Shen, Xinyang Jiang, Yezhen Wang, Yifan Yang, Dongqi Han, Dongsheng Li

    Abstract: Adding additional control to pretrained diffusion models has become an increasingly popular research area, with extensive applications in computer vision, reinforcement learning, and AI for science. Recently, several studies have proposed training-free loss-based guidance by using off-the-shelf networks pretrained on clean images. This approach enables zero-shot conditional generation for universa…

    Submitted 29 May, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  26. arXiv:2403.11079  [pdf, other]

    cs.SE cs.LG

    Bridging Expert Knowledge with Deep Learning Techniques for Just-In-Time Defect Prediction

    Authors: Xin Zhou, DongGyun Han, David Lo

    Abstract: Just-In-Time (JIT) defect prediction aims to automatically predict whether a commit is defective or not, and has been widely studied in recent years. In general, most studies can be classified into two categories: 1) simple models using traditional machine learning classifiers with hand-crafted features, and 2) complex models using deep learning techniques to automatically extract features from co…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 48 pages

  27. arXiv:2403.09675  [pdf, other]

    cs.CV cs.GR

    Open-Universe Indoor Scene Generation using LLM Program Synthesis and Uncurated Object Databases

    Authors: Rio Aguina-Kang, Maxim Gumin, Do Heon Han, Stewart Morris, Seung Jean Yoo, Aditya Ganeshan, R. Kenny Jones, Qiuhong Anna Wei, Kailiang Fu, Daniel Ritchie

    Abstract: We present a system for generating indoor scenes in response to text prompts. The prompts are not limited to a fixed vocabulary of scene descriptions, and the objects in generated scenes are not restricted to a fixed set of object categories -- we call this setting indoor scene generation. Unlike most prior work on indoor scene generation, our system does not require a large training dataset of ex…

    Submitted 4 February, 2024; originally announced March 2024.

    Comments: See ancillary files for link to supplemental material

  28. arXiv:2402.15265  [pdf, other]

    cs.HC cs.CL

    CloChat: Understanding How People Customize, Interact, and Experience Personas in Large Language Models

    Authors: Juhye Ha, Hyeon Jeon, DaEun Han, Jinwook Seo, Changhoon Oh

    Abstract: Large language models (LLMs) have facilitated significant strides in generating conversational agents, enabling seamless, contextually relevant dialogues across diverse topics. However, the existing LLM-driven conversational agents have fixed personalities and functionalities, limiting their adaptability to individual user needs. Creating personalized agent personas with distinct expertise or trai…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '24)

  29. arXiv:2402.15019  [pdf, other]

    cs.LG cs.AI stat.ML

    Consistency-Guided Temperature Scaling Using Style and Content Information for Out-of-Domain Calibration

    Authors: Wonjeong Choi, Jungwuk Park, Dong-Jun Han, Younghyun Park, Jaekyun Moon

    Abstract: Research interests in the robustness of deep neural networks against domain shifts have been rapidly increasing in recent years. Most existing works, however, focus on improving the accuracy of the model, not the calibration performance which is another important requirement for trustworthy AI systems. Temperature scaling (TS), an accuracy-preserving post-hoc calibration method, has been proven to…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Accepted at AAAI-24 (The 38th AAAI Conference on Artificial Intelligence, February 2024)

  30. arXiv:2402.13851  [pdf, other]

    cs.CV

    VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models

    Authors: Jiawei Liang, Siyuan Liang, Man Luo, Aishan Liu, Dongchen Han, Ee-Chien Chang, Xiaochun Cao

    Abstract: Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context. Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities. However, we uncover the potential threat posed by backdoor attacks on autoregressive VLMs during instruction tuning. Adversaries can implant a backdoor by injecting pois…

    Submitted 21 February, 2024; originally announced February 2024.

  31. arXiv:2402.08963  [pdf, other]

    cs.LG cs.AI

    DUEL: Duplicate Elimination on Active Memory for Self-Supervised Class-Imbalanced Learning

    Authors: Won-Seok Choi, Hyundo Lee, Dong-Sig Han, Junseok Park, Heeyeon Koo, Byoung-Tak Zhang

    Abstract: Recent machine learning algorithms have been developed using well-curated datasets, which often require substantial cost and resources. On the other hand, the direct use of raw data often leads to overfitting towards frequently occurring class information. To address class imbalances cost-efficiently, we propose an active data filtering process during self-supervised pre-training in our novel fram…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted as a full paper at AAAI 2024: The 38th Annual AAAI Conference on Artificial Intelligence (Main Tech Track). 7 pages (main paper), 2 pages (references), 11 pages (appendix) each

  32. arXiv:2402.03448  [pdf, other]

    cs.LG cs.DC

    Decentralized Sporadic Federated Learning: A Unified Algorithmic Framework with Convergence Guarantees

    Authors: Shahryar Zehtabi, Dong-Jun Han, Rohit Parasnis, Seyyedali Hosseinalipour, Christopher G. Brinton

    Abstract: Decentralized federated learning (DFL) captures FL settings where both (i) model updates and (ii) model aggregations are exclusively carried out by the clients without a central server. Existing DFL works have mostly focused on settings where clients conduct a fixed number of local updates between local model exchanges, overlooking heterogeneity and dynamics in communication and computation capabi…

    Submitted 31 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  33. arXiv:2402.02442  [pdf, other]

    cs.LG eess.IV

    A Momentum Accelerated Algorithm for ReLU-based Nonlinear Matrix Decomposition

    Authors: Qingsong Wang, Chunfeng Cui, Deren Han

    Abstract: Recently, there has been a growing interest in the exploration of Nonlinear Matrix Decomposition (NMD) due to its close ties with neural networks. NMD aims to find a low-rank matrix from a sparse nonnegative matrix with a per-element nonlinear function. A typical choice is the Rectified Linear Unit (ReLU) activation function. To address over-fitting in the existing ReLU-based NMD model (ReLU-NMD),…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 5 pages, 7 figures

  34. arXiv:2402.02225  [pdf, other]

    cs.LG

    Rethinking the Starting Point: Collaborative Pre-Training for Federated Downstream Tasks

    Authors: Yun-Wei Chu, Dong-Jun Han, Seyyedali Hosseinalipour, Christopher G. Brinton

    Abstract: A few recent studies have demonstrated that leveraging centrally pre-trained models can offer advantageous initializations for federated learning (FL). However, existing pre-training methods do not generalize well when faced with an arbitrary set of downstream FL tasks. Specifically, they often (i) achieve limited average accuracy, particularly when there are unseen downstream labels, and (ii) res…

    Submitted 6 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  35. arXiv:2402.01533  [pdf, other]

    cs.NE

    Efficient and Effective Time-Series Forecasting with Spiking Neural Networks

    Authors: Changze Lv, Yansen Wang, Dongqi Han, Xiaoqing Zheng, Xuanjing Huang, Dongsheng Li

    Abstract: Spiking neural networks (SNNs), inspired by the spiking behavior of biological neurons, provide a unique pathway for capturing the intricacies of temporal data. However, applying SNNs to time-series forecasting is challenging due to difficulties in effective temporal alignment, complexities in encoding processes, and the absence of standardized guidelines for model selection. In this paper, we pro…

    Submitted 29 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  36. Improving the accuracy of freight mode choice models: A case study using the 2017 CFS PUF data set and ensemble learning techniques

    Authors: Diyi Liu, Hyeonsup Lim, Majbah Uddin, Yuandong Liu, Lee D. Han, Ho-ling Hwang, Shih-Miao Chin

    Abstract: The US Census Bureau has collected two rounds of experimental data from the Commodity Flow Survey, providing shipment-level characteristics of nationwide commodity movements, published in 2012 (i.e., Public Use Microdata) and in 2017 (i.e., Public Use File). With this information, data-driven methods have become increasingly valuable for understanding detailed patterns in freight logistics. In thi…

    Submitted 12 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Journal ref: Expert Systems with Applications, 240, 122478 (2024)

  37. arXiv:2401.16685  [pdf, other]

    cs.LG cs.DC

    Communication-Efficient Multimodal Federated Learning: Joint Modality and Client Selection

    Authors: Liangqi Yuan, Dong-Jun Han, Su Wang, Devesh Upadhyay, Christopher G. Brinton

    Abstract: Multimodal federated learning (FL) aims to enrich model training in FL settings where clients are collecting measurements across multiple modalities. However, key challenges to multimodal FL remain unaddressed, particularly in heterogeneous network settings where: (i) the set of modalities collected by each client will be diverse, and (ii) communication limitations prevent clients from uploading a…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.07048

  38. arXiv:2401.15459  [pdf, other]

    cs.SE

    Multi-LLM Collaboration + Data-Centric Innovation = 2x Better Vulnerability Repair

    Authors: Xin Zhou, Kisub Kim, Bowen Xu, DongGyun Han, David Lo

    Abstract: The advances of deep learning (DL) have paved the way for automatic software vulnerability repair approaches, which effectively learn the mapping from the vulnerable code to the fixed code. Nevertheless, existing DL-based vulnerability repair methods face notable limitations: 1) they struggle to handle lengthy vulnerable code, 2) they treat code as natural language texts, neglecting its inherent s…

    Submitted 12 March, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

    Comments: Accepted in the ICSE 2024 Research Track with a different title "Out of Sight, Out of Mind: Better Automatic Vulnerability Repair by Broadening Input Ranges and Sources"

  39. arXiv:2401.13386  [pdf, other]

    cs.CV

    Privacy-Preserving Face Recognition in Hybrid Frequency-Color Domain

    Authors: Dong Han, Yong Li, Joachim Denzler

    Abstract: Face recognition technology has been deployed in various real-life applications. The most sophisticated deep learning-based face recognition systems rely on training millions of face images through complex deep neural networks to achieve high accuracy. It is quite common for clients to upload face images to the service provider in order to access the model inference. However, the face image is a t…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: This work is already accepted at the conference International Conference on Computer Vision Theory and Applications (VISAPP) 2024 as a regular paper

  40. arXiv:2401.10691  [pdf, other]

    cs.CR

    Explainable and Transferable Adversarial Attack for ML-Based Network Intrusion Detectors

    Authors: Hangsheng Zhang, Dongqi Han, Yinlong Liu, Zhiliang Wang, Jiyan Sun, Shangyuan Zhuang, Jiqiang Liu, Jinsong Dong

    Abstract: Despite being widely used in network intrusion detection systems (NIDSs), machine learning (ML) has proven to be highly vulnerable to adversarial attacks. White-box and black-box adversarial attacks of NIDS have been explored in several studies. However, white-box attacks unrealistically assume that the attackers have full knowledge of the target NIDSs. Meanwhile, existing black-box attacks can not…

    Submitted 19 January, 2024; originally announced January 2024.

  41. arXiv:2401.07456  [pdf, other]

    cs.CL cs.AI

    Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation

    Authors: Yun-Wei Chu, Dong-Jun Han, Christopher G. Brinton

    Abstract: Federated learning (FL) is a promising approach for solving multilingual tasks, potentially enabling clients with their own language-specific data to collaboratively construct a high-quality neural machine translation (NMT) model. However, communication constraints in practical network systems present challenges for exchanging large-scale NMT engines between FL parties. In this paper, we propose a…

    Submitted 14 January, 2024; originally announced January 2024.

  42. arXiv:2401.03398  [pdf, other

    cs.CY cs.RO

    Amplifying robotics capacities with a human touch: An immersive low-latency panoramic remote system

    Authors: Junjie Li, Kang Li, Dewei Han, Jian Xu, Zhaoyuan Ma

    Abstract: AI and robotics technologies have witnessed remarkable advancements in the past decade, revolutionizing work patterns and opportunities in various domains. The application of these technologies has propelled society towards an era of symbiosis between humans and machines. To facilitate efficient communication between humans and intelligent robots, we propose the "Avatar" system, an immersive low-l… ▽ More

    Submitted 8 January, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

    Comments: 9 pages, 4 figures

  43. arXiv:2401.00442  [pdf, other

    cs.CV

    A Comprehensive Overview of Fish-Eye Camera Distortion Correction Methods

    Authors: Jian Xu, De-Wei Han, Kang Li, Jun-Jie Li, Zhao-Yuan Ma

    Abstract: The fisheye camera, with its unique wide field of view and other characteristics, has found extensive applications in various fields. However, the fisheye camera suffers from significant distortion compared to pinhole cameras, resulting in distorted images of captured objects. Fish-eye camera distortion is a common issue in digital image processing, requiring effective correction techniques to enh… ▽ More

    Submitted 13 May, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

  44. arXiv:2401.00254  [pdf, other

    cs.CV

    Morphing Tokens Draw Strong Masked Image Models

    Authors: Taekyung Kim, Byeongho Heo, Dongyoon Han

    Abstract: Masked image modeling (MIM) is a promising option for training Vision Transformers among various self-supervised learning (SSL) methods. The essence of MIM lies in token-wise masked token predictions, with targets tokenized from images or generated by pre-trained models such as vision-language models. While tokenizers or pre-trained models are plausible MIM targets, they often offer spatially inco… ▽ More

    Submitted 2 May, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: 27 pages, 17 tables, 6 figures

  45. arXiv:2312.15361  [pdf, other

    cs.DC cs.AI

    Cooperative Federated Learning over Ground-to-Satellite Integrated Networks: Joint Local Computation and Data Offloading

    Authors: Dong-Jun Han, Seyyedali Hosseinalipour, David J. Love, Mung Chiang, Christopher G. Brinton

    Abstract: While network coverage maps continue to expand, many devices located in remote areas remain unconnected to terrestrial communication infrastructures, preventing them from getting access to the associated data-driven services. In this paper, we propose a ground-to-satellite cooperative federated learning (FL) methodology to facilitate machine learning service management over remote regions. Our met… ▽ More

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: This paper is accepted for publication in IEEE Journal on Selected Areas in Communications (JSAC)

  46. arXiv:2312.10105  [pdf, other

    cs.CV

    SeiT++: Masked Token Modeling Improves Storage-efficient Training

    Authors: Minhyun Lee, Song Park, Byeongho Heo, Dongyoon Han, Hyunjung Shim

    Abstract: Recent advancements in Deep Neural Network (DNN) models have significantly improved performance across computer vision tasks. However, achieving highly generalizable and high-performing vision models requires expansive datasets, resulting in significant storage requirements. This storage challenge is a critical bottleneck for scaling up models. A recent breakthrough by SeiT proposed the use of Vec… ▽ More

    Submitted 2 April, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: First two authors contributed equally

  47. arXiv:2312.10103  [pdf, other

    cs.CV

    GSVA: Generalized Segmentation via Multimodal Large Language Models

    Authors: Zhuofan Xia, Dongchen Han, Yizeng Han, Xuran Pan, Shiji Song, Gao Huang

    Abstract: Generalized Referring Expression Segmentation (GRES) extends the scope of classic RES to refer to multiple objects in one expression or identify the empty targets absent in the image. GRES poses challenges in modeling the complex spatial relationships of the instances in the image and identifying non-existing referents. Multimodal Large Language Models (MLLMs) have recently shown tremendous progre… ▽ More

    Submitted 21 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR2024 (19 pages, 9 figures, 11 tables)

  48. arXiv:2312.09579  [pdf, other

    cs.CV cs.AI

    MobileSAMv2: Faster Segment Anything to Everything

    Authors: Chaoning Zhang, Dongshen Han, Sheng Zheng, Jinwoo Choi, Tae-Ho Kim, Choong Seon Hong

    Abstract: Segment anything model (SAM) addresses two practical yet challenging segmentation tasks: segment anything (SegAny), which utilizes a certain point to predict the mask for a single object of interest, and segment everything (SegEvery), which predicts the masks for all objects on the image. What makes SegAny slow for SAM is its heavyweight image encoder, which has been addressed by… ▽ More

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: MobileSAM achieves faster segment anything, while MobileSAMv2 achieves faster segment everything

  49. arXiv:2312.09461  [pdf, other

    eess.SP cs.HC cs.LG

    Improving Generalization of Drowsiness State Classification by Domain-Specific Normalization

    Authors: Dong-Young Kim, Dong-Kyun Han, Seo-Hyeon Park, Geun-Deok Jang, Seong-Whan Lee

    Abstract: Abnormal driver states, particularly drowsiness, have been major concerns for road safety, emphasizing the importance of accurate drowsiness detection to prevent accidents. Electroencephalogram (EEG) signals are recognized for their effectiveness in monitoring a driver's mental state by monitoring brain activities. However, the challenge lies in the requirement for prior calibration due to the variation of EE… ▽ More

    Submitted 14 November, 2023; originally announced December 2023.

    Comments: Submitted to 2024 12th IEEE International Winter Conference on Brain-Computer Interface

  50. arXiv:2312.08874  [pdf, other

    cs.CV

    Agent Attention: On the Integration of Softmax and Linear Attention

    Authors: Dongchen Han, Tianzhu Ye, Yizeng Han, Zhuofan Xia, Shiji Song, Gao Huang

    Abstract: The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power. Specifically, the Agent Attention… ▽ More

    Submitted 22 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.