
Showing 1–50 of 644 results for author: Lin, S

Searching in archive cs.
  1. arXiv:2407.00009  [pdf, other]

    cs.DC cs.NI

    An Open-Source Fast Parallel Routing Approach for Commercial FPGAs

    Authors: Xinshi Zang, Wenhao Lin, Shiju Lin, **wei Liu, Evangeline F. Y. Young

    Abstract: In the face of escalating complexity and size of contemporary FPGAs and circuits, routing emerges as a pivotal and time-intensive phase in FPGA compilation flows. In response to this challenge, we present an open-source parallel routing methodology designed to expedite routing procedures for commercial FPGAs. Our approach introduces a novel recursive partitioning ternary tree to augment the parall…

    Submitted 25 April, 2024; originally announced July 2024.

  2. arXiv:2406.19394  [pdf, other]

    cs.CV

    HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

    Authors: Liujuan Cao, Jianghang Lin, Zebo Hong, Yunhang Shen, Shaohui Lin, Chao Chen, Rongrong Ji

    Abstract: Most WSOD methods rely on traditional object proposals to generate candidate regions and are confronted with unstable training, which easily gets stuck in a poor local optimum. In this paper, we introduce a unified, high-capacity weakly supervised object detection (WSOD) network called HUWSOD, which utilizes a comprehensive self-training framework without needing external modules or additional sup…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2406.16437  [pdf, other]

    cs.LG cs.AI

    Theory on Mixture-of-Experts in Continual Learning

    Authors: Hongbo Li, Sen Lin, Lingjie Duan, Yingbin Liang, Ness B. Shroff

    Abstract: Continual learning (CL) has garnered significant attention because of its ability to adapt to new tasks that arrive over time. Catastrophic forgetting (of old tasks) has been identified as a major issue in CL, as the model adapts to new tasks. The Mixture-of-Experts (MoE) model has recently been shown to effectively mitigate catastrophic forgetting in CL, by employing a gating network to sparsify…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.12433  [pdf, other]

    cs.IR

    LLM-enhanced Reranking in Recommender Systems

    Authors: **gtong Gao, Bo Chen, Xiangyu Zhao, Weiwen Liu, Xiangyang Li, Yichao Wang, Zijian Zhang, Wanyu Wang, Yuyang Ye, Shanru Lin, Huifeng Guo, Ruiming Tang

    Abstract: Reranking is a critical component in recommender systems, playing an essential role in refining the output of recommendation algorithms. Traditional reranking models have focused predominantly on accuracy, but modern applications demand consideration of additional criteria such as diversity and fairness. Existing reranking approaches often fail to harmonize these diverse criteria effectively at th…

    Submitted 20 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  5. arXiv:2406.11251  [pdf, other]

    cs.IR

    Unifying Multimodal Retrieval via Document Screenshot Embedding

    Authors: Xueguang Ma, Sheng-Chieh Lin, Minghan Li, Wenhu Chen, Jimmy Lin

    Abstract: In the real world, documents are organized in different formats and varied modalities. Traditional retrieval pipelines require tailored document parsing techniques and content extraction modules to prepare input for indexing. This process is tedious, prone to errors, and has information loss. To this end, we propose Document Screenshot Embedding (DSE), a novel retrieval paradigm that regards docu…

    Submitted 17 June, 2024; originally announced June 2024.

  6. arXiv:2406.10961  [pdf, other]

    cs.CV cs.AI cs.CY

    Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP

    Authors: Shuyang Lin, Tong Jia, Hao Wang, Bowen Ma, Mingyuan Li, Dongyue Chen

    Abstract: X-ray prohibited item detection is an essential component of security check and categories of prohibited item are continuously increasing in accordance with the latest laws. Previous works all focus on close-set scenarios, which can only recognize known categories used for training and often require time-consuming as well as labor-intensive annotations when learning novel categories, resulting in…

    Submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.10280  [pdf, other]

    cs.CR cs.CL cs.LG

    Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries

    Authors: Yu-Hsiang Huang, Yuche Tsai, Hsiang Hsiao, Hong-Yi Lin, Shou-De Lin

    Abstract: This study investigates the privacy risks associated with text embeddings, focusing on the scenario where attackers cannot access the original embedding model. Contrary to previous research requiring direct model access, we explore a more realistic threat model by developing a transfer attack method. This approach uses a surrogate model to mimic the victim model's behavior, allowing the attacker t…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 Main Conference

  8. arXiv:2406.06253  [pdf, other]

    eess.SY cs.PL

    PretVM: Predictable, Efficient Virtual Machine for Real-Time Concurrency

    Authors: Shaokai Lin, Erling Jellum, Mirco Theile, Tassilo Tanneberger, Binqi Sun, Chadlia Jerad, Ruomu Xu, Guangyu Feng, Christian Menard, Marten Lohstroh, Jeronimo Castrillon, Sanjit Seshia, Edward Lee

    Abstract: This paper introduces the Precision-Timed Virtual Machine (PretVM), an intermediate platform facilitating the execution of quasi-static schedules compiled from a subset of programs written in the Lingua Franca (LF) coordination language. The subset consists of those programs that in principle should have statically verifiable and predictable timing behavior. The PretVM provides a schedule with wel…

    Submitted 25 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  9. arXiv:2406.02787  [pdf, other]

    cs.CL cs.AI cs.LG

    Disentangling Logic: The Role of Context in Large Language Model Reasoning Capabilities

    Authors: Wenyue Hua, Kaijie Zhu, Lingyao Li, Lizhou Fan, Shuhang Lin, Mingyu **, Haochen Xue, Zelong Li, **Dong Wang, Yongfeng Zhang

    Abstract: This study intends to systematically disentangle pure logic reasoning and text understanding by investigating the contrast across abstract and contextualized logical problems from a comprehensive set of domains. We explore whether LLMs demonstrate genuine reasoning capabilities across various domains when the underlying logical structure remains constant. We focus on two main questions (1) Can abs…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 22 pages, 9 figures

  10. arXiv:2406.01304  [pdf, other]

    cs.CL cs.AI cs.SE

    CodeR: Issue Resolving with Multi-Agent and Task Graphs

    Authors: Dong Chen, Shaoxin Lin, Muhan Zeng, Daoguang Zan, Jian-Gang Wang, Anton Cheshkov, Jun Sun, Hao Yu, Guoliang Dong, Artem Aliev, Jie Wang, Xiao Cheng, Guangtai Liang, Yuchi Ma, Pan Bian, Tao Xie, Qianxiang Wang

    Abstract: GitHub issue resolving recently has attracted significant attention from academia and industry. SWE-bench is proposed to measure the performance in resolving issues. In this paper, we propose CodeR, which adopts a multi-agent framework and pre-defined task graphs to Repair & Resolve reported bugs and add new features within code Repository. On SWE-bench lite, CodeR is able to solve 28.33% of issue…

    Submitted 10 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: https://github.com/NL2Code/CodeR

  11. arXiv:2406.00427  [pdf, other]

    cs.CV

    You Only Need Less Attention at Each Stage in Vision Transformers

    Authors: Shuoxi Zhang, Hanpeng Liu, Stephen Lin, Kun He

    Abstract: The advent of Vision Transformers (ViTs) marks a substantial paradigm shift in the realm of computer vision. ViTs capture the global information of images through self-attention modules, which perform dot product computations among patchified image tokens. While self-attention modules empower ViTs to capture long-range dependencies, the computational complexity grows quadratically with the number…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Camera-Ready; 10 pages, 3 figures

  12. arXiv:2405.21075  [pdf, other]

    cs.CV cs.CL

    Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

    Authors: Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun

    Abstract: In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding. The potential of MLLMs in processing sequential visual data is still insufficiently explored, highlighting the absence of a comprehensive, high-quality…

    Submitted 16 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Project Page: https://video-mme.github.io

  13. arXiv:2405.18497  [pdf, other]

    cs.IT

    Capacity Results for Non-Ergodic Multi-Modal Broadcast Channels with Controllable Statistics

    Authors: Alireza Vahid, Shih-Chun Lin

    Abstract: Movable antennas and reconfigurable intelligent surfaces enable a new paradigm in which channel statistics can be controlled and altered. Further, the known trajectory and operation protocol of communication satellites results in networks with predictable statistics. The predictability of future changes results in a non-ergodic model for which the fundamentals are largely unknown. We consider the…

    Submitted 28 May, 2024; originally announced May 2024.

    Comments: under review

  14. arXiv:2405.17477  [pdf, other]

    cs.LG cs.AI

    OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning

    Authors: Sheng Yue, Xingyuan Hua, Ju Ren, Sen Lin, Junshan Zhang, Yaoxue Zhang

    Abstract: In this paper, we study offline-to-online Imitation Learning (IL) that pretrains an imitation policy from static demonstration data, followed by fast finetuning with minimal environmental interaction. We find the naïve combination of existing offline IL and online IL methods tends to behave poorly in this context, because the initial discriminator (often used in online IL) operates randomly and di…

    Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: International Conference on Machine Learning (ICML)

  15. arXiv:2405.17476  [pdf, other]

    cs.LG cs.AI

    How to Leverage Diverse Demonstrations in Offline Imitation Learning

    Authors: Sheng Yue, Jiani Liu, Xingyuan Hua, Ju Ren, Sen Lin, Junshan Zhang, Yaoxue Zhang

    Abstract: Offline Imitation Learning (IL) with imperfect demonstrations has garnered increasing attention owing to the scarcity of expert data in many real-world domains. A fundamental problem in this scenario is how to extract positive behaviors from noisy data. In general, current approaches to the problem select data building on state-action similarity to given expert demonstrations, neglecting precious…

    Submitted 30 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: International Conference on Machine Learning (ICML)

  16. arXiv:2405.16634  [pdf, other]

    cs.GR

    Fast and Globally Consistent Normal Orientation based on the Winding Number Normal Consistency

    Authors: Siyou Lin, Zuoqiang Shi, Yebin Liu

    Abstract: Estimating a consistently oriented normal vector field for an unoriented point cloud enables a number of important downstream applications in computer graphics. While normal estimation for a small patch of points can be done with simple techniques like principal component analysis (PCA), orienting these normals to be globally consistent has been a notoriously difficult problem. Some recent methods…

    Submitted 26 May, 2024; originally announced May 2024.

  17. arXiv:2405.15334  [pdf, other]

    cs.CL

    Detection and Positive Reconstruction of Cognitive Distortion sentences: Mandarin Dataset and Evaluation

    Authors: Shuya Lin, Yuxiong Wang, Jonathan Dong, Shiguang Ni

    Abstract: This research introduces a Positive Reconstruction Framework based on positive psychology theory. Overcoming negative thoughts can be challenging; our objective is to address and reframe them through a positive reinterpretation. To tackle this challenge, a two-fold approach is necessary: identifying cognitive distortions and suggesting a positively reframed alternative while preserving the origina…

    Submitted 24 May, 2024; originally announced May 2024.

  18. arXiv:2405.14358  [pdf, other]

    cs.MA

    AI-Olympics: Exploring the Generalization of Agents through Open Competitions

    Authors: Chen Wang, Yan Song, Shuai Wu, Sa Wu, Ruizhi Zhang, Shu Lin, Haifeng Zhang

    Abstract: Between 2021 and 2023, AI-Olympics, a series of online AI competitions, was hosted by the online evaluation platform Jidi in collaboration with the IJCAI committee. In these competitions, an agent is required to accomplish diverse sports tasks in a two-dimensional continuous world, while competing against an opponent. This paper provides a brief overview of the competition series and highlights not…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: IJCAI 2024 Demo Track Paper

  19. arXiv:2405.12117  [pdf, other]

    cs.DC

    Strongly-Consistent Distributed Discrete-event Systems

    Authors: Peter Donovan, Erling Jellum, Byeonggil Jun, Hokeun Kim, Edward A. Lee, Shaokai Lin, Marten Lohstroh, Anirudh Rengarajan

    Abstract: Discrete-event (DE) systems are concurrent programs where components communicate via tagged events, where tags are drawn from a totally ordered set. Reactors are an emerging model of computation based on DE and realized in the open-source coordination language Lingua Franca. Distributed DE (DDE) systems are DE systems where the components (reactors) communicate over networks. The prior art has req…

    Submitted 20 May, 2024; originally announced May 2024.

  20. arXiv:2405.11734  [pdf, other]

    cs.IT

    Finite Field Multiple Access for Sourced Massive Random Access with Finite Blocklength

    Authors: Qi-yue Yu, Shi-wen Lin, Shu Lin

    Abstract: For binary source transmission, this paper proposes an element-pair (EP) coding scheme for supporting sourced massive random access, which is used to solve the finite blocklength (FBL) of multiuser reliability transmission problem. In this paper, we first give the definition of an EP, which is used as a virtual resource. If the Cartesian product of $J$ distinct EPs satisfies the unique sum-pattern…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.14086

  21. arXiv:2405.09487  [pdf, other]

    cs.CV

    Color Space Learning for Cross-Color Person Re-Identification

    Authors: Jiahao Nie, Shan Lin, Alex C. Kot

    Abstract: The primary color profile of the same identity is assumed to remain consistent in typical Person Re-identification (Person ReID) tasks. However, this assumption may be invalid in real-world situations and images hold variant color profiles, because of cross-modality cameras or identity with different clothing. To address this issue, we propose Color Space Learning (CSL) for those Cross-Color Perso…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: Accepted by ICME 2024 (Oral)

  22. arXiv:2405.08765  [pdf, other]

    cs.CV

    Image to Pseudo-Episode: Boosting Few-Shot Segmentation by Unlabeled Data

    Authors: Jie Zhang, Yuhan Li, Yude Wang, Stephen Lin, Shiguang Shan

    Abstract: Few-shot segmentation (FSS) aims to train a model which can segment the object from novel classes with a few labeled samples. The insufficient generalization ability of models leads to unsatisfactory performance when the models lack enough labeled data from the novel classes. Considering that there are abundant unlabeled data available, it is promising to improve the generalization ability by expl…

    Submitted 14 May, 2024; originally announced May 2024.

  23. arXiv:2405.08748  [pdf, other]

    cs.CV

    Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

    Authors: Zhimin Li, Jianwei Zhang, Qin Lin, Jiangfeng Xiong, Yanxin Long, Xinchi Deng, Yingfang Zhang, Xingchao Liu, Minbin Huang, Zedong Xiao, Dayou Chen, Jiajun He, Jiahao Li, Wenyue Li, Chen Zhang, Rongwei Quan, Jianxiang Lu, Jiabin Huang, Xiaoyan Yuan, Xiaoxiao Zheng, Yixuan Li, Jihong Zhang, Chao Zhang, Meng Chen, Jie Liu , et al. (20 additional authors not shown)

    Abstract: We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Mu…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Project Page: https://dit.hunyuan.tencent.com/

  24. arXiv:2405.07319  [pdf, other]

    cs.CV

    LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer

    Authors: Siyou Lin, Zhe Li, Zhaoqi Su, Zerong Zheng, Hongwen Zhang, Yebin Liu

    Abstract: Animatable clothing transfer, aiming at dressing and animating garments across characters, is a challenging problem. Most human avatar works entangle the representations of the human body and clothing together, which leads to difficulties for virtual try-on across identities. What's worse, the entangled representations usually fail to exactly track the sliding motion of garments. To overcome these…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024 conference track

  25. arXiv:2405.04480  [pdf, other]

    cs.NE cs.AI

    Concentration Tail-Bound Analysis of Coevolutionary and Bandit Learning Algorithms

    Authors: Per Kristian Lehre, Shishen Lin

    Abstract: Runtime analysis, as a branch of the theory of AI, studies how the number of iterations algorithms take before finding a solution (its runtime) depends on the design of the algorithm and the problem structure. Drift analysis is a state-of-the-art tool for estimating the runtime of randomised algorithms, such as evolutionary and bandit algorithms. Drift refers roughly to the expected progress towar…

    Submitted 10 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted at International Joint Conference on Artificial Intelligence (IJCAI) 2024

  26. arXiv:2405.01525  [pdf, other]

    cs.CL cs.AI

    FLAME: Factuality-Aware Alignment for Large Language Models

    Authors: Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen

    Abstract: Alignment is a standard procedure to fine-tune pre-trained large language models (LLMs) to follow natural language instructions and serve as helpful AI assistants. We have observed, however, that the conventional alignment process fails to enhance the factual accuracy of LLMs, and often leads to the generation of more false facts (i.e. hallucination). In this paper, we study how to make the LLM al…

    Submitted 2 May, 2024; originally announced May 2024.

  27. arXiv:2405.00946  [pdf, other]

    cs.LG

    SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters

    Authors: Shengsheng Lin, Weiwei Lin, Wentai Wu, Haojun Chen, Junjie Yang

    Abstract: This paper introduces SparseTSF, a novel, extremely lightweight model for Long-term Time Series Forecasting (LTSF), designed to address the challenges of modeling complex temporal dependencies over extended horizons with minimal computational resources. At the heart of SparseTSF lies the Cross-Period Sparse Forecasting technique, which simplifies the forecasting task by decoupling the periodicity…

    Submitted 3 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  28. arXiv:2405.00146  [pdf, other]

    quant-ph cs.ET

    Averting multi-qubit burst errors in surface code magic state factories

    Authors: Jason D. Chadwick, Christopher Kang, Joshua Viszlai, Sophia Fuhui Lin, Frederic T. Chong

    Abstract: Fault-tolerant quantum computation relies on the assumption of time-invariant, sufficiently low physical error rates. However, current superconducting quantum computers suffer from frequent disruptive noise events, including cosmic ray impacts and shifting two-level system defects. Several methods have been proposed to mitigate these issues in software, but they add large overheads in terms of phy…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 13 pages, 12 figures

  29. arXiv:2404.19221  [pdf, other]

    cs.CV cs.CL

    Transcrib3D: 3D Referring Expression Resolution through Large Language Models

    Authors: Jiading Fang, Xiangshan Tan, Shengjie Lin, Igor Vasiljevic, Vitor Guizilini, Hongyuan Mei, Rares Ambrus, Gregory Shakhnarovich, Matthew R Walter

    Abstract: If robots are to work effectively alongside people, they must be able to interpret natural language references to objects in their 3D environment. Understanding 3D referring expressions is challenging -- it requires the ability to both parse the 3D structure of the scene and correctly ground free-form language in the presence of distraction and clutter. We introduce Transcrib3D, an approach that b…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: CORLW 2023

  30. arXiv:2404.15532  [pdf, other]

    cs.HC cs.AI cs.CL cs.CV cs.MA

    BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

    Authors: Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu **, Jiebo Luo, Yongfeng Zhang

    Abstract: This paper presents BattleAgent, an emulation system that combines the Large Vision-Language Model and Multi-agent System. This novel system aims to simulate complex dynamic interactions among multiple agents, as well as between agents and their environments, over a period of time. It emulates both the decision-making processes of leaders and the viewpoints of ordinary participants, such as soldie…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 26 pages, 14 figures. The data and code for this project are accessible at https://github.com/agiresearch/battleagent

  31. arXiv:2404.12210  [pdf, other]

    cs.CV

    An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training

    Authors: ** Gao, Shubo Lin, Shaoru Wang, Yutong Kou, Zeming Li, Liang Li, Congxuan Zhang, Xiaoqin Zhang, Yizheng Wang, Weiming Hu

    Abstract: Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features. In this paper, we question if the extremely simple lightweight ViTs' fine-tuning performance can also benefit from this pre-training paradigm, which is considerably less studied yet in contrast to the well-esta…

    Submitted 25 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: A submission to IJCV

  32. arXiv:2404.12008  [pdf, other]

    cs.IR cs.AI

    How Do Recommendation Models Amplify Popularity Bias? An Analysis from the Spectral Perspective

    Authors: Siyi Lin, Chongming Gao, Jiawei Chen, Sheng Zhou, Binbin Hu, Yan Feng, Chun Chen, Can Wang

    Abstract: Recommendation Systems (RS) are often plagued by popularity bias. When training a recommendation model on a typically long-tailed dataset, the model tends to not only inherit this bias but often exacerbate it, resulting in over-representation of popular items in the recommendation lists. This study conducts comprehensive empirical and theoretical analyses to expose the root causes of this phenomen…

    Submitted 13 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: 23 pages, 9 figures

  33. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  34. arXiv:2404.09146  [pdf, other]

    cs.CV cs.AI

    Fusion-Mamba for Cross-modality Object Detection

    Authors: Wenhao Dong, Haodong Zhu, Shaohui Lin, Xiaoyan Luo, Yunhang Shen, Xuhui Liu, Juan Zhang, Guodong Guo, Baochang Zhang

    Abstract: Cross-modality fusing complementary information from different modalities effectively improves object detection performance, making it more useful and robust for a wider range of applications. Existing fusion strategies combine different types of images or merge different backbone features through elaborated neural network modules. However, these methods neglect that modality disparities affect cr…

    Submitted 14 April, 2024; originally announced April 2024.

  35. arXiv:2404.08028  [pdf, other]

    cs.LG cs.DC

    FedAuxHMTL: Federated Auxiliary Hard-Parameter Sharing Multi-Task Learning for Network Edge Traffic Classification

    Authors: Faisal Ahmed, Myung** Lee, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin

    Abstract: Federated Learning (FL) has garnered significant interest recently due to its potential as an effective solution for tackling many challenges in diverse application scenarios, for example, data privacy in network edge traffic classification. Despite its recognized advantages, FL encounters obstacles linked to statistical data heterogeneity and labeled data scarcity during the training of single-ta…

    Submitted 11 April, 2024; originally announced April 2024.

  36. arXiv:2404.07847  [pdf, other]

    cs.CV

    The Effectiveness of a Simplified Model Structure for Crowd Counting

    Authors: Lei Chen, Xinghang Gao, Fei Chao, Xiang Chang, Chih Min Lin, Xingen Gao, Shaopeng Lin, Hongyi Zhang, Juqiang Lin

    Abstract: In the field of crowd counting research, many recent deep learning based methods have demonstrated robust capabilities for accurately estimating crowd sizes. However, the enhancement in their performance often arises from an increase in the complexity of the model structure. This paper discusses how to construct high-performance crowd counting models using only simple structures. We propose the F…

    Submitted 18 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  37. arXiv:2404.06691  [pdf]

    q-bio.BM cs.LG cs.NE

    Latent Chemical Space Searching for Plug-in Multi-objective Molecule Generation

    Authors: Ningfeng Liu, Jie Yu, Siyu Xiu, Xinfang Zhao, Siyu Lin, Bo Qiang, Ruqiu Zheng, Hongwei **, Liangren Zhang, Zhenming Liu

    Abstract: Molecular generation, an essential method for identifying new drug structures, has been supported by advancements in machine learning and computational technology. However, challenges remain in multi-objective generation, model adaptability, and practical application in drug discovery. In this study, we developed a versatile 'plug-in' molecular generation model that incorporates multiple objective…

    Submitted 9 April, 2024; originally announced April 2024.

  38. arXiv:2404.06075  [pdf, other]

    cs.CV

    LIPT: Latency-aware Image Processing Transformer

    Authors: Junbo Qiao, Wei Li, Haizhen Xie, Hanting Chen, Yunshuai Zhou, Zhijun Tu, Jie Hu, Shaohui Lin

    Abstract: Transformer is leading a trend in the field of image processing. Despite the great success that existing lightweight image processing transformers have achieved, they are tailored to FLOPs or parameters reduction, rather than practical inference acceleration. In this paper, we present a latency-aware image processing transformer, termed LIPT. We devise the low-latency proportion LIPT block that su…

    Submitted 28 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  39. arXiv:2404.05657  [pdf, other]

    cs.CV

    MLP Can Be A Good Transformer Learner

    Authors: Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang

    Abstract: Self-attention mechanism is the key of the Transformer but often criticized for its computation demands. Previous token pruning works motivate their methods from the view of computation redundancy but still need to load the full network and require the same memory costs. This paper introduces a novel strategy that simplifies vision transformers and reduces computational load through the selective remo…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: efficient transformer

  40. arXiv:2404.04965  [pdf]

    cs.HC q-bio.NC

    Towards Developing Brain-Computer Interfaces for People with Multiple Sclerosis

    Authors: John S. Russo, Tim Mahoney, Kirill Kokorin, Ashley Reynolds, Chin-Hsuan Sophie Lin, Sam E. John, David B. Grayden

    Abstract: Multiple Sclerosis (MS) is a severely disabling condition that leads to various neurological symptoms. A Brain-Computer Interface (BCI) may substitute some lost function; however, there is a lack of BCI research in people with MS. To progress this research area effectively and efficiently, we aimed to evaluate user needs and assess the feasibility and user-centric requirements of a BCI for people…

    Submitted 8 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: 18 pages, 9 figures, 1 table. For supplementary material, please contact the corresponding author; corrected ordering of figures 6 and 7

  41. arXiv:2404.02573  [pdf, other]

    cs.CV

    Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution

    Authors: Simiao Li, Yun Zhang, Wei Li, Hanting Chen, Wenjia Wang, Bingyi **g, Shaohui Lin, Jie Hu

    Abstract: Knowledge distillation (KD) is a promising yet challenging model compression technique that transfers rich learning representations from a well-performing but cumbersome teacher model to a compact student model. Previous methods for image super-resolution (SR) mostly compare the feature maps directly or after standardizing the dimensions with basic algebraic operations (e.g. average, dot-product).…

    Submitted 3 April, 2024; originally announced April 2024.

  42. arXiv:2404.00672  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    A General and Efficient Training for Transformer via Token Expansion

    Authors: Wenxuan Huang, Yunhang Shen, Jiao Xie, Baochang Zhang, Gaoqi He, Ke Li, Xing Sun, Shaohui Lin

    Abstract: The remarkable performance of Vision Transformers (ViTs) typically requires an extremely large training cost. Existing methods have attempted to accelerate the training of ViTs, yet typically disregard method universality with accuracy dropping. Meanwhile, they break the training consistency of the original transformers, including the consistency of hyper-parameters, architecture, and strategy, wh…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024. Code is available at https://github.com/Osilly/TokenExpansion

  43. arXiv:2403.19425  [pdf, ps, other]

    eess.IV cs.CV

    A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge

    Authors: Ezequiel de la Rosa, Mauricio Reyes, Sook-Lei Liew, Alexandre Hutton, Roland Wiest, Johannes Kaesmacher, Uta Hanning, Arsany Hakim, Richard Zubal, Waldo Valenzuela, David Robben, Diana M. Sima, Vincenzo Anania, Arne Brys, James A. Meakin, Anne Mickan, Gabriel Broocks, Christian Heitkamp, Shengbo Gao, Kongming Liang, Ziji Zhang, Md Mahfuzur Rahman Siddiquee, Andriy Myronenko, Pooya Ashtari, Sabine Van Huffel, et al. (33 additional authors not shown)

    Abstract: Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 patient scans with ischemi…

    Submitted 3 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  44. arXiv:2403.16286  [pdf, other]

    eess.IV cs.CV

    HemoSet: The First Blood Segmentation Dataset for Automation of Hemostasis Management

    Authors: Albert J. Miao, Shan Lin, Jingpei Lu, Florian Richter, Benjamin Ostrander, Emily K. Funk, Ryan K. Orosco, Michael C. Yip

    Abstract: Hemorrhaging occurs in surgeries of all types, forcing surgeons to quickly adapt to the visual interference that results from blood rapidly filling the surgical field. Introducing automation into the crucial surgical task of hemostasis management would offload mental and physical tasks from the surgeon and surgical assistants while simultaneously increasing the efficiency and safety of the operati…

    Submitted 2 June, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  45. arXiv:2403.16116  [pdf, other]

    cs.CV cs.AI cs.LG

    Self-Supervised Multi-Frame Neural Scene Flow

    Authors: Dongrui Liu, Daqi Liu, Xueqian Li, Sihao Lin, Hongwei xie, Bing Wang, Xiaojun Chang, Lei Chu

    Abstract: Neural Scene Flow Prior (NSFP) and Fast Neural Scene Flow (FNSF) have shown remarkable adaptability in the context of large out-of-distribution autonomous driving. Despite their success, the underlying reasons for their astonishing generalization capabilities remain unclear. Our research addresses this gap by examining the generalization capabilities of NSFP through the lens of uniform stability,…

    Submitted 24 March, 2024; originally announced March 2024.

  46. arXiv:2403.16032  [pdf, other]

    cs.SE

    FineWAVE: Fine-Grained Warning Verification of Bugs for Automated Static Analysis Tools

    Authors: Han Liu, Jian Zhang, Cen Zhang, Xiaohan Zhang, Kaixuan Li, Sen Chen, Shang-Wei Lin, Yixiang Chen, Xinhua Li, Yang Liu

    Abstract: Automated Static Analysis Tools (ASATs) have evolved over time to assist in detecting bugs. However, the excessive false warnings can impede developers' productivity and confidence in the tools. Previous research efforts have explored learning-based methods to validate the reported warnings. Nevertheless, their coarse granularity, focusing on either long-term warnings or function-level alerts, whi…

    Submitted 6 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  47. arXiv:2403.15975  [pdf, other]

    cs.NI

    Prioritized Multi-Tenant Traffic Engineering for Dynamic QoS Provisioning in Autonomous SDN-OpenFlow Edge Networks

    Authors: Mohammad Sajid Shahriar, Faisal Ahmed, Genshe Chen, Khanh D. Pham, Suresh Subramaniam, Motoharu Matsuura, Hiroshi Hasegawa, Shih-Chun Lin

    Abstract: This letter indicates the critical need for prioritized multi-tenant quality-of-service (QoS) management by emerging mobile edge systems, particularly for high-throughput beyond fifth-generation networks. Existing traffic engineering tools utilize complex functions baked into closed, proprietary infrastructures, largely limiting design flexibility, scalability, and adaptiveness. Hence, this study…

    Submitted 23 March, 2024; originally announced March 2024.

  48. arXiv:2403.13793  [pdf, other]

    cs.LG

    Evaluating Frontier Models for Dangerous Capabilities

    Authors: Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, et al. (2 additional authors not shown)

    Abstract: To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations cover four areas: (1) persuasion and deception; (2) cyber-security; (3) self-proliferation; and (4) self-reasoning. We do not find evidence of strong dangerous…

    Submitted 5 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  49. arXiv:2403.13379  [pdf]

    cs.CE

    Application of advanced ultrasonic testing methods to Dissimilar Metal Welds -- Comparison of simulated and experimental results

    Authors: Audrey Gardahaut, Hugues Lourme, Steve Mahaut, Masaki Nagai, Shan Lin

    Abstract: Widely present in the primary circuit of Nuclear Power Plants (NPP), Dissimilar Metal Welds (DMW) are inspected using Ultrasonic nondestructive Testing (UT) techniques to ensure the integrity of the structure and detect defects such as Stress Corrosion Cracking (SCC). In a previous collaborative research, CRIEPI and CEA have worked on the understanding of the propagation of ultrasonic waves in comp…

    Submitted 20 March, 2024; originally announced March 2024.

  50. arXiv:2403.12945  [pdf, other]

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/