Skip to main content

Showing 1–50 of 426 results for author: Lin, M

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.01492  [pdf, other

    cs.CL cs.AI

    RegMix: Data Mixture as Regression for Language Model Pre-training

    Authors: Qian Liu, Xiaosen Zheng, Niklas Muennighoff, Guangtao Zeng, Longxu Dou, Tianyu Pang, **g Jiang, Min Lin

    Abstract: The data mixture for large language model pre-training significantly impacts performance, yet how to determine an effective mixture remains unclear. We propose RegMix to automatically identify a high-performing data mixture by formulating it as a regression task. RegMix involves training a set of small models with diverse data mixtures and fitting a regression model to predict their performance gi… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00631  [pdf, other

    cs.LG cs.AI

    TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

    Authors: **tai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu

    Abstract: Clinical trials are pivotal for develo** new medical treatments, yet they typically pose some risks such as patient mortality, adverse events, and enrollment failure that waste immense efforts spanning over a decade. Applying artificial intelligence (AI) to forecast or simulate key events in clinical trials holds great potential for providing insights to guide trial designs. However, complex dat… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  3. arXiv:2407.00497  [pdf, other

    cs.CL

    LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

    Authors: Jiahao Ying, Mingbao Lin, Yixin Cao, Wei Tang, Bo Wang, Qianru Sun, Xuan**g Huang, Shuicheng Yan

    Abstract: This paper introduces the innovative "LLMs-as-Instructors" framework, which leverages the advanced Large Language Models (LLMs) to autonomously enhance the training of smaller target models. Inspired by the theory of "Learning from Errors", this framework employs an instructor LLM to meticulously analyze the specific errors within a target model, facilitating targeted and efficient training cycles… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  4. arXiv:2407.00474  [pdf, other

    cs.LG cs.AI

    MH-pFLGB: Model Heterogeneous personalized Federated Learning via Global Bypass for Medical Image Analysis

    Authors: Luyuan Xie, Manqing Lin, ChenMing Xu, Tianyu Luan, Zhipeng Zeng, Wenjun Qian, Cong Li, Yuejian Fang, Qingni Shen, Zhonghai Wu

    Abstract: In the evolving application of medical artificial intelligence, federated learning is notable for its ability to protect training data privacy. Federated learning facilitates collaborative model development without the need to share local data from healthcare institutions. Yet, the statistical and system heterogeneity among these institutions poses substantial challenges, which affects the effecti… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  5. arXiv:2407.00462  [pdf, other

    cs.CV cs.AI

    pFLFE: Cross-silo Personalized Federated Learning via Feature Enhancement on Medical Image Segmentation

    Authors: Luyuan Xie, Manqing Lin, Siyuan Liu, ChenMing Xu, Tianyu Luan, Cong Li, Yuejian Fang, Qingni Shen, Zhonghai Wu

    Abstract: In medical image segmentation, personalized cross-silo federated learning (FL) is becoming popular for utilizing varied data across healthcare settings to overcome data scarcity and privacy concerns. However, existing methods often suffer from client drift, leading to inconsistent performance and delayed training. We propose a new framework, Personalized Federated Learning via Feature Enhancement… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  6. arXiv:2406.18173  [pdf, other

    cs.CL

    UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs

    Authors: Wenhao Li, Mingbao Lin, Yunshan Zhong, Shuicheng Yan, Rongrong Ji

    Abstract: Managing long texts is challenging for large language models (LLMs) due to limited context window sizes. This study introduces UIO-LLMs, an unbiased incremental optimization approach for memory-enhanced transformers under long-context settings. We initially conceptualize the process as a streamlined encoder-decoder framework where the weights-shared encoder and decoder respectively encapsulate a c… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  7. arXiv:2406.17628  [pdf, other

    cs.CV cs.CR

    Video Inpainting Localization with Contrastive Learning

    Authors: Zijie Lou, Gang Cao, Man Lin

    Abstract: Deep video inpainting is typically used as malicious manipulation to remove important objects for creating fake videos. It is significant to identify the inpainted regions blindly. This letter proposes a simple yet effective forensic scheme for Video Inpainting LOcalization with ContrAstive Learning (ViLocal). Specifically, a 3D Uniformer encoder is applied to the video noise residual for learning… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2406.13576

  8. arXiv:2406.14162  [pdf, other

    cs.IR cs.AI cs.CL

    DIRAS: Efficient LLM-Assisted Annotation of Document Relevance in Retrieval Augmented Generation

    Authors: **gwei Ni, Tobias Schimanski, Meihong Lin, Mrinmaya Sachan, Elliott Ash, Markus Leippold

    Abstract: Retrieval Augmented Generation (RAG) is widely employed to ground responses to queries on domain-specific documents. But do RAG implementations leave out important information or excessively include irrelevant information? To allay these concerns, it is necessary to annotate domain-specific benchmarks to evaluate information retrieval (IR) performance, as relevance definitions vary across queries… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  9. arXiv:2406.13576  [pdf, other

    cs.CV cs.CR

    Trusted Video Inpainting Localization via Deep Attentive Noise Learning

    Authors: Zijie Lou, Gang Cao, Man Lin

    Abstract: Digital video inpainting techniques have been substantially improved with deep learning in recent years. Although inpainting is originally designed to repair damaged areas, it can also be used as malicious manipulation to remove important objects for creating false scenes and facts. As such it is significant to identify inpainted regions blindly. In this paper, we present a Trusted Video Inpaintin… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  10. arXiv:2406.10740  [pdf, other

    cs.CV

    FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models

    Authors: Zhikai Zhang, Yitang Li, Haofeng Huang, Mingxian Lin, Li Yi

    Abstract: Human motion synthesis is a fundamental task in computer animation. Despite recent progress in this field utilizing deep learning and motion capture data, existing methods are always limited to specific motion categories, environments, and styles. This poor generalizability can be partially attributed to the difficulty and expense of collecting large-scale and high-quality motion data. At the same… ▽ More

    Submitted 21 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  11. arXiv:2406.09836  [pdf, other

    cs.LG cs.CR

    Robustness-Inspired Defense Against Backdoor Attacks on Graph Neural Networks

    Authors: Zhiwei Zhang, Minhua Lin, Junjie Xu, Zongyu Wu, Enyan Dai, Suhang Wang

    Abstract: Graph Neural Networks (GNNs) have achieved promising results in tasks such as node classification and graph classification. However, recent studies reveal that GNNs are vulnerable to backdoor attacks, posing a significant threat to their real-world adoption. Despite initial efforts to defend against specific graph backdoor attacks, there is no work on defending against various types of backdoor at… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  12. arXiv:2406.09760  [pdf, other

    cs.CL cs.LG

    Bootstrap** Language Models with DPO Implicit Rewards

    Authors: Changyu Chen, Zichen Liu, Chao Du, Tianyu Pang, Qian Liu, Arunesh Sinha, Pradeep Varakantham, Min Lin

    Abstract: Human alignment in large language models (LLMs) is an active area of research. A recent groundbreaking work, direct preference optimization (DPO), has greatly simplified the process from past work in reinforcement learning from human feedback (RLHF) by bypassing the reward learning stage in RLHF. DPO, after training, provides an implicit reward model. In this work, we make a novel observation that… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  13. arXiv:2406.09648  [pdf, other

    cs.LG cs.CV

    An Intrinsic Vector Heat Network

    Authors: Alexander Gao, Maurice Chu, Mubbasir Kapadia, Ming C. Lin, Hsueh-Ti Derek Liu

    Abstract: Vector fields are widely used to represent and model flows for many science and engineering applications. This paper introduces a novel neural network architecture for learning tangent vector fields that are intrinsically defined on manifold surfaces embedded in 3D. Previous approaches to learning vector fields on surfaces treat vectors as multi-dimensional scalar fields, using traditional scalar-… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  14. arXiv:2406.09136  [pdf, other

    cs.CL cs.LG

    Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

    Authors: Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin

    Abstract: The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving. However, research indicates that these paths are not always deliberate and optimal. The tree-of-thought (ToT) method employs tree-searching to extensively explore the reasoning space and find better reasoning paths that CoT dec… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  15. arXiv:2406.06038  [pdf, other

    cs.RO

    Navigation and 3D Surface Reconstruction from Passive Whisker Sensing

    Authors: Michael A. Lin, Hao Li, Chengyi Xing, Mark R. Cutkosky

    Abstract: Whiskers provide a way to sense surfaces in the immediate environment without disturbing it. In this paper we present a method for using highly flexible, curved, passive whiskers mounted along a robot arm to gather sensory data as they brush past objects during normal robot motion. The information is useful both for guiding the robot in cluttered spaces and for reconstructing the exposed faces of… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:2210.12387

  16. arXiv:2406.04313  [pdf, other

    cs.LG cs.AI cs.CL cs.CV cs.CY

    Improving Alignment and Robustness with Circuit Breakers

    Authors: Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks

    Abstract: AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that interrupts the models as they respond with harmful outputs with "circuit breakers." Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to… ▽ More

    Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  17. arXiv:2406.02859   

    eess.AS cs.SD

    ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization

    Authors: Bi-Cheng Yan, Wei-Cheng Chao, Jiun-Ting Li, Yi-Cheng Wang, Hsin-Wei Wang, Meng-Shin Lin, Berlin Chen

    Abstract: Automatic pronunciation assessment (APA) manages to evaluate the pronunciation proficiency of a second language (L2) learner in a target language. Existing efforts typically draw on regression models for proficiency score prediction, where the models are trained to estimate target values without explicitly accounting for phoneme-awareness in the feature space. In this paper, we propose a contrasti… ▽ More

    Submitted 8 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: This paper has been withdrawn because the authors aim to achieve better organization in writing and more detailed experimental analysis

  18. arXiv:2406.01431  [pdf, other

    cs.RO

    Deep Stochastic Kinematic Models for Probabilistic Motion Forecasting in Traffic

    Authors: Laura Zheng, Sanghyun Son, **g Liang, Xijun Wang, Brian Clipp, Ming C. Lin

    Abstract: Kinematic priors have shown to be helpful in boosting generalization and performance in prior work on trajectory forecasting. Specifically, kinematic priors have been applied such that models predict a set of actions instead of future output trajectories. By unrolling predicted trajectories via time integration and models of kinematic dynamics, predicted trajectories are not only kinematically fea… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 8 pages

  19. arXiv:2406.01425  [pdf, other

    cs.CV

    Sensitivity-Informed Augmentation for Robust Segmentation

    Authors: Laura Zheng, Wenjie Wei, Tony Wu, Jacob Clements, Shreelekha Revankar, Andre Harrison, Yu Shen, Ming C. Lin

    Abstract: Segmentation is an integral module in many visual computing applications such as virtual try-on, medical imaging, autonomous driving, and agricultural automation. These applications often involve either widespread consumer use or highly variable environments, both of which can degrade the quality of visual sensor data, whether from a common mobile phone or an expensive satellite imaging camera. In… ▽ More

    Submitted 16 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: 10 pages

  20. arXiv:2406.01288  [pdf, other

    cs.CL cs.AI cs.CR cs.LG

    Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses

    Authors: Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, **g Jiang, Min Lin

    Abstract: Recently, Anil et al. (2024) show that many-shot (up to hundreds of) demonstrations can jailbreak state-of-the-art LLMs by exploiting their long-context capability. Nevertheless, is it possible to use few-shot demonstrations to efficiently jailbreak LLMs within limited context sizes? While the vanilla few-shot jailbreaking may be inefficient, we propose improved techniques such as injecting specia… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2406.01032  [pdf, other

    cs.LG cs.AI

    LLM and GNN are Complementary: Distilling LLM for Multimodal Graph Learning

    Authors: Junjie Xu, Zongyu Wu, Minhua Lin, Xiang Zhang, Suhang Wang

    Abstract: Recent progress in Graph Neural Networks (GNNs) has greatly enhanced the ability to model complex molecular structures for predicting properties. Nevertheless, molecular data encompasses more than just graph structures, including textual and visual information that GNNs do not handle well. To bridge this gap, we present an innovative framework that utilizes multimodal molecular data to extract ins… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  22. arXiv:2405.21018  [pdf, other

    cs.LG cs.CL cs.CR

    Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

    Authors: Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, **dong Gu, Yang Liu, Xiaochun Cao, Min Lin

    Abstract: Large language models (LLMs) are being rapidly developed, and a key component of their widespread deployment is their safety-related alignment. Many red-teaming efforts aim to jailbreak LLMs, where among these efforts, the Greedy Coordinate Gradient (GCG) attack's success has led to a growing interest in the study of optimization-based jailbreaking techniques. Although GCG is a significant milesto… ▽ More

    Submitted 5 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  23. arXiv:2405.19026  [pdf, other

    cs.LG cs.AI cs.CL cs.CR

    DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints

    Authors: Andrew Zhao, Quentin Xu, Matthieu Lin, Shenzhi Wang, Yong-** Liu, Zilong Zheng, Gao Huang

    Abstract: Recent advances in large language models (LLMs) have made them indispensable, raising significant concerns over managing their safety. Automated red teaming offers a promising alternative to the labor-intensive and error-prone manual probing for vulnerabilities, providing more consistent and scalable safety evaluations. However, existing approaches often compromise diversity by focusing on maximiz… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

  24. arXiv:2405.18810  [pdf, other

    cs.CV cs.AI

    UniPTS: A Unified Framework for Proficient Post-Training Sparsity

    Authors: **g**g Xie, Yuxin Zhang, Mingbao Lin, Zhihang Lin, Liujuan Cao, Rongrong Ji

    Abstract: Post-training Sparsity (PTS) is a recently emerged avenue that chases efficient network sparsity with limited data in need. Existing PTS methods, however, undergo significant performance degradation compared with traditional methods that retrain the sparse networks via the whole dataset, especially at high sparsity ratios. In this paper, we attempt to reconcile this disparity by transposing three… ▽ More

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  25. arXiv:2405.16279  [pdf, other

    physics.ins-det cs.AI

    AI-Assisted Detector Design for the EIC (AID(2)E)

    Authors: M. Diefenthaler, C. Fanelli, L. O. Gerlach, W. Guan, T. Horn, A. Jentsch, M. Lin, K. Nagai, H. Nayak, C. Pecar, K. Suresh, A. Vossen, T. Wang, T. Wenaus

    Abstract: Artificial Intelligence is poised to transform the design of complex, large-scale detectors like the ePIC at the future Electron Ion Collider. Featuring a central detector with additional detecting systems in the far forward and far backward regions, the ePIC experiment incorporates numerous design parameters and objectives, including performance, physics reach, and cost, constrained by mechanical… ▽ More

    Submitted 28 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: 11 pages, 4 figures, AI4EIC 2023 proceeding

  26. arXiv:2405.15362  [pdf, other

    cs.LG cs.CL cs.DC

    Pipeline Parallelism with Controllable Memory

    Authors: Penghui Qi, Xinyi Wan, Nyamdavaa Amar, Min Lin

    Abstract: Pipeline parallelism has been widely explored, but most existing schedules lack a systematic methodology. In this paper, we propose a framework to decompose pipeline schedules as repeating a building block and we show that the lifespan of the building block decides the peak activation memory of the pipeline schedule. Guided by the observations, we find that almost all existing pipeline schedules,… ▽ More

    Submitted 10 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  27. arXiv:2405.14341  [pdf, other

    cs.HC

    How do Observable Users Decompose D3 Code? An Exploratory Study

    Authors: Melissa Lin, Heer Patel, Medina Lamkin, Tukey Tu, Hannah Bako, Soham Raut, Leilani Battle

    Abstract: Users often struggle to program visualizations using complex toolkits like D3. Before we can design effective code assistants to support them, we must first understand how D3 users reason about their code. In this work, we explore users' understanding of D3 using an important gauge of code comprehension in CS education: code decomposition. We qualitatively analyze 560 D3 programs published on Obse… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

  28. arXiv:2405.13685  [pdf, other

    cs.CV

    Prompt Mixing in Diffusion Models using the Black Scholes Algorithm

    Authors: Divya Kothandaraman, Ming Lin, Dinesh Manocha

    Abstract: We introduce a novel approach for prompt mixing, aiming to generate images at the intersection of multiple text prompts using pre-trained text-to-image diffusion models. At each time step during diffusion denoising, our algorithm forecasts predictions w.r.t. the generated image and makes informed text conditioning decisions. To do so, we leverage the connection between diffusion models (rooted in… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

  29. Rethinking Graph Backdoor Attacks: A Distribution-Preserving Perspective

    Authors: Zhiwei Zhang, Minhua Lin, Enyan Dai, Suhang Wang

    Abstract: Graph Neural Networks (GNNs) have shown remarkable performance in various tasks. However, recent works reveal that GNNs are vulnerable to backdoor attacks. Generally, backdoor attack poisons the graph by attaching backdoor triggers and the target class label to a set of nodes in the training graph. A GNN trained on the poisoned graph will then be misled to predict test nodes attached with trigger… ▽ More

    Submitted 20 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted in KDD 2024

  30. arXiv:2405.08786  [pdf, other

    cs.CV

    Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring

    Authors: Tiantian Zhang, Manxi Lin, Hongda Guo, Xiaofan Zhang, Ka Fung Peter Chiu, Aasa Feragen, Qi Dou

    Abstract: The Prostate Imaging Reporting and Data System (PI-RADS) is pivotal in the diagnosis of clinically significant prostate cancer through MRI imaging. Current deep learning-based PI-RADS scoring methods often lack the incorporation of essential PI-RADS clinical guidelines~(PICG) utilized by radiologists, potentially compromising scoring accuracy. This paper introduces a novel approach that adapts a m… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  31. arXiv:2405.08780  [pdf

    cs.CV cs.AI

    Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling

    Authors: Gregory Holste, Mingquan Lin, Ruiwen Zhou, Fei Wang, Lei Liu, Qi Yan, Sarah H. Van Tassel, Kyle Kovacs, Emily Y. Chew, Zhiyong Lu, Zhangyang Wang, Yifan Peng

    Abstract: Deep learning has enabled breakthroughs in automated diagnosis from medical imaging, with many successful applications in ophthalmology. However, standard medical image classification approaches only assess disease presence at the time of acquisition, neglecting the common clinical setting of longitudinal imaging. For slow, progressive eye diseases like age-related macular degeneration (AMD) and p… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  32. arXiv:2405.06944  [pdf, other

    cs.CV

    Learning Monocular Depth from Focus with Event Focal Stack

    Authors: Chenxu Jiang, Mingyuan Lin, Chi Zhang, Zhenghai Wang, Lei Yu

    Abstract: Depth from Focus estimates depth by determining the moment of maximum focus from multiple shots at different focal distances, i.e. the Focal Stack. However, the limited sampling rate of conventional optical cameras makes it difficult to obtain sufficient focus cues during the focal sweep. Inspired by biological vision, the event camera records intensity changes over time in extremely low latency,… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

  33. arXiv:2405.06918  [pdf, other

    cs.CV

    Super-Resolving Blurry Images with Events

    Authors: Chi Zhang, Mingyuan Lin, Xiang Zhang, Chenxu Jiang, Lei Yu

    Abstract: Super-resolution from motion-blurred images poses a significant challenge due to the combined effects of motion blur and low spatial resolution. To address this challenge, this paper introduces an Event-based Blurry Super Resolution Network (EBSR-Net), which leverages the high temporal resolution of events to mitigate motion blur and improve high-resolution image prediction. Specifically, we propo… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

  34. arXiv:2405.06822  [pdf, other

    cs.LG cs.AI

    MH-pFLID: Model Heterogeneous personalized Federated Learning via Injection and Distillation for Medical Data Analysis

    Authors: Luyuan Xie, Manqing Lin, Tianyu Luan, Cong Li, Yuejian Fang, Qingni Shen, Zhonghai Wu

    Abstract: Federated learning is widely used in medical applications for training global models without needing local data access. However, varying computational capabilities and network architectures (system heterogeneity), across clients pose significant challenges in effectively aggregating information from non-independently and identically distributed (non-IID) data. Current federated learning methods us… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: This paper is accepted by ICML 2024

  35. arXiv:2405.05803  [pdf, other

    cs.CV cs.AI

    Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference

    Authors: Zhihang Lin, Mingbao Lin, Luxi Lin, Rongrong Ji

    Abstract: Multimodal large language models (MLLMs) demand considerable computations for inference due to the extensive parameters and the additional input tokens needed for visual information representation. Herein, we introduce Visual Tokens Withdrawal (VTW), a plug-and-play module to boost MLLMs for rapid inference. Our approach is inspired by two intriguing phenomena we have observed: (1) the attention s… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  36. arXiv:2405.05363  [pdf, other

    cs.CV cs.RO

    LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

    Authors: Tianrui Guan, Yurou Yang, Harry Cheng, Muyuan Lin, Richard Kim, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha

    Abstract: In this paper, we present LOC-ZSON, a novel Language-driven Object-Centric image representation for object navigation task within complex scenes. We propose an object-centric image representation and corresponding losses for visual-language model (VLM) fine-tuning, which can handle complex object-level queries. In addition, we design a novel LLM-based augmentation and prompt templates for stabilit… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to ICRA 2024

  37. arXiv:2404.17230  [pdf, other

    cs.CV

    ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion

    Authors: Ziyue Zhang, Mingbao Lin, Rongrong Ji

    Abstract: We introduce ObjectAdd, a training-free diffusion modification method to add user-expected objects into user-specified area. The motive of ObjectAdd stems from: first, describing everything in one prompt can be difficult, and second, users often need to add objects into the generated image. To accommodate with real world, our ObjectAdd maintains accurate image consistency after adding objects with… ▽ More

    Submitted 2 May, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: 12 pages

  38. arXiv:2404.15141  [pdf, other

    cs.CV cs.AI

    CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method

    Authors: Mingbao Lin, Zhihang Lin, Wengyi Zhan, Liujuan Cao, Rongrong Ji

    Abstract: Transforming large pre-trained low-resolution diffusion models to cater to higher-resolution demands, i.e., diffusion extrapolation, significantly improves diffusion adaptability. We propose tuning-free CutDiffusion, aimed at simplifying and accelerating the diffusion extrapolation process, making it more affordable and improving performance. CutDiffusion abides by the existing patch-wise extrapol… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

  39. arXiv:2404.13972  [pdf, other

    cs.CV

    Non-Uniform Exposure Imaging via Neuromorphic Shutter Control

    Authors: Mingyuan Lin, Jian Liu, Chi Zhang, Zibo Zhao, Chu He, Lei Yu

    Abstract: By leveraging the blur-noise trade-off, imaging with non-uniform exposures largely extends the image acquisition flexibility in harsh environments. However, the limitation of conventional cameras in perceiving intra-frame dynamic information prevents existing methods from being implemented in the real-world frame acquisition for real-time adaptive camera shutter control. To address this challenge,… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  40. arXiv:2404.13445  [pdf, other

    cs.CV cs.GR

    DMesh: A Differentiable Mesh Representation

    Authors: Sanghyun Son, Matheus Gadelha, Yang Zhou, Zexiang Xu, Ming C. Lin, Yi Zhou

    Abstract: We present a differentiable representation, DMesh, for general 3D triangular meshes. DMesh considers both the geometry and connectivity information of a mesh. In our design, we first get a set of convex tetrahedra that compactly tessellates the domain based on Weighted Delaunay Triangulation (WDT), and select triangular faces on the tetrahedra to define the final mesh. We formulate probability of… ▽ More

    Submitted 1 June, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

    Comments: 35 pages, 22 figures. Updated with more analysis and experimental results

  41. arXiv:2404.09918  [pdf, other

    cs.CV

    EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting

    Authors: Min-Hui Lin, Mahesh Reddy, Guillaume Berger, Michel Sarkis, Fatih Porikli, Ning Bi

    Abstract: In this paper, we present EdgeRelight360, an approach for real-time video portrait relighting on mobile devices, utilizing text-conditioned generation of 360-degree high dynamic range image (HDRI) maps. Our method proposes a diffusion-based text-to-360-degree image generation in the HDR domain, taking advantage of the HDR10 standard. This technique facilitates the generation of high-quality, reali… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Camera-ready version (CVPR workshop - EDGE'24)

  42. arXiv:2404.09445  [pdf, other

    cs.LG cs.AI cs.CV

    Exploring Text-to-Motion Generation with Human Preference

    Authors: Jenny Sheng, Matthieu Lin, Andrew Zhao, Kevin Pruvost, Yu-Hui Wen, Yangguang Li, Gao Huang, Yong-** Liu

    Abstract: This paper presents an exploration of preference learning in text-to-motion generation. We find that current improvements in text-to-motion generation still rely on datasets requiring expert labelers with motion capture systems. Instead, learning from human preference data does not require motion capture systems; a labeler with no expertise simply compares two generated motions. This is particular… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024 HuMoGen Workshop

  43. arXiv:2404.09231  [pdf, other

    cs.CV

    Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms

    Authors: Diandian Guo, Manxi Lin, Jialun Pei, He Tang, Yueming **, Pheng-Ann Heng

    Abstract: A comprehensive understanding of surgical scenes allows for monitoring of the surgical process, reducing the occurrence of accidents and enhancing efficiency for medical professionals. Semantic modeling within operating rooms, as a scene graph generation (SGG) task, is challenging since it involves consecutive recognition of subtle surgical actions over prolonged periods. To address this challenge… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures, 3 tables

  44. arXiv:2404.08135  [pdf, other

    cs.CV

    SciFlow: Empowering Lightweight Optical Flow Models with Self-Cleaning Iterations

    Authors: Jamie Menjay Lin, Jisoo Jeong, Hong Cai, Risheek Garrepalli, Kai Wang, Fatih Porikli

    Abstract: Optical flow estimation is crucial to a variety of vision tasks. Despite substantial recent advancements, achieving real-time on-device optical flow estimation remains a complex challenge. First, an optical flow model must be sufficiently lightweight to meet computation and memory constraints to ensure real-time performance on devices. Second, the necessity for real-time on-device operation impose… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPRW 2024

  45. arXiv:2404.07847  [pdf, other

    cs.CV

    The Effectiveness of a Simplified Model Structure for Crowd Counting

    Authors: Lei Chen, Xinghang Gao, Fei Chao, Xiang Chang, Chih Min Lin, Xingen Gao, Shaopeng Lin, Hongyi Zhang, Juqiang Lin

    Abstract: In the field of crowd counting research, many recent deep learning based methods have demonstrated robust capabilities for accurately estimating crowd sizes. However, the enhancement in their performance often arises from an increase in the complexity of the model structure. This paper discusses how to construct high-performance crowd counting models using only simple structures. We proposes the F… ▽ More

    Submitted 18 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  46. arXiv:2404.07519  [pdf, other

    eess.IV cs.AI

    LATTE: Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer

    Authors: Jiing-** Wang, Ming-Guang Lin, An-Yeu, Wu

    Abstract: With the rise of Transformer models in NLP and CV domain, Multi-Head Attention has been proven to be a game-changer. However, its expensive computation poses challenges to the model throughput and efficiency, especially for the long sequence tasks. Exploiting the sparsity in attention has been proven to be an effective way to reduce computation. Nevertheless, prior works do not consider the variou… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  47. arXiv:2404.03608  [pdf, other

    cs.CL cs.AI

    Sailor: Open Language Models for South-East Asia

    Authors: Longxu Dou, Qian Liu, Guangtao Zeng, Jia Guo, Jiahui Zhou, Wei Lu, Min Lin

    Abstract: We present Sailor, a family of open language models ranging from 0.5B to 7B parameters, tailored for South-East Asian (SEA) languages. These models are continually pre-trained from Qwen1.5, a great language model for multilingual use cases. From Qwen1.5, Sailor models accept 200B to 400B tokens, primarily covering the languages of English, Chinese, Vietnamese, Thai, Indonesian, Malay, and Lao. The… ▽ More

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Code is available at https://github.com/sail-sg/sailor-llm

  48. arXiv:2404.02284  [pdf, other

    cs.RO

    APEX: Ambidextrous Dual-Arm Robotic Manipulation Using Collision-Free Generative Diffusion Models

    Authors: Apan Dastider, Hao Fang, Mingjie Lin

    Abstract: Dexterous manipulation, particularly adept coordinating and gras**, constitutes a fundamental and indispensable capability for robots, facilitating the emulation of human-like behaviors. Integrating this capability into robots empowers them to supplement and even supplant humans in undertaking increasingly intricate tasks in both daily life and industrial settings. Unfortunately, contemporary me… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Under Review in IEEE IROS 2024

  49. arXiv:2404.00032  [pdf, other

    cs.HC cs.CV eess.IV

    Deployment of Deep Learning Model in Real World Clinical Setting: A Case Study in Obstetric Ultrasound

    Authors: Chun Kit Wong, Mary Ngo, Manxi Lin, Zahra Bashir, Amihai Heen, Morten Bo Søndergaard Svendsen, Martin Grønnebæk Tolsgaard, Anders Nymark Christensen, Aasa Feragen

    Abstract: Despite the rapid development of AI models in medical image analysis, their validation in real-world clinical settings remains limited. To address this, we introduce a generic framework designed for deploying image-based AI models in such settings. Using this framework, we deployed a trained model for fetal ultrasound standard plane detection, and evaluated it in real-time sessions with both novic… ▽ More

    Submitted 22 March, 2024; originally announced April 2024.

    Comments: 10 pages

  50. arXiv:2403.18092  [pdf, other

    cs.CV

    OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation

    Authors: Jisoo Jeong, Hong Cai, Risheek Garrepalli, Jamie Menjay Lin, Munawar Hayat, Fatih Porikli

    Abstract: The scarcity of ground-truth labels poses one major challenge in develo** optical flow estimation models that are both generalizable and robust. While current methods rely on data augmentation, they have yet to fully exploit the rich information available in labeled video sequences. We propose OCAI, a method that supports robust frame interpolation by generating intermediate video frames alongsi… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR 2024