Showing 1–50 of 2,166 results for author: Huang, Y

Searching in archive cs. Search in all archives.
  1. arXiv:2407.01262  [pdf, other]

    cs.LG

    Complementary Fusion of Deep Network and Tree Model for ETA Prediction

    Authors: YuRui Huang, Jie Zhang, HengDa Bao, Yang Yang, Jian Yang

    Abstract: Estimated time of arrival (ETA) is a very important factor in the transportation system. It has attracted increasing attention and has been widely used as a basic service in navigation systems and intelligent transportation systems. In this paper, we propose a novel solution to the ETA estimation problem, which is an ensemble of tree models and neural networks. We proved the accuracy and robustne…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.00665  [pdf, other]

    cs.CV

    Resolving Variable Respiratory Motion From Unsorted 4D Computed Tomography

    Authors: Yuliang Huang, Bjoern Eiben, Kris Thielemans, Jamie R. McClelland

    Abstract: 4D Computed Tomography (4DCT) is widely used for many clinical applications such as radiotherapy treatment planning, PET and ventilation imaging. However, common 4DCT methods reconstruct multiple breath cycles into a single, arbitrary breath cycle which can lead to various artefacts, impacting the downstream clinical applications. Surrogate driven motion models can estimate continuous variable mot…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  3. Generative Iris Prior Embedded Transformer for Iris Restoration

    Authors: Yubo Huang, Jia Wang, Peipei Li, Liuyu Xiang, Peigang Li, Zhaofeng He

    Abstract: Iris restoration from complexly degraded iris images, aiming to improve iris recognition performance, is a challenging problem. Due to the complex degradation, directly training a convolutional neural network (CNN) without a prior cannot yield satisfactory results. In this work, we propose a generative iris prior embedded Transformer model (Gformer), in which we build a hierarchical encoder-decoder…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Our code is available at https://github.com/sawyercharlton/Gformer

    Journal ref: 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 2023, pp. 510-515

  4. arXiv:2406.20044  [pdf, other]

    cs.AI stat.CO stat.ML

    Electrostatics-based particle sampling and approximate inference

    Authors: Yongchao Huang

    Abstract: A new particle-based sampling and approximate inference method, based on electrostatics and Newton mechanics principles, is introduced with theoretical ground, algorithm design and experimental validation. This method simulates an interacting particle system (IPS) where particles, i.e. the freely-moving negative charges and spatially-fixed positive charges with magnitudes proportional to the targe…

    Submitted 28 June, 2024; originally announced June 2024.

  5. arXiv:2406.19705  [pdf, other]

    cs.AI

    DISCO: Efficient Diffusion Solver for Large-Scale Combinatorial Optimization Problems

    Authors: Kexiong Yu, Hang Zhao, Yuhang Huang, Renjiao Yi, Kai Xu, Chenyang Zhu

    Abstract: Combinatorial Optimization (CO) problems are fundamentally crucial in numerous practical applications across diverse industries, characterized by entailing enormous solution space and demanding time-sensitive response. Despite significant advancements made by recent neural solvers, their limited expressiveness does not conform well to the multi-modal nature of CO landscapes. While some research ha…

    Submitted 28 June, 2024; originally announced June 2024.

  6. arXiv:2406.18977  [pdf, other]

    cs.RO cs.CL cs.CV

    RoboUniView: Visual-Language Model with Unified View Representation for Robotic Manipulation

    Authors: Fanfan Liu, Feng Yan, Liming Zheng, Chengjian Feng, Yiyang Huang, Lin Ma

    Abstract: Utilizing Vision-Language Models (VLMs) for robotic manipulation represents a novel paradigm, aiming to enhance the model's ability to generalize to new objects and instructions. However, due to variations in camera specifications and mounting positions, existing methods exhibit significant performance disparities across different robotic platforms. To address this challenge, we propose RoboUniVie…

    Submitted 27 June, 2024; originally announced June 2024.

  7. arXiv:2406.18966  [pdf, other]

    cs.CL

    UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

    Authors: Siyuan Wu, Yue Huang, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Xiangliang Zhang, Jianfeng Gao, Chaowei Xiao, Lichao Sun

    Abstract: Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges remain in the areas of generalization, controllability, diversity, and truthfulness within the existing generative frameworks. To address these challenges, this pap…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  8. arXiv:2406.18837  [pdf, other]

    cs.CV cs.RO

    Dense Monocular Motion Segmentation Using Optical Flow and Pseudo Depth Map: A Zero-Shot Approach

    Authors: Yuxiang Huang, Yuhao Chen, John Zelek

    Abstract: Motion segmentation from a single moving camera presents a significant challenge in the field of computer vision. This challenge is compounded by the unknown camera movements and the lack of depth information of the scene. While deep learning has shown impressive capabilities in addressing these issues, supervised models require extensive training on massive annotated datasets, and unsupervised mo…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: For the official publication, see https://crv.pubpub.org/pub/iunjzl55

    Journal ref: Proceedings of the 21st Conference on Robots and Vision (2024)

  9. arXiv:2406.18664  [pdf, other]

    cs.CL cs.LG

    Evaluating Copyright Takedown Methods for Language Models

    Authors: Boyi Wei, Weijia Shi, Yangsibo Huang, Noah A. Smith, Chiyuan Zhang, Luke Zettlemoyer, Kai Li, Peter Henderson

    Abstract: Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material. These models can memorize and generate content similar to their training data, posing potential concerns. Therefore, model creators are motivated to develop mitigation methods that prevent generating protected content. We term this procedure as copyright takedowns fo…

    Submitted 1 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: 31 pages, 9 figures, 14 tables

  10. Artificial Immune System of Secure Face Recognition Against Adversarial Attacks

    Authors: Min Ren, Yunlong Wang, Yuhao Zhu, Yongzhen Huang, Zhenan Sun, Qi Li, Tieniu Tan

    Abstract: Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be by targeted improvement of traits of interest through selective breeding, an approach which has so far been underexplored…

    Submitted 26 June, 2024; originally announced June 2024.

    Journal ref: International Journal of Computer Vision (IJCV), 2024

  11. arXiv:2406.18070  [pdf, other]

    cs.CV

    EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation

    Authors: Baoqi Pei, Guo Chen, Jilan Xu, Yuping He, Yicheng Liu, Kanghua Pan, Yifei Huang, Yali Wang, Tong Lu, Limin Wang, Yu Qiao

    Abstract: In this report, we present our solutions to the EgoVis Challenges in CVPR 2024, including five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge. Building upon the video-language two-tower model and leveraging our meticulously organized egocentric video data, we introduce a novel foundation model called EgoVideo. This model is specifically designed to cater to the uniqu…

    Submitted 30 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Champion solutions in the EgoVis CVPR 2024 workshop

  12. arXiv:2406.17797  [pdf, other]

    physics.chem-ph cs.AI cs.LG

    MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis

    Authors: Shikun Feng, Jiaxin Zheng, Yinjun Jia, Yanwen Huang, Fengfeng Zhou, Wei-Ying Ma, Yanyan Lan

    Abstract: Molecular representation learning is pivotal for various molecular property prediction tasks related to drug discovery. Robust and accurate benchmarks are essential for refining and validating current methods. Existing molecular property benchmarks derived from wet experiments, however, face limitations such as data volume constraints, unbalanced label distribution, and noisy labels. To address th…

    Submitted 12 June, 2024; originally announced June 2024.

  13. arXiv:2406.17675  [pdf, other]

    cs.CL

    Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models

    Authors: Yuan Li, Yue Huang, Hongyi Wang, Xiangliang Zhang, James Zou, Lichao Sun

    Abstract: Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities, increasingly adopting roles akin to human-like assistants. The broader integration of LLMs into society has sparked interest in whether they manifest psychological attributes, and whether these attributes are stable, inquiries that could deepen the understanding of their behaviors. Inspired by psychometrics, this…

    Submitted 25 June, 2024; originally announced June 2024.

  14. arXiv:2406.16767  [pdf, other]

    cs.CL

    The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories

    Authors: Xi Yu Huang, Krishnapriya Vishnubhotla, Frank Rudzicz

    Abstract: The improved generative capabilities of large language models have made them a powerful tool for creative writing and storytelling. It is therefore important to quantitatively understand the nature of generated stories, and how they differ from human storytelling. We augment the Reddit WritingPrompts dataset with short stories generated by GPT-3.5, given the same prompts. We quantify and compare t…

    Submitted 24 June, 2024; originally announced June 2024.

  15. arXiv:2406.16135  [pdf, other]

    cs.CL cs.LG

    Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models

    Authors: Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chulin Xie, Chiyuan Zhang

    Abstract: Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, effectively being crosslingual? This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks. We observe that while these models show promising surface-level crosslingual abilities on machine translation…

    Submitted 23 June, 2024; originally announced June 2024.

  16. arXiv:2406.15490  [pdf, other]

    cs.CL cs.AI cs.LG

    Causal Discovery Inspired Unsupervised Domain Adaptation for Emotion-Cause Pair Extraction

    Authors: Yuncheng Hua, Yu** Huang, Shuo Huang, Tao Feng, Lizhen Qu, Chris Bain, Richard Bassed, Gholamreza Haffari

    Abstract: This paper tackles the task of emotion-cause pair extraction in the unsupervised domain adaptation setting. The problem is challenging because the distributions of the events causing emotions in target domains are dramatically different from those in source domains, even though the distributions of emotional expressions overlap between domains. Inspired by causal discovery, we propose a novel deep l…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 12 pages, 6 figures, 4 tables; Under Review in EMNLP 2024

    ACM Class: I.2.4

  17. arXiv:2406.15132  [pdf, other]

    cs.LG cs.AI

    Younger: The First Dataset for Artificial Intelligence-Generated Neural Network Architecture

    Authors: Zhengxin Yang, Wanling Gao, Luzhou Peng, Yunyou Huang, Fei Tang, Jianfeng Zhan

    Abstract: Designing and optimizing neural network architectures typically requires extensive expertise, starting with handcrafted designs and then manual or automated refinement. This dependency presents a significant barrier to rapid innovation. Recognizing the complexity of automatically generating neural network architecture from scratch, we introduce Younger, a pioneering dataset to advance this ambitio…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 31 pages, 29 figures, 11 tables

  18. Towards Timely Video Analytics Services at the Network Edge

    Authors: Xishuo Li, Shan Zhang, Yuejiao Huang, Xiao Ma, Zhiyuan Wang, Hongbin Luo

    Abstract: Real-time video analytics services aim to provide users with timely and accurate recognition results. However, existing studies usually fall into the dilemma between reducing delay and improving accuracy. The edge computing scenario imposes strict transmission and computation resource constraints, making balancing these conflicting metrics under dynamic network conditions difficult. In this regard, we…

    Submitted 20 June, 2024; originally announced June 2024.

  19. arXiv:2406.14721  [pdf, other]

    cs.CL

    1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators?

    Authors: Yue Huang, Chenrui Fan, Yuan Li, Siyuan Wu, Tianyi Zhou, Xiangliang Zhang, Lichao Sun

    Abstract: Large Language Models (LLMs) have garnered significant attention due to their remarkable ability to process information across various languages. Despite their capabilities, they exhibit inconsistencies in handling identical queries in different languages, presenting challenges for further advancement. This paper introduces a method to enhance the multilingual performance of LLMs by aggregating kn…

    Submitted 20 June, 2024; originally announced June 2024.

  20. arXiv:2406.14598  [pdf, other]

    cs.AI

    SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

    Authors: Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal

    Abstract: Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts, however, face three limitations that we address with SORRY-Bench, our proposed benchmark. First, existing methods often use coarse-grained taxonomies of unsafe topics, and are over-representing some fine-grained topics…

    Submitted 20 June, 2024; originally announced June 2024.

  21. arXiv:2406.14526  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Fantastic Copyrighted Beasts and How (Not) to Generate Them

    Authors: Luxi He, Yangsibo Huang, Weijia Shi, Tinghao Xie, Haotian Liu, Yue Wang, Luke Zettlemoyer, Chiyuan Zhang, Danqi Chen, Peter Henderson

    Abstract: Recent studies show that image and video generation models can be prompted to reproduce copyrighted content from their training data, raising serious legal concerns around copyright infringement. Copyrighted characters, in particular, pose a difficult challenge for image generation services, with at least one lawsuit already awarding damages based on the generation of these characters. Yet, little…

    Submitted 20 June, 2024; originally announced June 2024.

  22. arXiv:2406.14408  [pdf, other]

    cs.AI cs.CL cs.LG

    FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

    Authors: Xiaohan Lin, Qingxing Cao, Yinya Huang, Haiming Wang, Jianqiao Lu, Zhengying Liu, Linqi Song, Xiaodan Liang

    Abstract: Formal verification (FV) has witnessed growing significance with current emerging program synthesis by the evolving large language models (LLMs). However, current formal verification mainly resorts to symbolic verifiers or hand-crafted rules, resulting in limitations for extensive and flexible verification. On the other hand, formal languages for automated theorem proving, such as Isabelle, as anoth…

    Submitted 20 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  23. arXiv:2406.14359  [pdf, other]

    cs.NE

    Learning to Transfer for Evolutionary Multitasking

    Authors: Sheng-Hao Wu, Yuxiao Huang, Xingyu Wu, Liang Feng, Zhi-Hui Zhan, Kay Chen Tan

    Abstract: Evolutionary multitasking (EMT) is an emerging approach for solving multitask optimization problems (MTOPs) and has garnered considerable research interest. The implicit EMT is a significant research branch that utilizes evolution operators to enable knowledge transfer (KT) between tasks. However, current approaches in implicit EMT face challenges in adaptability, due to the use of a limited numbe…

    Submitted 22 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Under review

  24. arXiv:2406.14322  [pdf, other]

    cs.CL cs.CR cs.LG

    Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

    Authors: Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

    Abstract: Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit, current evaluations on LLMs most…

    Submitted 20 June, 2024; originally announced June 2024.

  25. arXiv:2406.14098  [pdf, ps, other]

    cs.CV

    HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models

    Authors: Xinrui Zhou, Yuhao Huang, Wufeng Xue, Haoran Dou, Jun Cheng, Han Zhou, Dong Ni

    Abstract: Echocardiography (ECHO) video is widely used for cardiac examination. In clinical practice, this procedure heavily relies on operator experience, which needs years of training and possibly the assistance of deep learning-based systems for enhanced accuracy and efficiency. However, it is challenging since acquiring sufficient customized data (e.g., abnormal cases) for novice training and deep model development…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024

  26. arXiv:2406.13870  [pdf, other]

    cs.CV

    Splatter a Video: Video Gaussian Representation for Versatile Processing

    Authors: Yang-Tian Sun, Yi-Hua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi

    Abstract: Video representation is a long-standing problem that is crucial for various downstream tasks, such as tracking, depth prediction, segmentation, view synthesis, and editing. However, current methods either struggle to model complex motions due to the absence of 3D structure or rely on implicit 3D representations that are ill-suited for manipulation tasks. To address these challenges, we introduce a no…

    Submitted 26 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  27. arXiv:2406.13662  [pdf, other]

    cs.CL

    ObscurePrompt: Jailbreaking Large Language Models via Obscure Input

    Authors: Yue Huang, Jingyu Tang, Dongping Chen, Bingda Tang, Yao Wan, Lichao Sun, Xiangliang Zhang

    Abstract: Recently, Large Language Models (LLMs) have garnered significant attention for their exceptional natural language processing capabilities. However, concerns about their trustworthiness remain unresolved, particularly in addressing "jailbreaking" attacks on aligned LLMs. Previous research predominantly relies on scenarios with white-box LLMs or specific and fixed prompt templates, which are often i…

    Submitted 19 June, 2024; originally announced June 2024.

  28. arXiv:2406.13619  [pdf, other]

    stat.ML cs.LG

    Generative Modeling by Minimizing the Wasserstein-2 Loss

    Authors: Yu-Jui Huang, Zachariah Malik

    Abstract: This paper approaches the unsupervised learning problem by minimizing the second-order Wasserstein loss (the $W_2$ loss). The minimization is characterized by a distribution-dependent ordinary differential equation (ODE), whose dynamics involves the Kantorovich potential between a current estimated distribution and the true data distribution. A main result shows that the time-marginal law of the O…

    Submitted 19 June, 2024; originally announced June 2024.

    MSC Class: 34A06; 49Q22; 68T01

  29. arXiv:2406.13317  [pdf, other]

    cs.CV

    M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere

    Authors: Mengqiu Xu, Ming Wu, Kaixin Chen, Yixiang Huang, Mingrui Xu, Yujia Yang, Yiqing Feng, Yiying Guo, Bin Huang, Dongliang Chang, Zhenwei Shi, Chuang Zhang, Zhanyu Ma, Jun Guo

    Abstract: Marine fog poses a significant hazard to global shipping, necessitating effective detection and forecasting to reduce economic losses. In recent years, several machine learning (ML) methods have demonstrated superior detection accuracy compared to traditional meteorological methods. However, most of these works are developed on proprietary datasets, and the few publicly accessible datasets are oft…

    Submitted 19 June, 2024; originally announced June 2024.

  30. arXiv:2406.13007  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Night Photography Rendering

    Authors: Egor Ershov, Artyom Panshin, Oleg Karasev, Sergey Korchagin, Shepelev Lev, Alexandr Startsev, Daniil Vladimirov, Ekaterina Zaychenkova, Nikola Banić, Dmitrii Iarchuk, Maria Efimova, Radu Timofte, Arseniy Terekhin, Shuwei Yue, Yuyang Liu, Minchen Wei, Lu Xu, Chao Zhang, Yasi Wang, Furkan Kınlı, Doğa Yılmaz, Barış Özcan, Furkan Kıraç, Shuai Liu, Jingyuan Xiao, et al. (25 additional authors not shown)

    Abstract: This paper presents a review of the NTIRE 2024 challenge on night photography rendering. The goal of the challenge was to find solutions that process raw camera images taken in nighttime conditions, and thereby produce photo-quality output images in the standard RGB (sRGB) space. Unlike the previous year's competition, the challenge images were collected with a mobile phone and the speed of algo…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 10 figures

  31. arXiv:2406.12193  [pdf, other]

    cs.LG

    Adaptive Collaborative Correlation Learning-based Semi-Supervised Multi-Label Feature Selection

    Authors: Yanyong Huang, Li Yang, Dongjie Wang, Ke Li, Xiuwen Yi, Fengmao Lv, Tianrui Li

    Abstract: Semi-supervised multi-label feature selection has recently been developed to solve the curse of dimensionality problem in high-dimensional multi-label data with certain samples missing labels. Although many efforts have been made, most existing methods use a predefined graph approach to capture the sample similarity or the label correlation. In this manner, the presence of noise and outliers withi…

    Submitted 25 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  32. arXiv:2406.11839  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    mDPO: Conditional Preference Optimization for Multimodal Large Language Models

    Authors: Fei Wang, Wenxuan Zhou, James Y. Huang, Nan Xu, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: Direct preference optimization (DPO) has been shown to be an effective method for large language model (LLM) alignment. Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement. Through a comparative experiment, we identify the unconditional preference problem in multimodal preference optimization, where the model overlooks the ima…

    Submitted 17 June, 2024; originally announced June 2024.

  33. arXiv:2406.11038  [pdf, ps, other]

    cs.IT

    Physical-Layer Security for 6G: Safe Jamming against Malicious Sensing

    Authors: Pu Xie, Yang Huang

    Abstract: The integration of sensing, communications, array signal processing, etc. into 6G mobile networks has ushered in an era of heightened situational awareness. However, this progress brings forth significant concerns regarding privacy and security, particularly due to the proliferation of devices equipped with radar-like sensing capability, including malicious ones. In response, this paper proposes a…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: accepted for presentation at 2024 IEEE/CIC International Conference on Communications in China (ICCC)

  34. arXiv:2406.10903  [pdf, other]

    cs.LG cs.CL cs.SE

    New Solutions on LLM Acceleration, Optimization, and Application

    Authors: Yingbing Huang, Lily Jiaxin Wan, Hanchen Ye, Manvi Jha, Jinghua Wang, Yuhong Li, Xiaofan Zhang, Deming Chen

    Abstract: Large Language Models (LLMs) have become extremely potent instruments with exceptional capacities for comprehending and producing human-like text in a wide range of applications. However, the increasing size and complexity of LLMs present significant challenges in both training and deployment, leading to substantial computational and storage costs as well as heightened energy consumption. In this…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: This is an expanded and more comprehensive study based on our invited DAC-24 paper with the same title and co-authors

  35. arXiv:2406.10840  [pdf, other]

    cs.LG cs.AI q-bio.BM

    CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph

    Authors: Haitao Lin, Guojiang Zhao, Odin Zhang, Yufei Huang, Lirong Wu, Zicheng Liu, Siyuan Li, Cheng Tan, Zhifeng Gao, Stan Z. Li

    Abstract: Structure-based drug design (SBDD) aims to generate potential drugs that can bind to a target protein and is greatly expedited by the aid of AI techniques in generative models. However, a lack of systematic understanding persists due to the diverse settings, complex implementation, difficult reproducibility, and task singularity. Firstly, the absence of standardization can lead to unfair compariso…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 9 pages main context

  36. arXiv:2406.10819  [pdf, other]

    cs.CV cs.AI cs.CL

    GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents

    Authors: Dongping Chen, Yue Huang, Siyuan Wu, Jingyu Tang, Liuyi Chen, Yilin Bai, Zhigang He, Chenlong Wang, Huichi Zhou, Yiqiang Li, Tianshuo Zhou, Yue Yu, Chujie Gao, Qihui Zhang, Yi Gui, Zhen Li, Yao Wan, Pan Zhou, Jianfeng Gao, Lichao Sun

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have been used as agents to control keyboard and mouse inputs by directly perceiving the Graphical User Interface (GUI) and generating corresponding code. However, current agents primarily exhibit excellent understanding capabilities in static environments and are predominantly applied in relatively simple domains, such as Web or mobile interfaces…

    Submitted 16 June, 2024; originally announced June 2024.

  37. arXiv:2406.10801  [pdf, other]

    cs.CV

    Saliency-guided and Patch-based Mixup for Long-tailed Skin Cancer Image Classification

    Authors: Tianyunxi Wei, Yijin Huang, Li Lin, Pujin Cheng, Sirui Li, Xiaoying Tang

    Abstract: Medical image datasets often exhibit long-tailed distributions due to the inherent challenges in medical data collection and annotation. In long-tailed contexts, some common disease categories account for most of the data, while only a few samples are available in the rare disease categories, resulting in poor performance of deep learning methods. To address this issue, previous approaches have em…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: IEEE ISBI2024

  38. arXiv:2406.10407  [pdf, other]

    math.OC cs.LG math.NA

    Suboptimality bounds for trace-bounded SDPs enable a faster and scalable low-rank SDP solver SDPLR+

    Authors: Yufan Huang, David F. Gleich

    Abstract: Semidefinite programs (SDPs) and their solvers are powerful tools with many applications in machine learning and data science. Designing scalable SDP solvers is challenging because by standard the positive semidefinite decision variable is an $n \times n$ dense matrix, even though the input is often an $n \times n$ sparse matrix. However, the information in the solution may not correspond to a ful…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 17 pages

  39. arXiv:2406.10280  [pdf, other]

    cs.CR cs.CL cs.LG

    Transferable Embedding Inversion Attack: Uncovering Privacy Risks in Text Embeddings without Model Queries

    Authors: Yu-Hsiang Huang, Yuche Tsai, Hsiang Hsiao, Hong-Yi Lin, Shou-De Lin

    Abstract: This study investigates the privacy risks associated with text embeddings, focusing on the scenario where attackers cannot access the original embedding model. Contrary to previous research requiring direct model access, we explore a more realistic threat model by developing a transfer attack method. This approach uses a surrogate model to mimic the victim model's behavior, allowing the attacker t…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 Main Conference

  40. arXiv:2406.10000  [pdf, other]

    cs.CV

    OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

    Authors: Yuzhong Huang, Zhong Li, Zhang Chen, Zhiyuan Ren, Guosheng Lin, Fred Morstatter, Yi Xu

    Abstract: In the evolving landscape of text-to-3D technology, Dreamfusion has showcased its proficiency by utilizing Score Distillation Sampling (SDS) to optimize implicit representations such as NeRF. This process is achieved through the distillation of pretrained large-scale text-to-image diffusion models. However, Dreamfusion encounters fidelity and efficiency constraints: it faces the multi-head Janus i…

    Submitted 14 June, 2024; originally announced June 2024.

  41. arXiv:2406.09913  [pdf, other]

    cs.CV

    OpenECAD: An Efficient Visual Language Model for Computer-Aided Design

    Authors: Zhe Yuan, Jianqi Shi, Yanhong Huang

    Abstract: Computer-aided design (CAD) tools are utilized in the manufacturing industry for modeling everything from cups to spacecraft. These programs are complex to use and typically require years of training and experience to master. Structured and well-constrained 2D sketches and 3D constructions are crucial components of CAD modeling. A well-executed CAD model can be seamlessly integrated into the manuf…

    Submitted 22 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

  42. arXiv:2406.09411  [pdf, other]

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a…

    Submitted 13 June, 2024; originally announced June 2024.

  43. arXiv:2406.09317  [pdf, other]

    eess.IV cs.CV

    Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases

    Authors: Meng Wang, Tian Lin, Aidi Lin, Kai Yu, Yuanyuan Peng, Lianyu Wang, Cheng Chen, Ke Zou, Huiyu Liang, Man Chen, Xue Yao, Meiqin Zhang, Binwei Huang, Chaoxin Zheng, Peixin Zhang, Wei Chen, Yilong Luo, Yifan Chen, Honghe Xia, Tingkun Shi, Qi Zhang, **ming Guo, Xiaolin Chen, **gcheng Wang, Yih Chung Tham, et al. (24 additional authors not shown)

    Abstract: Previous foundation models for retinal images were pre-trained with limited disease categories and knowledge base. Here we introduce RetiZero, a vision-language foundation model that leverages knowledge from over 400 fundus diseases. For RetiZero's pre-training, we compiled 341,896 fundus images paired with text descriptions, sourced from public datasets, ophthalmic literature, and online resources…

    Submitted 30 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

  44. arXiv:2406.08987  [pdf, other]

    cs.NE

    Towards Next Era of Multi-objective Optimization: Large Language Models as Architects of Evolutionary Operators

    Authors: Yuxiao Huang, Shenghao Wu, Wenjie Zhang, Jibin Wu, Liang Feng, Kay Chen Tan

    Abstract: Multi-objective optimization problems (MOPs) are prevalent in various real-world applications, necessitating sophisticated solutions that balance conflicting objectives. Traditional evolutionary algorithms (EAs), while effective, often rely on domain-specific expert knowledge and iterative tuning, which can impede innovation when encountering novel MOPs. Very recently, the emergence of Large Langu…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 14 pages, 5 figures, 5 tables

  45. arXiv:2406.08980  [pdf, other]

    q-bio.BM cs.LG

    From Theory to Therapy: Reframing SBDD Model Evaluation via Practical Metrics

    Authors: Bowen Gao, Haichuan Tan, Yanwen Huang, Minsi Ren, Xiao Huang, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

    Abstract: Recent advancements in structure-based drug design (SBDD) have significantly enhanced the efficiency and precision of drug discovery by generating molecules tailored to bind specific protein pockets. Despite these technological strides, their practical application in real-world drug development remains challenging due to the complexities of synthesizing and testing these molecules. The reliability…

    Submitted 13 June, 2024; originally announced June 2024.

  46. arXiv:2406.08961  [pdf, other]

    q-bio.BM cs.LG

    SIU: A Million-Scale Structural Small Molecule-Protein Interaction Dataset for Unbiased Bioactivity Prediction

    Authors: Yanwen Huang, Bowen Gao, Yinjun Jia, Hongbo Ma, Wei-Ying Ma, Ya-Qin Zhang, Yanyan Lan

    Abstract: Small molecules play a pivotal role in modern medicine, and scrutinizing their interactions with protein targets is essential for the discovery and development of novel, life-saving therapeutics. The term "bioactivity" encompasses various biological effects resulting from these interactions, including both binding and functional responses. The magnitude of bioactivity dictates the therapeutic or t…

    Submitted 13 June, 2024; originally announced June 2024.

  47. arXiv:2406.08731  [pdf, other]

    cs.SE

    Where Do Large Language Models Fail When Generating Code?

    Authors: Zhijie Wang, Zijie Zhou, Da Song, Yuheng Huang, Shengmai Chen, Lei Ma, Tianyi Zhang

    Abstract: Large Language Models (LLMs) have shown great potential in code generation. However, current LLMs still cannot reliably generate correct code. Moreover, it is unclear what kinds of code generation errors LLMs can make. To address this, we conducted an empirical study to analyze incorrect code snippets generated by six popular LLMs on the HumanEval dataset. We analyzed these errors alongside two di…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Extended from our MAPS 2023 paper. Our data is available at https://llm-code-errors.cs.purdue.edu

  48. arXiv:2406.08160  [pdf, other]

    cs.RO

    Chemistry3D: Robotic Interaction Benchmark for Chemistry Experiments

    Authors: Shoujie Li, Yan Huang, Changqing Guo, Tong Wu, Jiawei Zhang, Linrui Zhang, Wenbo Ding

    Abstract: The advent of simulation engines has revolutionized learning and operational efficiency for robots, offering cost-effective and swift pipelines. However, the lack of a universal simulation platform tailored for chemical scenarios impedes progress in robotic manipulation and visualization of reaction processes. Addressing this void, we present Chemistry3D, an innovative toolkit that integrates exte…

    Submitted 12 June, 2024; originally announced June 2024.

  49. arXiv:2406.08124  [pdf, other]

    cs.CL cs.AI

    Legend: Leveraging Representation Engineering to Annotate Safety Margin for Preference Datasets

    Authors: Duanyu Feng, Bowen Qin, Chen Huang, Youcheng Huang, Zheng Zhang, Wenqiang Lei

    Abstract: The success of the reward model in distinguishing between responses with subtle safety differences depends critically on the high-quality preference dataset, which should capture the fine-grained nuances of harmful and harmless responses. This motivates the need to develop a dataset involving preference margins, which accurately quantify how harmless one response is compared to another. In this pa…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Our code is available at https://github.com/colfeng/Legend

  50. arXiv:2406.08002  [pdf, other]

    cs.AI cs.MA

    Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning

    Authors: Yizhe Huang, Anji Liu, Fanqi Kong, Yaodong Yang, Song-Chun Zhu, Xue Feng

    Abstract: Despite the recent successes of multi-agent reinforcement learning (MARL) algorithms, efficiently adapting to co-players in mixed-motive environments remains a significant challenge. One feasible approach is to hierarchically model co-players' behavior based on inferring their characteristics. However, these methods often encounter difficulties in efficient reasoning and utilization of inferred in…

    Submitted 12 June, 2024; originally announced June 2024.