Skip to main content

Showing 1–50 of 381 results for author: Gao, W

Searching in archive cs. Search in all archives.
.
  1. arXiv:2407.00905  [pdf, other

    cs.CV

    Learning Robust 3D Representation from CLIP via Dual Denoising

    Authors: Shuqing Luo, Bowen Qu, Wei Gao

    Abstract: In this paper, we explore a critical yet under-investigated issue: how to learn robust and well-generalized 3D representation from pre-trained vision language models such as CLIP. Previous works have demonstrated that cross-modal distillation can provide rich and useful knowledge for 3D data. However, like most deep learning models, the resultant 3D learning network is still vulnerable to adversar… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

  2. arXiv:2406.16976  [pdf, other

    cs.NE cs.AI cs.LG physics.chem-ph

    Efficient Evolutionary Search Over Chemical Space with Large Language Models

    Authors: Haorui Wang, Marta Skreta, Cher-Tian Ser, Wenhao Gao, Lingkai Kong, Felix Streith-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang

    Abstract: Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  3. arXiv:2406.15132  [pdf, other

    cs.LG cs.AI

    Younger: The First Dataset for Artificial Intelligence-Generated Neural Network Architecture

    Authors: Zhengxin Yang, Wanling Gao, Luzhou Peng, Yunyou Huang, Fei Tang, Jianfeng Zhan

    Abstract: Designing and optimizing neural network architectures typically requires extensive expertise, starting with handcrafted designs and then manual or automated refinement. This dependency presents a significant barrier to rapid innovation. Recognizing the complexity of automatically generating neural network architecture from scratch, we introduce Younger, a pioneering dataset to advance this ambitio… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 31 pages, 29 figures, 11 tables

  4. arXiv:2406.14194  [pdf, other

    cs.CV cs.AI

    VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model

    Authors: Jie Zhang, Sibo Wang, Xiangkui Cao, Zheng Yuan, Shiguang Shan, Xilin Chen, Wen Gao

    Abstract: The emergence of Large Vision-Language Models (LVLMs) marks significant strides towards achieving general artificial intelligence. However, these advancements are tempered by the outputs that often reflect biases, a concern not yet extensively investigated. Existing benchmarks are not sufficiently comprehensive in evaluating biases due to their limited data scale, single questioning format and nar… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.11931  [pdf, other

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen , et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  6. arXiv:2406.09136  [pdf, other

    cs.CL cs.LG

    Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs

    Authors: Xuan Zhang, Chao Du, Tianyu Pang, Qian Liu, Wei Gao, Min Lin

    Abstract: The recent development of chain-of-thought (CoT) decoding has enabled large language models (LLMs) to generate explicit logical reasoning paths for complex problem-solving. However, research indicates that these paths are not always deliberate and optimal. The tree-of-thought (ToT) method employs tree-searching to extensively explore the reasoning space and find better reasoning paths that CoT dec… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

  7. arXiv:2406.07362  [pdf, other

    cs.HC

    AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database

    Authors: Wanling Gao, Yuan Liu, Zhuoming Yu, Dandan Cui, Wen**g Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Fan Huang, Gangyuan Zhao, Chongrong Jiang, Tianyi Wei, Zhifei Zhang, Yunyou Huang, Jianfeng Zhan

    Abstract: Artificial Intelligence (AI) plays a crucial role in medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain undiscovered and thus hinder AI f… ▽ More

    Submitted 15 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

    Comments: 12 pages

  8. arXiv:2406.06562  [pdf, other

    cs.CL cs.AI

    Achieving Sparse Activation in Small Language Models

    Authors: Jifeng Song, Kai Huang, Xiangyu Yin, Boyuan Yang, Wei Gao

    Abstract: Sparse activation, which selectively activates only an input-dependent set of neurons in inference, is a useful technique to reduce the computing cost of Large Language Models (LLMs) without retraining or adaptation efforts. However, whether it can be applied to the recently emerging Small Language Models (SLMs) remains questionable, because SLMs are generally less over-parameterized than LLMs. In… ▽ More

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: 15 pages

  9. arXiv:2406.04628  [pdf, other

    cs.CE q-bio.QM

    Projecting Molecules into Synthesizable Chemical Spaces

    Authors: Shitong Luo, Wenhao Gao, Zuofan Wu, Jian Peng, Connor W. Coley, Jianzhu Ma

    Abstract: Discovering new drug molecules is a pivotal yet challenging process due to the near-infinitely large chemical space and notorious demands on time and resources. Numerous generative models have recently been introduced to accelerate the drug discovery process, but their progression to experimental validation remains limited, largely due to a lack of consideration for synthetic accessibility in prac… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  10. arXiv:2406.02143  [pdf, other

    cs.CL

    Reinforcement Tuning for Detecting Stances and Debunking Rumors Jointly with Large Language Models

    Authors: Ruichao Yang, Wei Gao, **g Ma, Hongzhan Lin, Bo Wang

    Abstract: Learning multi-task models for jointly detecting stance and verifying rumors poses challenges due to the need for training data of stance at post level and rumor veracity at claim level, which are difficult to obtain. To address this issue, we leverage large language models (LLMs) as the foundation annotators for the joint stance detection (SD) and rumor verification (RV) tasks, dubbed as JSDRV. W… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL 2024 (Findings)

  11. arXiv:2405.21074  [pdf, other

    cs.CV

    Latent Intrinsics Emerge from Training to Relight

    Authors: Xiao Zhang, William Gao, Seemandhar Jain, Michael Maire, David. A. Forsyth, Anand Bhattad

    Abstract: Image relighting is the task of showing what a scene from a source image would look like if illuminated differently. Inverse graphics schemes recover an explicit representation of geometry and a set of chosen intrinsics, then relight with some form of renderer. However error control for inverse graphics is difficult, and inverse graphics methods can represent only the effects of the chosen intrins… ▽ More

    Submitted 31 May, 2024; originally announced May 2024.

  12. arXiv:2405.17472  [pdf, other

    cs.LG cs.AI cs.CR cs.CV

    FreezeAsGuard: Mitigating Illegal Adaptation of Diffusion Models via Selective Tensor Freezing

    Authors: Kai Huang, Wei Gao

    Abstract: Text-to-image diffusion models can be fine-tuned in custom domains to adapt to specific user preferences, but such unconstrained adaptability has also been utilized for illegal purposes, such as forging public figures' portraits and duplicating copyrighted artworks. Most existing work focuses on detecting the illegally generated contents, but cannot prevent or mitigate illegal adaptations of diffu… ▽ More

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 18 pages

  13. arXiv:2405.12491  [pdf, other

    cs.SE

    Bridging the Gap Between Domain-specific Frameworks and Multiple Hardware Devices

    Authors: Xu Wen, Wanling Gao, Lei Wang, Jianfeng Zhan

    Abstract: The rapid development of domain-specific frameworks has presented us with a significant challenge: The current approach of implementing solutions on a case-by-case basis incurs a theoretical complexity of O(M*N), thereby increasing the cost of porting applications to different hardware platforms. To address these challenges, we propose a systematic methodology that effectively bridges the gap betw… ▽ More

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 15pages, 8 figures

  14. arXiv:2405.09054  [pdf, other

    cs.CV

    Dim Small Target Detection and Tracking: A Novel Method Based on Temporal Energy Selective Scaling and Trajectory Association

    Authors: Weihua Gao, Wenlong Niu, Wenlong Lu, Pengcheng Wang, Zhaoyuan Qi, Xiaodong Peng, Zhen Yang

    Abstract: The detection and tracking of small targets in passive optical remote sensing (PORS) has broad applications. However, most of the previously proposed methods seldom utilize the abundant temporal features formed by target motion, resulting in poor detection and tracking performance for low signal-to-clutter ratio (SCR) targets. In this article, we analyze the difficulty based on spatial features an… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  15. arXiv:2405.08403  [pdf, other

    cs.LG

    TFWT: Tabular Feature Weighting with Transformer

    Authors: Xinhao Zhang, Zaitian Wang, Lu Jiang, Wanfu Gao, Pengfei Wang, Kunpeng Liu

    Abstract: In this paper, we propose a novel feature weighting method to address the limitation of existing feature processing methods for tabular data. Typically the existing methods assume equal importance across all samples and features in one dataset. This simplified processing methods overlook the unique contributions of each feature, and thus may miss important feature information. As a result, it lead… ▽ More

    Submitted 17 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  16. arXiv:2405.07447  [pdf

    cs.HC

    From traces to measures: Large language models as a tool for psychological measurement from text

    Authors: Joseph J. P. Simons, Wong Liang Ze, Prasanta Bhattacharya, Brandon Siyuan Loh, Wei Gao

    Abstract: Digital trace data provide potentially valuable resources for understanding human behaviour, but their value has been limited by issues of unclear measurement. The growth of large language models provides an opportunity to address this limitation in the case of text data. Specifically, recognizing cases where their responses are a form of psychological measurement (the use of observable indicators… ▽ More

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: 12 pages, 2 figures, 1 table

  17. arXiv:2405.04434  [pdf, other

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding , et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference… ▽ More

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  18. GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression

    Authors: Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao

    Abstract: Transformer-based entropy models have gained prominence in recent years due to their superior ability to capture long-range dependencies in probability distribution estimation compared to convolution-based methods. However, previous transformer-based entropy models suffer from a sluggish coding process due to pixel-wise autoregression or duplicated computation during inference. In this paper, we p… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE TCSVT

  19. arXiv:2404.17283  [pdf, other

    cs.CL

    Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM

    Authors: Xuan Zhang, Wei Gao

    Abstract: Retrieval-augmented language models have exhibited promising performance across various areas of natural language processing (NLP), including fact-critical tasks. However, due to the black-box nature of advanced large language models (LLMs) and the non-retrieval-oriented supervision signal of specific tasks, the training of retrieval model faces significant challenges under the setting of black-bo… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by COLING 2024

  20. arXiv:2404.16687  [pdf, other

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte… ▽ More

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  21. arXiv:2404.15892  [pdf, other

    cs.CG

    Filling holes in LoD2 building models

    Authors: Weixiao Gao, Ravi Peters, Hugo Ledoux, Jantien Stoter

    Abstract: This paper presents a new algorithm for filling holes in Level of Detail 2 (LoD2) building mesh models, addressing the challenges posed by geometric inaccuracies and topological errors. Unlike traditional methods that often alter the original geometric structure or impose stringent input requirements, our approach preserves the integrity of the original model while effectively managing a range of… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  22. arXiv:2404.15644  [pdf, other

    cs.CV

    Building-PCC: Building Point Cloud Completion Benchmarks

    Authors: Weixiao Gao, Ravi Peters, Jantien Stoter

    Abstract: With the rapid advancement of 3D sensing technologies, obtaining 3D shape information of objects has become increasingly convenient. Lidar technology, with its capability to accurately capture the 3D information of objects at long distances, has been widely applied in the collection of 3D data in urban scenes. However, the collected point cloud data often exhibit incompleteness due to factors such… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  23. arXiv:2404.15297  [pdf, ps, other

    eess.SP cs.IT cs.LG

    Multi-stream Transmission for Directional Modulation Network via Distributed Multi-UAV-aided Multi-active-IRS

    Authors: Ke Yang, Rongen Dong, Wei Gao, Feng Shu, Wei** Shi, Yan Wang, Xuehui Wang, Jiangzhou Wang

    Abstract: Active intelligent reflecting surface (IRS) is a revolutionary technique for the future 6G networks. The conventional far-field single-IRS-aided directional modulation(DM) networks have only one (no direct path) or two (existing direct path) degrees of freedom (DoFs). This means that there are only one or two streams transmitted simultaneously from base station to user and will seriously limit its… ▽ More

    Submitted 28 April, 2024; v1 submitted 26 March, 2024; originally announced April 2024.

  24. arXiv:2404.14676  [pdf, other

    cs.CV cs.GR

    DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance

    Authors: Linxuan Xin, Zheng Zhang, **fu Wei, Wei Gao, Duan Gao

    Abstract: Prior material creation methods had limitations in producing diverse results mainly because reconstruction-based methods relied on real-world measurements and generation-based methods were trained on relatively small material datasets. To address these challenges, we propose DreamPBR, a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by… ▽ More

    Submitted 1 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 16 pages, 17 figures

    ACM Class: I.3.0; I.4.9

  25. arXiv:2404.13573  [pdf, other

    cs.CV

    Exploring AIGC Video Quality: A Focus on Visual Harmony, Video-Text Consistency and Domain Distribution Gap

    Authors: Bowen Qu, Xiaoyu Liang, Shangkun Sun, Wei Gao

    Abstract: The recent advancements in Text-to-Video Artificial Intelligence Generated Content (AIGC) have been remarkable. Compared with traditional videos, the assessment of AIGC videos encounters various challenges: visual inconsistency that defy common sense, discrepancies between content and the textual prompt, and distribution gap between various generative models, etc. Target at these challenges, in th… ▽ More

    Submitted 27 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: 9 pages, 3 figures, 3 tables. Accepted by CVPR2024 Workshop (3rd place winner of NTIRE2024 Quality Assessment for AI-Generated Content - Track 2 Video)

  26. arXiv:2404.10947  [pdf, other

    cs.CV

    Residual Connections Harm Abstract Feature Learning in Masked Autoencoders

    Authors: Xiao Zhang, Ruoxi Jiang, William Gao, Rebecca Willett, Michael Maire

    Abstract: We demonstrate that adding a weighting factor to decay the strength of identity shortcuts within residual networks substantially improves semantic feature learning in the state-of-the-art self-supervised masked autoencoding (MAE) paradigm. Our modification to the identity shortcuts within a VIT-B/16 backbone of an MAE boosts linear probing accuracy on ImageNet from 67.8% to 72.7%. This significant… ▽ More

    Submitted 20 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  27. arXiv:2404.07556  [pdf, other

    eess.IV cs.CV

    Attention-Aware Laparoscopic Image Desmoking Network with Lightness Embedding and Hybrid Guided Embedding

    Authors: Ziteng Liu, Jiahua Zhu, Bainan Liu, Hao Liu, Wenpeng Gao, Yili Fu

    Abstract: This paper presents a novel method of smoke removal from the laparoscopic images. Due to the heterogeneous nature of surgical smoke, a two-stage network is proposed to estimate the smoke distribution and reconstruct a clear, smoke-free surgical scene. The utilization of the lightness channel plays a pivotal role in providing vital information pertaining to smoke density. The reconstruction of smok… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: ISBI2024

  28. arXiv:2404.07181  [pdf, other

    cond-mat.mtrl-sci cs.LG physics.comp-ph

    BAMBOO: a predictive and transferable machine learning force field framework for liquid electrolyte development

    Authors: Sheng Gong, Yumin Zhang, Zhenliang Mu, Zhichen Pu, Hongyi Wang, Zhiao Yu, Mengyi Chen, Tianze Zheng, Zhi Wang, Lifei Chen, Xiaojie Wu, Shaochen Shi, Weihao Gao, Wen Yan, Liang Xiang

    Abstract: Despite the widespread applications of machine learning force field (MLFF) on solids and small molecules, there is a notable gap in applying MLFF to complex liquid electrolytes. In this work, we introduce BAMBOO (ByteDance AI Molecular Simulation Booster), a novel framework for molecular dynamics (MD) simulations, with a demonstration of its capabilities in the context of liquid electrolytes for l… ▽ More

    Submitted 22 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  29. Event Grounded Criminal Court View Generation with Cooperative (Large) Language Models

    Authors: Linan Yue, Qi Liu, Lili Zhao, Li Wang, Weibo Gao, Yanqing An

    Abstract: With the development of legal intelligence, Criminal Court View Generation has attracted much attention as a crucial task of legal intelligence, which aims to generate concise and coherent texts that summarize case facts and provide explanations for verdicts. Existing researches explore the key information in case facts to yield the court views. Most of them employ a coarse-grained approach that p… ▽ More

    Submitted 16 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted to SIGIR2024

  30. arXiv:2404.06661  [pdf, other

    cs.CV

    Efficient Denoising using Score Embedding in Score-based Diffusion Models

    Authors: Andrew S. Na, William Gao, Justin W. L. Wan

    Abstract: It is well known that training a denoising score-based diffusion models requires tens of thousands of epochs and a substantial number of image data to train the model. In this paper, we propose to increase the efficiency in training score-based diffusion models. Our method allows us to decrease the number of epochs needed to train the diffusion model. We accomplish this by solving the log-density… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  31. arXiv:2404.02003  [pdf, other

    cs.LG

    AUTODIFF: Autoregressive Diffusion Modeling for Structure-based Drug Design

    Authors: Xinze Li, Penglei Wang, Tianfan Fu, Wenhao Gao, Chengtao Li, Leilei Shi, Junhong Liu

    Abstract: Structure-based drug design (SBDD), which aims to generate molecules that can bind tightly to the target protein, is an essential problem in drug discovery, and previous approaches have achieved initial success. However, most existing methods still suffer from invalid local structure or unrealistic conformation issues, which are mainly due to the poor leaning of bond angles or torsional angles. To… ▽ More

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  32. arXiv:2404.00021  [pdf, other

    cs.HC cs.CE cs.CY cs.PF

    Evaluatology: The Science and Engineering of Evaluation

    Authors: Jianfeng Zhan, Lei Wang, Wanling Gao, Hongxiao Li, Chenxi Wang, Yunyou Huang, Yatao Li, Zhengxin Yang, Guoxin Kang, Chunjie Luo, Hainan Ye, Shaopeng Dai, Zhifei Zhang

    Abstract: Evaluation is a crucial aspect of human existence and plays a vital role in various fields. However, it is often approached in an empirical and ad-hoc manner, lacking consensus on universal concepts, terminologies, theories, and methodologies. This lack of agreement has significant repercussions. This article aims to formally introduce the discipline of evaluatology, which encompasses the science… ▽ More

    Submitted 19 March, 2024; originally announced April 2024.

    Comments: 29 pages, 16 figures, and 2 tables

  33. arXiv:2403.19881  [pdf, other

    cs.AI

    IME: Integrating Multi-curvature Shared and Specific Embedding for Temporal Knowledge Graph Completion

    Authors: Jiapu Wang, Zheng Cui, Boyue Wang, Shirui Pan, Junbin Gao, Baocai Yin, Wen Gao

    Abstract: Temporal Knowledge Graphs (TKGs) incorporate a temporal dimension, allowing for a precise capture of the evolution of knowledge and reflecting the dynamic nature of the real world. Typically, TKGs contain complex geometric structures, with various geometric structures interwoven. However, existing Temporal Knowledge Graph Completion (TKGC) methods either model TKGs in a single space or neglect the… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  34. arXiv:2403.18714  [pdf, other

    cs.CV cs.MM

    Bringing Textual Prompt to AI-Generated Image Quality Assessment

    Authors: Bowen Qu, Haohui Li, Wei Gao

    Abstract: AI-Generated Images (AGIs) have inherent multimodal nature. Unlike traditional image quality assessment (IQA) on natural scenarios, AGIs quality assessment (AGIQA) takes the correspondence of image and its textual prompt into consideration. This is coupled in the ground truth score, which confuses the unimodal IQA methods. To solve this problem, we introduce IP-IQA (AGIs Quality Assessment via Ima… ▽ More

    Submitted 21 May, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 6 pages, 3 figures, accepted by ICME2024

  35. arXiv:2403.17698  [pdf, other

    cs.LG cs.AI

    MEP: Multiple Kernel Learning Enhancing Relative Positional Encoding Length Extrapolation

    Authors: Weiguo Gao

    Abstract: When the predicted sequence length exceeds the length seen during training, the transformer's inference accuracy diminishes. Existing relative position encoding methods, such as those based on the ALiBi technique, address the length extrapolation challenge exclusively through the implementation of a single kernel function, which introduces a constant bias to every post-softmax attention scores acc… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

  36. arXiv:2403.07955  [pdf, other

    cs.LG cs.AI

    Towards Faithful Explanations: Boosting Rationalization with Shortcuts Discovery

    Authors: Linan Yue, Qi Liu, Yichao Du, Li Wang, Weibo Gao, Yanqing An

    Abstract: The remarkable success in neural networks provokes the selective rationalization. It explains the prediction results by identifying a small subset of the inputs sufficient to support them. Since existing methods still suffer from adopting the shortcuts in data to compose rationales and limited large-scale annotated rationales by human, in this paper, we propose a Shortcuts-fused Selective Rational… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted to ICLR 2024

  37. arXiv:2403.06430  [pdf, other

    cs.CV

    AS-FIBA: Adaptive Selective Frequency-Injection for Backdoor Attack on Deep Face Restoration

    Authors: Zhenbo Song, Wenhao Gao, Kaihao Zhang, Wenhan Luo, Zhaoxin Fan, Jianfeng Lu

    Abstract: Deep learning-based face restoration models, increasingly prevalent in smart devices, have become targets for sophisticated backdoor attacks. These attacks, through subtle trigger injection into input face images, can lead to unexpected restoration outcomes. Unlike conventional methods focused on classification tasks, our approach introduces a unique degradation objective tailored for attacking re… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  38. arXiv:2403.06239  [pdf, other

    cs.LG cs.AI

    Cooperative Classification and Rationalization for Graph Generalization

    Authors: Linan Yue, Qi Liu, Ye Liu, Weibo Gao, Fangzhou Yao, Wenfeng Li

    Abstract: Graph Neural Networks (GNNs) have achieved impressive results in graph classification tasks, but they struggle to generalize effectively when faced with out-of-distribution (OOD) data. Several approaches have been proposed to address this problem. Among them, one solution is to diversify training distributions in vanilla classification by modifying the data environment, yet accessing the environme… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted to WWW 2024

  39. arXiv:2403.05046  [pdf, other

    cs.RO

    EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction

    Authors: Irving Fang, Yuzhong Chen, Yifan Wang, Jianghan Zhang, Qiushi Zhang, Jiali Xu, Xibo He, Weibo Gao, Hao Su, Yiming Li, Chen Feng

    Abstract: A robot's ability to anticipate the 3D action target location of a hand's movement from egocentric videos can greatly improve safety and efficiency in human-robot interaction (HRI). While previous research predominantly focused on semantic action classification or 2D target region prediction, we argue that predicting the action target's 3D coordinate could pave the way for more versatile downstrea… ▽ More

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 6 pages. Accepted at ICRA 2024

  40. arXiv:2402.16882  [pdf, other

    physics.chem-ph cs.AI cs.LG q-bio.BM

    Substrate Scope Contrastive Learning: Repurposing Human Bias to Learn Atomic Representations

    Authors: Wenhao Gao, Priyanka Raghavan, Ron Shprints, Connor W. Coley

    Abstract: Learning molecular representation is a critical step in molecular machine learning that significantly influences modeling success, particularly in data-scarce situations. The concept of broadly pre-training neural networks has advanced fields such as computer vision, natural language processing, and protein engineering. However, similar approaches for small organic molecules have not achieved comp… ▽ More

    Submitted 18 February, 2024; originally announced February 2024.

  41. arXiv:2402.16366  [pdf, other

    cs.CV cs.MM

    SPC-NeRF: Spatial Predictive Compression for Voxel Based Radiance Field

    Authors: Zetian Song, Wenhong Duan, Yuhuai Zhang, Shiqi Wang, Siwei Ma, Wen Gao

    Abstract: Representing the Neural Radiance Field (NeRF) with the explicit voxel grid (EVG) is a promising direction for improving NeRFs. However, the EVG representation is not efficient for storage and transmission because of the terrific memory cost. Current methods for compressing EVG mainly inherit the methods designed for neural network compression, such as pruning and quantization, which do not take fu… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

  42. arXiv:2402.16097  [pdf, other

    cs.IT eess.SP

    Molecular Code-Division Multiple-Access: Signaling, Detection, and Performance

    Authors: Weidong Gao, Lu Shi, Lie-Liang Yang

    Abstract: To accomplish relatively complex tasks, in Internet of Bio-Nano Things (IoBNT), information collected by different nano-machines (NMs) is usually sent via multiple-access channels to fusion centers (FCs) for further processing. Relying on two types of molecules, in this paper, a molecular code-division multiple-access (MoCDMA) scheme is designed for multiple NMs to simultaneously send information… ▽ More

    Submitted 25 February, 2024; originally announced February 2024.

  43. arXiv:2402.10172  [pdf, other

    cs.AI cs.MA

    OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models

    Authors: Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell

    Abstract: Optimization problems are pervasive in sectors from manufacturing and distribution to healthcare. However, most such problems are still solved heuristically by hand rather than optimally by state-of-the-art solvers because the expertise required to formulate and solve these problems limits the widespread adoption of optimization tools and techniques. This paper introduces OptiMUS, a Large Language… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

  44. arXiv:2402.07108  [pdf, other

    cs.LG math.OC

    Decoupling Learning and Decision-Making: Breaking the $\mathcal{O}(\sqrt{T})$ Barrier in Online Resource Allocation with First-Order Methods

    Authors: Wenzhi Gao, Chunlin Sun, Chenyu Xue, Dongdong Ge, Yinyu Ye

    Abstract: Online linear programming plays an important role in both revenue management and resource allocation, and recent research has focused on develo** efficient first-order online learning algorithms. Despite the empirical success of first-order methods, they typically achieve a regret no better than $\mathcal{O}(\sqrt{T})$, which is suboptimal compared to the $\mathcal{O}(\log T)$ bound guaranteed b… ▽ More

    Submitted 28 May, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  45. arXiv:2402.01331  [pdf, other

    cs.CV

    A general framework for rotation invariant point cloud analysis

    Authors: Shuqing Luo, Wei Gao

    Abstract: We propose a general method for deep learning based point cloud analysis, which is invariant to rotation on the inputs. Classical methods are vulnerable to rotation, as they usually take aligned point clouds as input. Principle Component Analysis (PCA) is a practical approach to achieve rotation invariance. However, there are still some gaps between theory and practical algorithms. In this work, w… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 5 pages, 1 figure, accepted by ICASSP 2024

  46. arXiv:2402.01296  [pdf, other

    cs.LG cs.CR cs.CV

    Bi-CryptoNets: Leveraging Different-Level Privacy for Encrypted Inference

    Authors: Man-Jie Yuan, Zheng Zou, Wei Gao

    Abstract: Privacy-preserving neural networks have attracted increasing attention in recent years, and various algorithms have been developed to keep the balance between accuracy, computational complexity and information security from the cryptographic view. This work takes a different view from the input data and structure of neural networks. We decompose the input data (e.g., some images) into sensitive an… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

  47. arXiv:2401.15203  [pdf, other

    cs.LG

    FedGT: Federated Node Classification with Scalable Graph Transformer

    Authors: Zaixi Zhang, Qingyong Hu, Yang Yu, Weibo Gao, Qi Liu

    Abstract: Graphs are widely used to model relational data. As graphs are getting larger and larger in real-world scenarios, there is a trend to store and compute subgraphs in multiple local systems. For example, recently proposed \emph{subgraph federated learning} methods train Graph Neural Networks (GNNs) distributively on local subgraphs and aggregate GNN parameters with a central server. However, existin… ▽ More

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: ICLR 24 submission

  48. arXiv:2401.13971  [pdf, other

    math.OC cs.LG

    Stochastic Weakly Convex Optimization Beyond Lipschitz Continuity

    Authors: Wenzhi Gao, Qi Deng

    Abstract: This paper considers stochastic weakly convex optimization without the standard Lipschitz continuity assumption. Based on new adaptive regularization (stepsize) strategies, we show that a wide class of stochastic algorithms, including the stochastic subgradient method, preserve the $\mathcal{O} ( 1 / \sqrt{K})$ convergence rate with constant failure rate. Our analyses rest on rather weak assumptio… ▽ More

    Submitted 25 January, 2024; originally announced January 2024.

  49. arXiv:2401.13298  [pdf, other

    cs.CL cs.AI

    Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models

    Authors: Hongzhan Lin, Ziyang Luo, Wei Gao, **g Ma, Bo Wang, Ruichao Yang

    Abstract: The age of social media is flooded with Internet memes, necessitating a clear grasp and effective identification of harmful ones. This task presents a significant challenge due to the implicit meaning embedded in memes, which is not explicitly conveyed through the surface text and image. However, existing harmful meme detection methods do not present readable explanations that unveil such implicit… ▽ More

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: The first work towards explainable harmful meme detection by harnessing advanced LLMs

    Journal ref: The ACM Web Conference 2024

  50. arXiv:2401.09724  [pdf, other

    cs.SI cs.CL

    Predicting Viral Rumors and Vulnerable Users for Infodemic Surveillance

    Authors: Xuan Zhang, Wei Gao

    Abstract: In the age of the infodemic, it is crucial to have tools for effectively monitoring the spread of rampant rumors that can quickly go viral, as well as identifying vulnerable users who may be more susceptible to spreading such misinformation. This proactive approach allows for timely preventive measures to be taken, mitigating the negative impact of false information on society. We propose a novel… ▽ More

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted by IP&M