Showing 1–50 of 238 results for author: Ding, H

Searching in archive cs.
  1. arXiv:2406.17005  [pdf, other]

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

    Abstract: The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: a Complex Video Object Segmentation track based on the MOSE dataset and a Motion Expression guided Video Segmentation track based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  2. arXiv:2406.14555  [pdf, other]

    cs.CV

    A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models

    Authors: Xincheng Shuai, Henghui Ding, Xingjun Ma, Rongcheng Tu, Yu-Gang Jiang, Dacheng Tao

    Abstract: Image editing aims to modify a given synthetic or real image to meet users' specific requirements. It has been widely studied in recent years as a promising and challenging field of Artificial Intelligence Generated Content (AIGC). Recent significant advances in this field are based on the development of text-to-image (T2I) diffusion models, which generate images according to text prompts. Th…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Project Page: https://github.com/xinchengshuai/Awesome-Image-Editing

  3. arXiv:2405.20282  [pdf, other]

    cs.CV

    SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow

    Authors: Chaoyang Wang, Xiangtai Li, Lu Qi, Henghui Ding, Yunhai Tong, Ming-Hsuan Yang

    Abstract: Semantic segmentation and semantic image synthesis are two representative tasks in visual perception and generation. While existing methods consider them as two distinct tasks, we propose a unified diffusion-based framework (SemFlow) and model them as a pair of reverse problems. Specifically, motivated by rectified flow theory, we train an ordinary differential equation (ODE) model to transport be…

    Submitted 30 May, 2024; originally announced May 2024.

  4. arXiv:2405.15349  [pdf, other]

    cs.CL

    UnKE: Unstructured Knowledge Editing in Large Language Models

    Authors: Jingcheng Deng, Zihao Wei, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng

    Abstract: Recent knowledge editing methods have primarily focused on modifying structured knowledge in large language models, heavily relying on the assumption that structured knowledge is stored as key-value pairs locally in MLP layers or specific neurons. However, this task setting overlooks the fact that a significant portion of real-world knowledge is stored in an unstructured format, characterized by l…

    Submitted 24 May, 2024; originally announced May 2024.

  5. arXiv:2405.11448  [pdf, other]

    cs.CV

    Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation

    Authors: Zejun Gu, Zhong-Qiu Zhao, Henghui Ding, Hao Shen, Zhao Zhang, De-Shuang Huang

    Abstract: In practical applications of human pose estimation, low-resolution inputs frequently occur, and existing state-of-the-art models perform poorly with low-resolution images. This work focuses on boosting the performance of low-resolution models by distilling knowledge from a high-resolution model. However, we face the challenge of feature size mismatch and class number mismatch when applying knowled…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  6. Towards Metric DBSCAN: Exact, Approximate, and Streaming Algorithms

    Authors: Guanlin Mo, Shihong Song, Hu Ding

    Abstract: DBSCAN is a popular density-based clustering algorithm that has many different applications in practice. However, the running time of DBSCAN in high-dimensional space or general metric space (e.g., clustering a set of texts by using edit distance) can be as large as quadratic in the input size. Moreover, most existing acceleration techniques for DBSCAN are only available for low-dimension…

    Submitted 10 May, 2024; originally announced May 2024.

  7. arXiv:2405.04434  [pdf, other]

    cs.CL cs.AI

    DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

    Authors: DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Deng, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, et al. (132 additional authors not shown)

    Abstract: We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token, and supports a context length of 128K tokens. DeepSeek-V2 adopts innovative architectures including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees efficient inference…

    Submitted 19 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2404.17287  [pdf, other]

    cs.CL

    When to Trust LLMs: Aligning Confidence with Response Quality

    Authors: Shuchang Tao, Liuyi Yao, Hanxing Ding, Yuexiang Xie, Qi Cao, Fei Sun, Jinyang Gao, Huawei Shen, Bolin Ding

    Abstract: Despite the success of large language models (LLMs) in natural language generation, much evidence shows that LLMs may produce incorrect or nonsensical text. This limitation highlights the importance of discerning when to trust LLMs, especially in safety-critical domains. Existing methods often express reliability by confidence level; however, their effectiveness is limited by the lack of objective…

    Submitted 9 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by ACL 2024

  9. arXiv:2404.13401  [pdf, other]

    cs.LG

    Approximate Algorithms For $k$-Sparse Wasserstein Barycenter With Outliers

    Authors: Qingyuan Yang, Hu Ding

    Abstract: Wasserstein Barycenter (WB) is one of the most fundamental optimization problems in optimal transport. Given a set of distributions, the goal of WB is to find a new distribution that minimizes the average Wasserstein distance to them. The problem becomes even harder if we restrict the solution to be "$k$-sparse". In this paper, we study the $k$-sparse WB problem in the presence of outliers,…

    Submitted 20 April, 2024; originally announced April 2024.

  10. arXiv:2404.10830  [pdf, other]

    cs.CL cs.AI cs.LG

    Fewer Truncations Improve Language Modeling

    Authors: Hantian Ding, Zijian Wang, Giovanni Paolini, Varun Kumar, Anoop Deoras, Dan Roth, Stefano Soatto

    Abstract: In large language model training, input documents are typically concatenated together and then split into sequences of equal length to avoid padding tokens. Despite its efficiency, the concatenation approach compromises data integrity -- it inevitably breaks many documents into incomplete pieces, leading to excessive truncations that hinder the model from learning to compose logically coherent and…

    Submitted 2 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: ICML 2024

  11. arXiv:2404.09586  [pdf, other]

    cs.CV cs.LG

    Mitigating the Curse of Dimensionality for Certified Robustness via Dual Randomized Smoothing

    Authors: Song Xia, Yi Yu, Xudong Jiang, Henghui Ding

    Abstract: Randomized Smoothing (RS) has proven to be a promising method for endowing an arbitrary image classifier with certified robustness. However, the substantial uncertainty inherent in high-dimensional isotropic Gaussian noise imposes the curse of dimensionality on RS. Specifically, the upper bound of the $\ell_2$ certified robustness radius provided by RS exhibits a diminishing trend with the expans…

    Submitted 15 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted to the International Conference on Learning Representations (ICLR), 2024

  12. arXiv:2404.08562  [pdf, other]

    cs.CR cs.AI cs.LG

    Dynamic Neural Control Flow Execution: An Agent-Based Deep Equilibrium Approach for Binary Vulnerability Detection

    Authors: Litao Li, Steven H. H. Ding, Andrew Walenstein, Philippe Charland, Benjamin C. M. Fung

    Abstract: Software vulnerabilities are a challenge in cybersecurity. Manual security patches are often difficult and slow to deploy, while new vulnerabilities continue to be created. Binary code vulnerability detection is less studied and more complex than its source code counterpart, and this has important practical implications. Deep learning has become an efficient and powerful tool in the security domain, where it pro…

    Submitted 3 April, 2024; originally announced April 2024.

  13. arXiv:2404.04990  [pdf, other]

    cs.CL

    MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models

    Authors: Zihao Wei, Jingcheng Deng, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng

    Abstract: The extensive utilization of large language models (LLMs) underscores the crucial necessity for precise and contemporary knowledge embedded within their intrinsic parameters. Existing research on knowledge editing primarily concentrates on monolingual scenarios, neglecting the complexities presented by multilingual contexts and multi-hop reasoning. To address these challenges, our study introduces…

    Submitted 7 April, 2024; originally announced April 2024.

  14. arXiv:2404.03645  [pdf, other]

    cs.CV

    Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation

    Authors: Shuting He, Henghui Ding

    Abstract: Referring video segmentation relies on natural language expressions to identify and segment objects, often emphasizing motion clues. Previous works treat a sentence as a whole and directly perform identification at the video level, mixing up static image-level cues with temporal motion cues. However, image-level features cannot comprehend motion cues in sentences well, and static cues are not cruc…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, code: https://github.com/heshuting555/DsHmp

  15. arXiv:2404.02187  [pdf]

    cs.LG cs.AI

    A Generative Deep Learning Approach for Crash Severity Modeling with Imbalanced Data

    Authors: Junlan Chen, Ziyuan Pu, Nan Zheng, Xiao Wen, Hongliang Ding, Xiucheng Guo

    Abstract: Crash data is often greatly imbalanced, with the majority of crashes being non-fatal and only a small number being fatal due to their rarity. This data imbalance poses a challenge for crash severity modeling, since models struggle to fit and interpret fatal crash outcomes with very limited samples. Usually, such data imbalance issues are addressed by data resampling methods, suc…

    Submitted 2 April, 2024; originally announced April 2024.

  16. arXiv:2404.00335  [pdf, other]

    cs.CV

    Learning Trimaps via Clicks for Image Matting

    Authors: Chenyi Zhang, Yihan Hu, Henghui Ding, Humphrey Shi, Yao Zhao, Yunchao Wei

    Abstract: Despite significant advancements in image matting, existing models heavily depend on manually-drawn trimaps for accurate results in natural image scenarios. However, the process of obtaining trimaps is time-consuming, lacking user-friendliness and device compatibility. This reliance greatly limits the practical application of all trimap-based matting methods. To address this issue, we introduce Cl…

    Submitted 6 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

  17. arXiv:2403.18811  [pdf, other]

    cs.CV cs.GR cs.SD eess.AS

    Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment

    Authors: Li Siyao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy

    Abstract: We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between t…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: ICLR 2024

  18. arXiv:2403.15484  [pdf, other]

    cs.CL cs.LG

    RakutenAI-7B: Extending Large Language Models for Japanese

    Authors: Rakuten Group, Aaron Levine, Connie Huang, Chenguang Wang, Eduardo Batista, Ewa Szymanska, Hongyi Ding, Hou Wei Chou, Jean-François Pessiot, Johanes Effendi, Justin Chiu, Kai Torben Ohlhus, Karan Chopra, Keiji Shinzato, Koji Murakami, Lee Xiong, Lei Chen, Maki Kubota, Maksim Tkachenko, Miroku Lee, Naoki Takahashi, Prathyusha Jwalapuram, Ryutaro Tatsushima, Saurabh Jain, Sunil Kumar Yadav, et al. (5 additional authors not shown)

    Abstract: We introduce RakutenAI-7B, a suite of Japanese-oriented large language models that achieve the best performance on the Japanese LM Harness benchmarks among the open 7B models. Along with the foundation model, we release instruction- and chat-tuned models, RakutenAI-7B-instruct and RakutenAI-7B-chat respectively, under the Apache 2.0 license.

    Submitted 21 March, 2024; originally announced March 2024.

  19. arXiv:2403.11122  [pdf, other]

    cs.CV

    LERENet: Eliminating Intra-class Differences for Metal Surface Defect Few-shot Semantic Segmentation

    Authors: Hanze Ding, Zhangkai Wu, Jiyan Zhang, Ming Jin, Yanfang Liu

    Abstract: Few-shot segmentation models excel in metal defect detection due to their rapid generalization ability to new classes and pixel-level segmentation, rendering them ideal for addressing data scarcity issues and achieving refined object delineation in industrial applications. Existing works neglect the Intra-Class Differences inherent in metal surface defect data, which hinders the model fr…

    Submitted 17 March, 2024; originally announced March 2024.

  20. arXiv:2403.10468  [pdf, other]

    cs.SE

    An Empirical Study on Developers Shared Conversations with ChatGPT in GitHub Pull Requests and Issues

    Authors: Huizi Hao, Kazi Amit Hasan, Hong Qin, Marcos Macedo, Yuan Tian, Steven H. H. Ding, Ahmed E. Hassan

    Abstract: ChatGPT has significantly impacted software development practices, providing substantial assistance to developers in a variety of tasks, including coding, testing, and debugging. Despite its widespread adoption, the impact of ChatGPT as an assistant in collaborative coding remains largely unexplored. In this paper, we analyze a dataset of 210 and 370 developers' shared conversations with ChatGPT in…

    Submitted 15 March, 2024; originally announced March 2024.

  21. arXiv:2403.09616  [pdf, other]

    cs.CV

    Explore In-Context Segmentation via Latent Diffusion Models

    Authors: Chaoyang Wang, Xiangtai Li, Henghui Ding, Lu Qi, Jiangning Zhang, Yunhai Tong, Chen Change Loy, Shuicheng Yan

    Abstract: In-context segmentation has drawn more attention with the introduction of vision foundation models. Most existing approaches adopt metric learning or masked image modeling to build the correlation between visual prompts and input image queries. In this work, we explore this problem from a new perspective, using one representative generation model, the latent diffusion model (LDM). We observe a tas…

    Submitted 14 March, 2024; originally announced March 2024.

  22. arXiv:2403.08845  [pdf, other]

    cs.LG cs.AI

    Bifurcated Attention for Single-Context Large-Batch Sampling

    Authors: Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, Qing Sun, Jun Wang, Jiacheng Guo, Liangfu Chen, Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang

    Abstract: In our study, we present bifurcated attention, a method developed for language model inference in single-context batch sampling contexts. This approach aims to reduce redundant memory IO costs, a significant factor in latency for high batch sizes and long context lengths. Bifurcated attention achieves this by dividing the attention mechanism during incremental decoding into two distinct GEMM opera…

    Submitted 13 March, 2024; originally announced March 2024.

  23. arXiv:2403.08799  [pdf, other]

    cs.SE cs.CR

    Automating SBOM Generation with Zero-Shot Semantic Similarity

    Authors: Devin Pereira, Christopher Molloy, Sudipta Acharya, Steven H. H. Ding

    Abstract: With the growing complexity of software ecosystems and the emphasis on security and compliance, it is becoming increasingly important in the software industry for manufacturers to inventory the software used on their systems. A Software-Bill-of-Materials (SBOM) is a comprehensive inventory detailing a software application's components and dependencies. Current approaches rely on case-based…

    Submitted 3 February, 2024; originally announced March 2024.

    Comments: 8 pages, 2 figures

  24. arXiv:2403.02265  [pdf, other]

    cs.CV cs.GR

    DaReNeRF: Direction-aware Representation for Dynamic Scenes

    Authors: Ange Lou, Benjamin Planche, Zhongpai Gao, Yamin Li, Tianyu Luan, Hao Ding, Terrence Chen, Jack Noble, Ziyan Wu

    Abstract: Addressing the intricate challenge of modeling and re-rendering dynamic scenes, most recent approaches have sought to simplify these complexities using plane-based explicit representations, overcoming the slow training time issues associated with methods like Neural Radiance Fields (NeRF) and implicit representations. However, the straightforward decomposition of 4D dynamic scenes into multiple 2D…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024. Paper + supplementary material

  25. arXiv:2403.01560  [pdf, other]

    cs.CV

    Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

    Authors: Kun-Yu Lin, Henghui Ding, Jiaming Zhou, Yu-Ming Tang, Yi-Xing Peng, Zhilin Zhao, Chen Change Loy, Wei-Shi Zheng

    Abstract: Building upon the impressive success of CLIP (Contrastive Language-Image Pretraining), recent pioneering works have proposed to adapt the powerful CLIP to video data, leading to efficient and effective video learners for open-vocabulary action recognition. Inspired by the fact that humans perform actions in diverse environments, our work delves into an intriguing question: Can CLIP-based video learners effect…

    Submitted 24 May, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

  26. arXiv:2402.17531  [pdf, other]

    cs.SE cs.AI cs.CL

    Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides

    Authors: Kaikai An, Fangkai Yang, Junting Lu, Liqun Li, Zhixing Ren, Hao Huang, Lu Wang, Pu Zhao, Yu Kang, Hua Ding, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

    Abstract: Effective incident management is pivotal for the smooth operation of enterprise-level cloud services. In order to expedite incident mitigation, service teams compile troubleshooting knowledge into Troubleshooting Guides (TSGs) accessible to on-call engineers (OCEs). While automated pipelines can resolve the most frequent and easy incidents, there still exist complex incidents that requ…

    Submitted 10 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Work in progress

  27. arXiv:2402.13048  [pdf, other]

    cs.CL

    Stable Knowledge Editing in Large Language Models

    Authors: Zihao Wei, Liang Pang, Hanxing Ding, Jingcheng Deng, Huawei Shen, Xueqi Cheng

    Abstract: Efficient knowledge editing of large language models is crucial for replacing obsolete information or incorporating specialized knowledge on a large scale. However, previous methods implicitly assume that knowledge is localized and isolated within the model, an assumption that oversimplifies the interconnected nature of model knowledge. The premise of localization results in an incomplete knowledg…

    Submitted 20 February, 2024; originally announced February 2024.

  28. arXiv:2402.10612  [pdf, other]

    cs.CL

    Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models

    Authors: Hanxing Ding, Liang Pang, Zihao Wei, Huawei Shen, Xueqi Cheng

    Abstract: Hallucinations pose a significant challenge for the practical implementation of large language models (LLMs). The utilization of parametric knowledge in generating factual content is constrained by the limited knowledge of LLMs, potentially resulting in internal hallucinations. While incorporating external information can help fill knowledge gaps, it also introduces the risk of irrelevant informat…

    Submitted 16 February, 2024; originally announced February 2024.

  29. arXiv:2402.02414  [pdf, other]

    cs.HC cs.CV

    Navigate Biopsy with Ultrasound under Augmented Reality Device: Towards Higher System Performance

    Authors: Haowei Li, Wenqing Yan, Jiasheng Zhao, Yuqi Ji, Long Qian, Hui Ding, Zhe Zhao, Guangzhi Wang

    Abstract: Purpose: Biopsies play a crucial role in determining the classification and staging of tumors. Ultrasound is frequently used in this procedure to provide real-time anatomical information. Using augmented reality (AR), surgeons can visualize ultrasound data and spatial navigation information seamlessly integrated with real tissues. This innovation facilitates faster and more precise biopsy operatio…

    Submitted 4 February, 2024; originally announced February 2024.

  30. arXiv:2402.01935  [pdf, other]

    cs.CL

    Code Representation Learning At Scale

    Authors: Dejiao Zhang, Wasi Ahmad, Ming Tan, Hantian Ding, Ramesh Nallapati, Dan Roth, Xiaofei Ma, Bing Xiang

    Abstract: Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, e.g., code generation. However, most of the existing works on code representation learning train models at a hundred-million-parameter scale using very limited pretraining corpora. In this work, we fuel code representation learning with a vast amount of code data via a two-st…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 10 pages

    Journal ref: ICLR 2024

  31. arXiv:2401.10229  [pdf, other]

    cs.CV

    OMG-Seg: Is One Model Good Enough For All Segmentation?

    Authors: Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy

    Abstract: In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models. We propose OMG-Seg, One Model that is Good enough to efficiently and effectively handle all the segmentation tasks, including image semantic, instance, and panoptic segmentation, as well as their video counterparts, open vocabulary settings, prompt-driven, interactive segmentati…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Project Page: https://lxtgh.github.io/project/omg_seg/

  32. arXiv:2401.07450  [pdf, other]

    cs.CV cs.AI

    Hierarchical Fashion Design with Multi-stage Diffusion Models

    Authors: Zhifeng Xie, Hao Li, Huiming Ding, Mengtian Li, Ying Cao

    Abstract: Cross-modal fashion synthesis and editing offer intelligent support to fashion designers by enabling the automatic generation and local modification of design drafts. While current diffusion models demonstrate commendable stability and controllability in image synthesis, they still face significant challenges in generating fashion design from abstract design elements and fine-grained editing. Abstrac…

    Submitted 20 January, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

  33. arXiv:2401.06374  [pdf, other]

    cs.CV

    SamLP: A Customized Segment Anything Model for License Plate Detection

    Authors: Haoxuan Ding, Junyu Gao, Yuan Yuan, Qi Wang

    Abstract: With the emergence of foundation models, this novel paradigm of deep learning has enabled many powerful achievements in natural language processing and computer vision. Foundation models have many advantages, such as excellent feature extraction power, strong generalization ability, and great few-shot and zero-shot learning capacity, which are beneficial to vision tasks. As the unique id…

    Submitted 11 January, 2024; originally announced January 2024.

  34. arXiv:2401.02954  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  35. arXiv:2312.15883  [pdf, other]

    cs.CL cs.AI

    HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses

    Authors: Xinke Jiang, Ruizhe Zhang, Yongxin Xu, Rihong Qiu, Yue Fang, Zhiyuan Wang, Jinyi Tang, Hongxin Ding, Xu Chu, Junfeng Zhao, Yasha Wang

    Abstract: In this paper, we investigate the retrieval-augmented generation (RAG) based on Knowledge Graphs (KGs) to improve the accuracy and reliability of Large Language Models (LLMs). Recent approaches suffer from insufficient and repetitive knowledge retrieval, tedious and time-consuming query parsing, and monotonous knowledge utilization. To this end, we develop a Hypothesis Knowledge Graph Enhanced (Hy…

    Submitted 19 April, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

    Comments: version 2

  36. arXiv:2312.14345  [pdf, other]

    cs.AI cs.CL cs.HC

    Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs

    Authors: Behnam Rahdari, Hao Ding, Ziwei Fan, Yifei Ma, Zhuotong Chen, Anoop Deoras, Branislav Kveton

    Abstract: The unique capabilities of Large Language Models (LLMs), such as their natural language generation ability, position them as strong candidates for providing explanations for recommendations. However, despite the size of the LLM, most existing models struggle to produce zero-shot explanations reliably. To address this issue, we propose a framework called Logic-Scaffolding, that combines the ideas…

    Submitted 17 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: The 17th ACM International Conference on Web Search and Data Mining (WSDM 2024)

  37. arXiv:2312.12425  [pdf, other]

    cs.CV

    SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process

    Authors: Mengyu Wang, Henghui Ding, Jun Hao Liew, Jiajun Liu, Yao Zhao, Yunchao Wei

    Abstract: In this paper, we explore a principled way to enhance the quality of object masks produced by different segmentation models. We propose a model-agnostic solution called SegRefiner, which offers a novel perspective on this problem by interpreting segmentation refinement as a data generation process. As a result, the refinement process can be smoothly implemented through a series of denoising diffusi…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023, Code: https://github.com/MengyuWang826/SegRefiner

  38. arXiv:2312.10948  [pdf, other]

    cs.CV cs.AI

    A Multimodal Approach for Advanced Pest Detection and Classification

    Authors: Jinli Duan, Haoyu Ding, Sung Kim

    Abstract: This paper presents a novel multimodal deep learning framework for enhanced agricultural pest detection, combining tiny-BERT's natural language processing with R-CNN and ResNet-18's image processing. Addressing limitations of traditional CNN-based visual methods, this approach integrates textual context for more accurate pest identification. The R-CNN and ResNet-18 integration tackles deep CNN is…

    Submitted 18 December, 2023; originally announced December 2023.

  39. arXiv:2312.04819  [pdf, other]

    cs.MA

    Attention-Guided Contrastive Role Representations for Multi-Agent Reinforcement Learning

    Authors: Zican Hu, Zongzhang Zhang, Huaxiong Li, Chunlin Chen, Hongyu Ding, Zhi Wang

    Abstract: Real-world multi-agent tasks usually involve dynamic team composition with the emergence of roles, which should also be a key to efficient cooperation in multi-agent reinforcement learning (MARL). Drawing inspiration from the correlation between roles and agents' behavior patterns, we propose a novel framework of Attention-guided COntrastive Role representation learning for MARL (…

    Submitted 2 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  40. arXiv:2312.01474  [pdf, other]

    cs.RO

    LVDiffusor: Distilling Functional Rearrangement Priors from Large Models into Diffusor

    Authors: Yiming Zeng, Mingdong Wu, Long Yang, Jiyao Zhang, Hao Ding, Hui Cheng, Hao Dong

    Abstract: Object rearrangement, a fundamental challenge in robotics, demands versatile strategies to handle diverse objects, configurations, and functional needs. To achieve this, the AI robot needs to learn functional rearrangement priors in order to specify precise goals that meet the functional requirements. Previous methods typically learn such priors from either laborious human annotations or manually…

    Submitted 8 March, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

  41. arXiv:2311.17401  [pdf, ps, other]

    cs.LG cs.AI

    Gene-MOE: A sparsely gated prognosis and classification framework exploiting pan-cancer genomic information

    Authors: Xiangyu Meng, Xue Li, Qing Yang, Huanhuan Dai, Lian Qiao, Hongzhen Ding, Long Hao, Xun Wang

    Abstract: Benefiting from the advancements in deep learning, various genomic analytical techniques, such as survival analysis, classification of tumors and their subtypes, and exploration of specific pathways, have significantly enhanced our understanding of the biological mechanisms driving cancer. However, the overfitting issue, arising from the limited number of patient samples, poses a challenge in impr…

    Submitted 18 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

  42. arXiv:2311.10723  [pdf, other]

    q-fin.GN cs.AI cs.CL

    Large Language Models in Finance: A Survey

    Authors: Yinheng Li, Shaofei Wang, Han Ding, Hang Chen

    Abstract: Recent advances in large language models (LLMs) have opened new possibilities for artificial intelligence applications in finance. In this paper, we provide a practical survey focused on two key aspects of utilizing LLMs for financial tasks: existing solutions and guidance for adoption. First, we review current approaches employing LLMs in finance, including leveraging pretrained models via zero…

    Submitted 28 September, 2023; originally announced November 2023.

    Comments: Accepted by 4th ACM International Conference on AI in Finance (ICAIF-23) https://ai-finance.org

  43. arXiv:2311.07514  [pdf, other]

    cs.CV

    VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search

    Authors: Shuting He, Hao Luo, Wei Jiang, Xudong Jiang, Henghui Ding

    Abstract: Text-based Person Search (TBPS) aims to retrieve images of a target pedestrian indicated by textual descriptions. It is essential for TBPS to extract fine-grained local features and align them across modalities. Existing methods utilize external tools or heavy cross-modal interaction to achieve explicit alignment of cross-modal fine-grained features, which is inefficient and time-consuming. In this…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: Accepted to IEEE TIP

  44. arXiv:2311.06070  [pdf, other]

    cs.CV

    Learning-Based Biharmonic Augmentation for Point Cloud Classification

    Authors: Jiacheng Wei, Guosheng Lin, Henghui Ding, Jie Hu, Kim-Hui Yap

    Abstract: Point cloud datasets often suffer from inadequate sample sizes in comparison to image datasets, making data augmentation challenging. While traditional methods, like rigid transformations and scaling, have limited potential in increasing dataset diversity due to their constraints on altering individual sample shapes, we introduce the Biharmonic Augmentation (BA) method. BA is a novel and efficient…

    Submitted 10 November, 2023; originally announced November 2023.

  45. arXiv:2310.19251  [pdf, other]

    cs.IR cs.AI

    Pre-trained Recommender Systems: A Causal Debiasing Perspective

    Authors: Ziqian Lin, Hao Ding, Nghia Trong Hoang, Branislav Kveton, Anoop Deoras, Hao Wang

    Abstract: Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI where models can be pre-trained on broad data describing a generic task space and then adapted successfully to solve a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired b…

    Submitted 8 January, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: 8 pages, WSDM 24

  46. arXiv:2310.18446  [pdf, other]

    cs.DS cs.AI cs.CG math.OC

    A Novel Skip Orthogonal List for Dynamic Optimal Transport Problem

    Authors: Xiaoyang Xu, Hu Ding

    Abstract: Optimal transport is a fundamental topic that has attracted a great amount of attention from the optimization community in the past decades. In this paper, we consider an interesting discrete dynamic optimal transport problem: can we efficiently update the optimal transport plan when the weights or the locations of the data points change? This problem is naturally motivated by several applications…

    Submitted 26 January, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

  47. arXiv:2310.11248  [pdf, other]

    cs.LG cs.CL cs.SE

    CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion

    Authors: Yangruibo Ding, Zijian Wang, Wasi Uddin Ahmad, Hantian Ding, Ming Tan, Nihal Jain, Murali Krishna Ramanathan, Ramesh Nallapati, Parminder Bhatia, Dan Roth, Bing Xiang

    Abstract: Code completion models have made significant progress in recent years, yet current popular evaluation datasets, such as HumanEval and MBPP, predominantly focus on code completion tasks within a single file. This over-simplified setting falls short of representing the real-world software development scenario where repositories span multiple files with numerous cross-file dependencies, and accessing…

    Submitted 16 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: To appear at NeurIPS 2023 (Datasets and Benchmarks Track)

  48. arXiv:2310.02593  [pdf]

    cs.AI

    A ModelOps-based Framework for Intelligent Medical Knowledge Extraction

    Authors: Hongxin Ding, Peinie Zou, Zhiyuan Wang, Junfeng Zhao, Yasha Wang, Qiang Zhou

    Abstract: Extracting medical knowledge from healthcare texts enhances downstream tasks like medical knowledge graph construction and clinical decision-making. However, the construction and application of knowledge extraction models lack automation, reusability and unified management, leading to inefficiencies for researchers and high barriers for non-AI experts, such as doctors, to utilize knowledge extracti…

    Submitted 4 October, 2023; originally announced October 2023.

  49. arXiv:2309.16643  [pdf, other]

    cs.CV

    Deep Geometrized Cartoon Line Inbetweening

    Authors: Li Siyao, Tianpei Gu, Weiye Xiao, Henghui Ding, Ziwei Liu, Chen Change Loy

    Abstract: We aim to address a significant but understudied problem in the anime industry, namely the inbetweening of cartoon line drawings. Inbetweening involves generating intermediate frames between two black-and-white line drawings and is a time-consuming and expensive process that can benefit from automation. However, existing frame interpolation methods that rely on matching and warping whole raster im…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  50. arXiv:2309.13599  [pdf, other]

    cs.LG cs.AI

    From Cluster Assumption to Graph Convolution: Graph-based Semi-Supervised Learning Revisited

    Authors: Zheng Wang, Hongming Ding, Li Pan, Jianhua Li, Zhiguo Gong, Philip S. Yu

    Abstract: Graph-based semi-supervised learning (GSSL) has long been a hot research topic. Traditional methods are generally shallow learners, based on the cluster assumption. Recently, graph convolutional networks (GCNs) have become the predominant techniques for their promising performance. In this paper, we theoretically discuss the relationship between these two types of methods in a unified optimization…

    Submitted 2 June, 2024; v1 submitted 24 September, 2023; originally announced September 2023.