Showing 1–11 of 11 results for author: Sheng, D

Searching in archive cs.
  1. arXiv:2406.19905  [pdf, other]

    cs.CV

    Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

    Authors: Longrong Yang, Dong Sheng, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li

    Abstract: The Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and thus they e…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2406.07202  [pdf]

    cs.CV

    Can Foundation Models Reliably Identify Spatial Hazards? A Case Study on Curb Segmentation

    Authors: Diwei Sheng, Giles Hamilton-Fletcher, Mahya Beheshti, Chen Feng, John-Ross Rizzo

    Abstract: Curbs serve as vital borders that delineate safe pedestrian zones from potential vehicular traffic hazards. Curbs also represent a primary spatial hazard during dynamic navigation with significant stumbling potential. Such vulnerabilities are particularly exacerbated for persons with blindness and low vision (PBLV). Accurate visual-based discrimination of curbs is paramount for assistive technolog…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: 21 pages, 8 figures, submitted to Assistive Technology

  3. arXiv:2404.00504  [pdf, other]

    cs.CV

    NYC-Indoor-VPR: A Long-Term Indoor Visual Place Recognition Dataset with Semi-Automatic Annotation

    Authors: Diwei Sheng, Anbang Yang, John-Ross Rizzo, Chen Feng

    Abstract: Visual Place Recognition (VPR) in indoor environments is beneficial to humans and robots for better localization and navigation. It is challenging due to appearance changes at various frequencies, and difficulties of obtaining ground truth metric trajectories for training and evaluation. This paper introduces the NYC-Indoor-VPR dataset, a unique and rich collection of over 36,000 images compiled f…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 7 pages, 7 figures, published in 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

  4. arXiv:2312.02520  [pdf, other]

    cs.CV

    Towards More Unified In-context Visual Understanding

    Authors: Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu, Qi Chu, Jianmin Bao, Tao Gong, Bin Liu, Shengwei Xu, Nenghai Yu

    Abstract: The rapid advancement of large language models (LLMs) has accelerated the emergence of in-context learning (ICL) as a cutting-edge approach in the natural language processing domain. Recently, ICL has been employed in visual understanding tasks, such as semantic segmentation and image captioning, yielding promising results. However, existing visual ICL framework can not enable producing content ac…

    Submitted 16 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted by CVPR 2024

  5. arXiv:2306.11043  [pdf, other]

    cs.DC cs.OS

    DFlow: Efficient Dataflow-based Invocation Workflow Execution for Function-as-a-Service

    Authors: Xiaoxiang Shi, Chao Li, Zijun Li, Zihan Liu, Dianmo Sheng, Quan Chen, Jingwen Leng, Minyi Guo

    Abstract: The Serverless Computing is becoming increasingly popular due to its ease of use and fine-grained billing. These features make it appealing for stateful application or serverless workflow. However, current serverless workflow systems utilize a controlflow-based invocation pattern to invoke functions. In this execution pattern, the function invocation depends on the state of the function. A functio…

    Submitted 4 July, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

    Comments: 22 pages, 13 figures

  6. arXiv:2212.03863  [pdf, other]

    cs.CV cs.LG

    X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

    Authors: Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

    Abstract: Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts the segmentation performance, especially for rare object categories. Although diverse, high-quality object instances used in Copy-Paste result in more performance gain, previous wor…

    Submitted 31 May, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: ICML 2023, code is available at https://github.com/yoctta/XPaste

  7. i-Razor: A Differentiable Neural Input Razor for Feature Selection and Dimension Search in DNN-Based Recommender Systems

    Authors: Yao Yao, Bin Liu, Haoxun He, Dakui Sheng, Ke Wang, Li Xiao, Huanhuan Cao

    Abstract: Input features play a crucial role in DNN-based recommender systems with thousands of categorical and continuous fields from users, items, contexts, and interactions. Noisy features and inappropriate embedding dimension assignments can deteriorate the performance of recommender systems and introduce unnecessary complexity in model training and online serving. Optimizing the input configuration of…

    Submitted 11 November, 2023; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)

  8. arXiv:2110.09004  [pdf, other]

    cs.CV

    NYU-VPR: Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymization Influences

    Authors: Diwei Sheng, Yuxiang Chai, Xinru Li, Chen Feng, Jianzhe Lin, Claudio Silva, John-Ross Rizzo

    Abstract: Visual place recognition (VPR) is critical in not only localization and mapping for autonomous driving vehicles, but also in assistive navigation for the visually impaired population. To enable a long-term VPR system on a large scale, several challenges need to be addressed. First, different applications could require different image view directions, such as front views for self-driving cars while…

    Submitted 25 July, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

    Comments: 8 pages, 10 figures, published in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

  9. arXiv:1907.02046  [pdf]

    cs.CL cs.IR cs.LG

    Deep neural network-based classification model for Sentiment Analysis

    Authors: Donghang Pan, Jingling Yuan, Lin Li, Deming Sheng

    Abstract: The growing prosperity of social networks has brought great challenges to the sentimental tendency mining of users. As more and more researchers pay attention to the sentimental tendency of online users, rich research results have been obtained based on the sentiment classification of explicit texts. However, research on the implicit sentiment of users is still in its infancy. Aiming at the diffic…

    Submitted 3 July, 2019; originally announced July 2019.

  10. Convolutional neural networks with fractional order gradient method

    Authors: Dian Sheng, Yiheng Wei, Yuquan Chen, Yong Wang

    Abstract: This paper proposes a fractional order gradient method for the backward propagation of convolutional neural networks. To overcome the problem that fractional order gradient method cannot converge to real extreme point, a simplified fractional order gradient method is designed based on Caputo's definition. The parameters within layers are updated by the designed gradient method, but the propagation…

    Submitted 16 September, 2019; v1 submitted 13 May, 2019; originally announced May 2019.

  11. arXiv:1905.01022  [pdf, other]

    eess.AS cs.LG cs.SD

    A Feature Learning Siamese Model for Intelligent Control of the Dynamic Range Compressor

    Authors: Di Sheng, György Fazekas

    Abstract: In this paper, a siamese DNN model is proposed to learn the characteristics of the audio dynamic range compressor (DRC). This facilitates an intelligent control system that uses audio examples to configure the DRC, a widely used non-linear audio signal conditioning technique in the areas of music production, speech communication and broadcasting. Several alternative siamese DNN architectures are p…

    Submitted 1 May, 2019; originally announced May 2019.

    Comments: 8 pages, accepted in IJCNN 2019