Showing 1–50 of 213 results for author: Sun, D

Searching in archive cs.

  1. arXiv:2406.19070  [pdf, other]

    cs.CV

    FAGhead: Fully Animate Gaussian Head from Monocular Videos

    Authors: Yixin Xuan, Xinyang Li, Gongxin Yao, Shiwei Zhou, Donghui Sun, Xiaoxin Chen, Yu Pan

    Abstract: High-fidelity reconstruction of 3D human avatars has a wide application in visual reality. In this paper, we introduce FAGhead, a method that enables fully controllable human portraits from monocular videos. We explicit the traditional 3D morphable meshes (3DMM) and optimize the neutral 3D Gaussians to reconstruct with complex expressions. Furthermore, we employ a novel Point-based Learnable Repre…

    Submitted 28 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.18549  [pdf]

    eess.IV cs.CV

    Advancements in Feature Extraction Recognition of Medical Imaging Systems Through Deep Learning Technique

    Authors: Qishi Zhan, Dan Sun, Erdi Gao, Yuhan Ma, Yaxin Liang, Haowei Yang

    Abstract: This study introduces a novel unsupervised medical image feature extraction method that employs spatial stratification techniques. An objective function based on weight is proposed to achieve the purpose of fast image recognition. The algorithm divides the pixels of the image into multiple subdomains and uses a quadtree to access the image. A technique for threshold optimization utilizing a simple…

    Submitted 23 May, 2024; originally announced June 2024.

    Comments: conference

  3. arXiv:2406.08838  [pdf]

    cs.CL cs.AI cs.LG

    Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

    Authors: Dan Sun, Yaxin Liang, Yining Yang, Yuhan Ma, Qishi Zhan, Erdi Gao

    Abstract: This project intends to study the image representation based on attention mechanism and multimodal data. By adding multiple pattern layers to the attribute model, the semantic and hidden layers of image content are integrated. The word vector is quantified by the Word2Vec method and then evaluated by a word embedding convolutional neural network. The published experimental results of the two group…

    Submitted 13 June, 2024; originally announced June 2024.

  4. arXiv:2406.08837  [pdf]

    eess.IV cs.CV cs.LG

    Research on Deep Learning Model of Feature Extraction Based on Convolutional Neural Network

    Authors: Houze Liu, Iris Li, Yaxin Liang, Dan Sun, Yining Yang, Haowei Yang

    Abstract: Neural networks with relatively shallow layers and simple structures may have limited ability in accurately identifying pneumonia. In addition, deep neural networks also have a large demand for computing resources, which may cause convolutional neural networks to be unable to be implemented on terminals. Therefore, this paper will carry out the optimal classification of convolutional neural networ…

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2406.07268  [pdf, other]

    cs.MM cs.CL cs.CV

    Advancing Grounded Multimodal Named Entity Recognition via LLM-Based Reformulation and Box-Based Segmentation

    Authors: Jinyuan Li, Ziyan Li, Han Li, Jianfei Yu, Rui Xia, Di Sun, Gang Pan

    Abstract: Grounded Multimodal Named Entity Recognition (GMNER) task aims to identify named entities, entity types and their corresponding visual regions. GMNER task exhibits two challenging attributes: 1) The tenuous correlation between images and text on social media contributes to a notable proportion of named entities being ungroundable. 2) There exists a distinction between coarse-grained noun phrases u…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Extension of our Findings of EMNLP 2023 & ACL 2024 paper

  6. arXiv:2406.04299  [pdf, other]

    cs.LG cs.SI

    NoisyGL: A Comprehensive Benchmark for Graph Neural Networks under Label Noise

    Authors: Zhonghao Wang, Danyu Sun, Sheng Zhou, Haobo Wang, Jiapei Fan, Longtao Huang, Jiajun Bu

    Abstract: Graph Neural Networks (GNNs) exhibit strong potential in node classification task through a message-passing mechanism. However, their performance often hinges on high-quality node labels, which are challenging to obtain in real-world scenarios due to unreliable sources or adversarial attacks. Consequently, label noise is common in real-world graph data, negatively impacting GNNs by propagating inc…

    Submitted 6 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: 28 pages, 15 figures

  7. arXiv:2405.19194  [pdf, other]

    cs.CV

    LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model

    Authors: Hongen Liu, Di Sun, Jiahao Wang, Yi Liu, Gang Pan

    Abstract: Video text spotting (VTS) aims to simultaneously localize, recognize and track text instances in videos. To address the limited recognition capability of end-to-end methods, recent methods track the zero-shot results of state-of-the-art image text spotters directly, and achieve impressive performance. However, owing to the domain gap between different datasets, these methods usually obtain limited…

    Submitted 10 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  8. arXiv:2405.13290  [pdf, other]

    cs.LG cs.AI

    Theoretical Analysis of Meta Reinforcement Learning: Generalization Bounds and Convergence Guarantees

    Authors: Cangqing Wang, Mingxiu Sui, Dan Sun, Zecheng Zhang, Yan Zhou

    Abstract: This research delves deeply into Meta Reinforcement Learning (Meta RL) through an exploration focusing on defining generalization limits and ensuring convergence. By employing an approach, this article introduces an innovative theoretical framework to meticulously assess the effectiveness and performance of Meta RL algorithms. We present an explanation of generalization limits measuring how well thes…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by the 2024 International Conference on Modeling, Natural Language Processing and Machine Learning (CMNM 2024)

  9. arXiv:2405.12850  [pdf, other]

    cs.CV

    Weakly supervised alignment and registration of MR-CT for cervical cancer radiotherapy

    Authors: Jjahao Zhang, Yin Gu, Deyu Sun, Yuhua Gao, Ming Gao, Ming Cui, Teng Zhang, He Ma

    Abstract: Cervical cancer is one of the leading causes of death in women, and brachytherapy is currently the primary treatment method. However, it is important to precisely define the extent of paracervical tissue invasion to improve cancer diagnosis and treatment options. The fusion of the information characteristics of both computed tomography (CT) and magnetic resonance imaging (MRI) modalities may be use…

    Submitted 21 May, 2024; originally announced May 2024.

  10. arXiv:2405.06598  [pdf, other]

    cs.CV

    A Lightweight Transformer for Remote Sensing Image Change Captioning

    Authors: Dongwei Sun, Yajie Bao, Xiangyong Cao

    Abstract: Remote sensing image change captioning (RSICC) aims to automatically generate sentences that describe content differences in remote sensing bitemporal images. Recently, attention-based transformers have become a prevalent idea for capturing the features of global change. However, existing transformer-based RSICC methods face challenges, e.g., high parameters and high computational complexity cause…

    Submitted 10 May, 2024; originally announced May 2024.

  11. arXiv:2404.14647  [pdf, other]

    cs.RO eess.SY

    Human Behavior Modeling via Identification of Task Objective and Variability

    Authors: Sooyung Byeon, Dawei Sun, Inseok Hwang

    Abstract: Human behavior modeling is important for the design and implementation of human-automation interactive control systems. In this context, human behavior refers to a human's control input to systems. We propose a novel method for human behavior modeling that uses human demonstrations for a given task to infer the unknown task objective and the variability. The task objective represents the human's i…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages

  12. arXiv:2404.08636  [pdf, other]

    cs.CV

    Probing the 3D Awareness of Visual Foundation Models

    Authors: Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis, Abhishek Kar, Yuanzhen Li, Michael Rubinstein, Deqing Sun, Leonidas Guibas, Justin Johnson, Varun Jampani

    Abstract: Recent advances in large-scale pretraining have yielded visual foundation models with strong capabilities. Not only can recent models generalize to arbitrary images for their training task, their intermediate representations are useful for other visual tasks such as detection and segmentation. Given that such models can classify, delineate, and localize objects in 2D, we ask whether they also repr…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024. Project page: https://github.com/mbanani/probe3d

  13. arXiv:2404.04997  [pdf, other]

    cs.LG cs.AI cs.CL

    Adapting LLMs for Efficient Context Processing through Soft Prompt Compression

    Authors: Cangqing Wang, Yutian Yang, Ruisi Li, Dan Sun, Ruicong Cai, Yuzhu Zhang, Chengqian Fu, Lillian Floyd

    Abstract: The rapid advancement of Large Language Models (LLMs) has inaugurated a transformative epoch in natural language processing, fostering unprecedented proficiency in text generation, comprehension, and contextual scrutiny. Nevertheless, effectively handling extensive contexts, crucial for myriad applications, poses a formidable obstacle owing to the intrinsic constraints of the models' context windo…

    Submitted 18 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by the 2024 International Conference on Image Processing and Computer Applications (IPCA 2024)

  14. arXiv:2404.01723  [pdf, other]

    eess.IV cs.CV

    Contextual Embedding Learning to Enhance 2D Networks for Volumetric Image Segmentation

    Authors: Zhuoyuan Wang, Dong Sun, Xiangyun Zeng, Ruodai Wu, Yi Wang

    Abstract: The segmentation of organs in volumetric medical images plays an important role in computer-aided diagnosis and treatment/surgery planning. Conventional 2D convolutional neural networks (CNNs) can hardly exploit the spatial correlation of volumetric data. Current 3D CNNs have the advantage to extract more powerful volumetric representations but they usually suffer from occupying excessive memory a…

    Submitted 17 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 15 pages, 9 figures

  15. arXiv:2403.16082  [pdf, other]

    cs.SE

    SoK: Comprehensive Analysis of Rug Pull Causes, Datasets, and Detection Tools in DeFi

    Authors: Dianxiang Sun, Wei Ma, Liming Nie, Yang Liu

    Abstract: Rug pulls pose a grave threat to the cryptocurrency ecosystem, leading to substantial financial loss and undermining trust in decentralized finance (DeFi) projects. With the emergence of new rug pull patterns, research on rug pull is out of date. To fill this gap, we first conducted an extensive analysis of the literature review, encompassing both scholarly and industry sources. By examining exis…

    Submitted 24 March, 2024; originally announced March 2024.

  16. arXiv:2403.14727  [pdf, other]

    cs.CY cs.CL cs.LG

    Protected group bias and stereotypes in Large Language Models

    Authors: Hadas Kotek, David Q. Sun, Zidi Xiu, Margit Bowler, Christopher Klein

    Abstract: As modern Large Language Models (LLMs) shatter many state-of-the-art benchmarks in a variety of domains, this paper investigates their behavior in the domains of ethics and fairness, focusing on protected group bias. We conduct a two-part study: first, we solicit sentence continuations describing the occupations of individuals from different protected groups, including gender, sexuality, religion,…

    Submitted 20 March, 2024; originally announced March 2024.

  17. arXiv:2403.03652  [pdf]

    physics.optics cs.HC physics.app-ph

    3D Printed Waveguide for Augmented Reality

    Authors: Dechuan Sun, Gregory Tanyi, Alan Lee, Chris French, Younger Liang, Christina Lim, Ranjith R Unnithan

    Abstract: Mass production of augmented reality (AR) waveguides has been challenging due to the intricate nature of the fabrication technique and the high precision required for its optical characteristics. In this paper, we have presented a novel and low-cost approach for fabricating geometric optical waveguides designed for AR applications utilizing 3D printing techniques. To strike a balance between optic…

    Submitted 6 March, 2024; originally announced March 2024.

  18. arXiv:2402.18318  [pdf]

    cs.RO

    SD-SLAM: A Semantic SLAM Approach for Dynamic Scenes Based on LiDAR Point Clouds

    Authors: Feiya Li, Chunyun Fu, Dongye Sun, Jian Li, Jianwen Wang

    Abstract: Point cloud maps generated via LiDAR sensors using extensive remotely sensed data are commonly used by autonomous vehicles and robots for localization and navigation. However, dynamic objects contained in point cloud maps not only downgrade localization accuracy and navigation performance but also jeopardize the map quality. In response to this challenge, we propose in this paper a novel semantic…

    Submitted 28 February, 2024; originally announced February 2024.

  19. arXiv:2402.11841  [pdf, other]

    cs.SE

    ASGNet: Adaptive Semantic Gate Networks for Log-Based Anomaly Diagnosis

    Authors: Haitian Yang, Degang Sun, Wen Liu, Yanshu Li, Yan Wang, Weiqing Huang

    Abstract: Logs are widely used in the development and maintenance of software systems. Logs can help engineers understand the runtime behavior of systems and diagnose system failures. For anomaly diagnosis, existing methods generally use log event data extracted from historical logs to build diagnostic models. However, we find that existing methods do not make full use of two types of features, (1) statisti…

    Submitted 19 February, 2024; originally announced February 2024.

  20. arXiv:2402.11139  [pdf, other]

    cs.LG cs.AI

    LiGNN: Graph Neural Networks at LinkedIn

    Authors: Fedor Borisyuk, Shihai He, Yunbo Ouyang, Morteza Ramezani, Peng Du, Xiaochen Hou, Chengming Jiang, Nitin Pasumarthy, Priya Bannur, Birjodh Tiwana, Ping Liu, Siddharth Dangi, Daqi Sun, Zhoutao Pei, Xiao Shi, Sirou Zhu, Qianqi Shen, Kuang-Hsuan Lee, David Stein, Baolei Li, Haichao Wei, Amol Ghoting, Souvik Ghosh

    Abstract: In this paper, we present LiGNN, a deployed large-scale Graph Neural Networks (GNNs) Framework. We share our insight on developing and deployment of GNNs at large scale at LinkedIn. We present a set of algorithmic improvements to the quality of GNN representation learning including temporal graph architectures with long term losses, effective cold start solutions via graph densification, ID embedd…

    Submitted 16 February, 2024; originally announced February 2024.

  21. arXiv:2402.09989  [pdf, other]

    cs.CV cs.CL

    LLMs as Bridges: Reformulating Grounded Multimodal Named Entity Recognition

    Authors: Jinyuan Li, Han Li, Di Sun, Jiahao Wang, Wenkun Zhang, Zan Wang, Gang Pan

    Abstract: Grounded Multimodal Named Entity Recognition (GMNER) is a nascent multimodal task that aims to identify named entities, entity types and their corresponding visual regions. GMNER task exhibits two challenging properties: 1) The weak correlation between image-text pairs in social media results in a significant portion of named entities being ungroundable. 2) There exists a distinction between coars…

    Submitted 29 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted to Findings of ACL 2024

  22. arXiv:2402.07207  [pdf, other]

    cs.CV

    GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting

    Authors: Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang

    Abstract: We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. We then propose an instance-scene compositional optimization mechanism with condi…

    Submitted 11 June, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  23. arXiv:2402.06859  [pdf, other]

    cs.LG cs.AI cs.IR

    LiRank: Industrial Large Scale Ranking Models at LinkedIn

    Authors: Fedor Borisyuk, Mingzhou Zhou, Qingquan Song, Siyu Zhu, Birjodh Tiwana, Ganesh Parameswaran, Siddharth Dangi, Lars Hertel, Qiang Xiao, Xiaochen Hou, Yunbo Ouyang, Aman Gupta, Sheallika Singh, Dan Liu, Hailing Cheng, Lei Le, Jonathan Hung, Sathiya Keerthi, Ruoyan Wang, Fengyu Zhang, Mohit Kothari, Chen Zhu, Daqi Sun, Yun Dai, Xun Luan, et al. (9 additional authors not shown)

    Abstract: We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including…

    Submitted 9 February, 2024; originally announced February 2024.

    ACM Class: H.3.3

  24. arXiv:2401.15753  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    An objective comparison of methods for augmented reality in laparoscopic liver resection by preoperative-to-intraoperative image fusion

    Authors: Sharib Ali, Yamid Espinel, Yueming Jin, Peng Liu, Bianca Güttner, Xukun Zhang, Lihua Zhang, Tom Dowrick, Matthew J. Clarkson, Shiting Xiao, Yifan Wu, Yijun Yang, Lei Zhu, Dai Sun, Lan Li, Micha Pfeiffer, Shahid Farid, Lena Maier-Hein, Emmanuel Buc, Adrien Bartoli

    Abstract: Augmented reality for laparoscopic liver resection is a visualisation mode that allows a surgeon to localise tumours and vessels embedded within the liver by projecting them on top of a laparoscopic image. Preoperative 3D models extracted from CT or MRI data are registered to the intraoperative laparoscopic images during this process. In terms of 3D-2D fusion, most of the algorithms make use of an…

    Submitted 7 February, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: 24 pages

  25. arXiv:2401.12945  [pdf, other]

    cs.CV

    Lumiere: A Space-Time Diffusion Model for Video Generation

    Authors: Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Guanghui Liu, Amit Raj, Yuanzhen Li, Michael Rubinstein, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri

    Abstract: We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synth…

    Submitted 5 February, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

    Comments: Webpage: https://lumiere-video.github.io/ | Video: https://www.youtube.com/watch?v=wxLr02Dz2Sc

  26. arXiv:2401.10171  [pdf, other]

    cs.CV cs.GR

    SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild

    Authors: Andreas Engelhardt, Amit Raj, Mark Boss, Yunzhi Zhang, Abhishek Kar, Yuanzhen Li, Deqing Sun, Ricardo Martin Brualla, Jonathan T. Barron, Hendrik P. A. Lensch, Varun Jampani

    Abstract: We present SHINOBI, an end-to-end framework for the reconstruction of shape, material, and illumination from object images captured with varying lighting, pose, and background. Inverse rendering of an object based on unconstrained image collections is a long-standing challenge in computer vision and graphics and requires a joint optimization over shape, radiance, and pose. We show that an implicit…

    Submitted 29 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted by IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024). Updated supplementary material and acknowledgements

  27. arXiv:2401.05960  [pdf, other]

    cs.AI

    Machine Learning Insides OptVerse AI Solver: Design Principles and Applications

    Authors: Xijun Li, Fangzhou Zhu, Hui-Ling Zhen, Weilin Luo, Meng Lu, Yimin Huang, Zhenan Fan, Zirui Zhou, Yufei Kuang, Zhihai Wang, Zijie Geng, Yang Li, Haoyang Liu, Zhiwu An, Muming Yang, Jianshu Li, Jie Wang, Junchi Yan, Defeng Sun, Tao Zhong, Yong Zhang, Jia Zeng, Mingxuan Yuan, Jianye Hao, Jun Yao, et al. (1 additional author not shown)

    Abstract: In an era of digital ubiquity, efficient resource management and decision-making are paramount across numerous industries. To this end, we present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI Solver, which aims to mitigate the scarcity of real-world mathematical programming instances, and to surpass the capabilities of traditional opt…

    Submitted 17 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  28. arXiv:2401.03664  [pdf]

    eess.IV cs.CV cs.LG

    Dual-Channel Reliable Breast Ultrasound Image Classification Based on Explainable Attribution and Uncertainty Quantification

    Authors: Shuge Lei, Haonan Hu, Dasheng Sun, Huabin Zhang, Kehong Yuan, Jian Dai, Jijun Tang, Yan Tong

    Abstract: This paper focuses on the classification task of breast ultrasound images and researches on the reliability measurement of classification results. We proposed a dual-channel evaluation framework based on the proposed inference reliability and predictive reliability scores. For the inference reliability evaluation, human-aligned and doctor-agreed inference rationales based on the improved feature a…

    Submitted 7 January, 2024; originally announced January 2024.

  29. arXiv:2401.01674  [pdf, other]

    cs.CV

    Transformer RGBT Tracking with Spatio-Temporal Multimodal Tokens

    Authors: Dengdi Sun, Yajie Pan, Andong Lu, Chenglong Li, Bin Luo

    Abstract: Many RGBT tracking researches primarily focus on modal fusion design, while overlooking the effective handling of target appearance changes. While some approaches have introduced historical frames or fuse and replace initial templates to incorporate temporal information, they have the risk of disrupting the original target appearance and accumulating errors over time. To alleviate these limitation…

    Submitted 3 January, 2024; originally announced January 2024.

  30. arXiv:2401.01461  [pdf, other]

    cs.CV

    Efficient Hybrid Zoom using Camera Fusion on Mobile Phones

    Authors: Xiaotong Wu, Wei-Sheng Lai, YiChang Shih, Charles Herrmann, Michael Krainin, Deqing Sun, Chia-Kai Liang

    Abstract: DSLR cameras can achieve multiple zoom levels via shifting lens distances or swapping lens types. However, these techniques are not possible on smartphone devices due to space constraints. Most smartphone manufacturers adopt a hybrid zoom system: commonly a Wide (W) camera at a low zoom level and a Telephoto (T) camera at a high zoom level. To simulate zoom levels between W and T, these systems cr…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: Accepted to SIGGRAPH Asia 2023 (ACM TOG). Project website: https://www.wslai.net/publications/fusion_zoom

  31. arXiv:2401.00935  [pdf, other]

    cs.CV

    Boundary Attention: Learning to Localize Boundaries under High Noise

    Authors: Mia Gaia Polansky, Charles Herrmann, Junhwa Hur, Deqing Sun, Dor Verbin, Todd Zickler

    Abstract: We present a differentiable model that infers explicit boundaries, including curves, corners and junctions, using a mechanism that we call boundary attention. Boundary attention is a boundary-aware local attention operation that, when applied densely and repeatedly, progressively refines a field of variables that specify an unrasterized description of the local boundary structure in every overlapp…

    Submitted 18 March, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

    Comments: Project website at boundaryattention.github.io: http://boundaryattention.github.io

  32. arXiv:2312.16787  [pdf]

    cs.RO

    L-LO: Enhancing Pose Estimation Precision via a Landmark-Based LiDAR Odometry

    Authors: Feiya Li, Chunyun Fu, Dongye Sun

    Abstract: The majority of existing LiDAR odometry solutions are based on simple geometric features such as points, lines or planes which cannot fully reflect the characteristics of surrounding environments. In this study, we propose a novel LiDAR odometry which effectively utilizes the overall exterior characteristics of environmental landmarks. The vehicle pose estimation is accomplished by means of two se…

    Submitted 27 December, 2023; originally announced December 2023.

  33. arXiv:2312.13252  [pdf, other]

    cs.CV

    Zero-Shot Metric Depth with a Field-of-View Conditioned Diffusion Model

    Authors: Saurabh Saxena, Junhwa Hur, Charles Herrmann, Deqing Sun, David J. Fleet

    Abstract: While methods for monocular depth estimation have made significant strides on standard benchmarks, zero-shot metric depth estimation remains unsolved. Challenges include the joint modeling of indoor and outdoor scenes, which often exhibit significantly different distributions of RGB and depth, and the depth-scale ambiguity due to unknown camera intrinsics. Recent work has proposed specialized mult…

    Submitted 20 December, 2023; originally announced December 2023.

  34. arXiv:2312.07920  [pdf, other]

    cs.CV

    DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes

    Authors: Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang

    Abstract: We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes. For complex scenes with moving objects, we first sequentially and progressively model the static background of the entire scene with incremental static 3D Gaussians. We then leverage a composite dynamic Gaussian graph to handle multiple moving objects, individually reconstructing eac…

    Submitted 20 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  35. arXiv:2312.06553  [pdf, other]

    cs.CV

    HOI-Diff: Text-Driven Synthesis of 3D Human-Object Interactions using Diffusion Models

    Authors: Xiaogang Peng, Yiming Xie, Zizhao Wu, Varun Jampani, Deqing Sun, Huaizu Jiang

    Abstract: We address the problem of generating realistic 3D human-object interactions (HOIs) driven by textual prompts. To this end, we take a modular design and decompose the complex task into simpler sub-tasks. We first develop a dual-branch diffusion model (HOI-DM) to generate both human and object motions conditioned on the input text, and encourage coherent motions by a cross-attention communication mo…

    Submitted 15 March, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Project Page: https://neu-vi.github.io/HOI-Diff/

  36. arXiv:2312.03884  [pdf, other]

    cs.CV cs.GR

    WonderJourney: Going from Anywhere to Everywhere

    Authors: Hong-Xing Yu, Haoyi Duan, Junhwa Hur, Kyle Sargent, Michael Rubinstein, William T. Freeman, Forrester Cole, Deqing Sun, Noah Snavely, Jiajun Wu, Charles Herrmann

    Abstract: We introduce WonderJourney, a modularized framework for perpetual 3D scene generation. Unlike prior work on view generation that focuses on a single type of scenes, we start at any user-provided location (by a text description or an image) and generate a journey through a long sequence of diverse yet coherently connected 3D scenes. We leverage an LLM to generate textual descriptions of the scenes…

    Submitted 12 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Project website with video results: https://kovenyu.com/WonderJourney/

  37. arXiv:2312.02919  [pdf, other]

    cs.CV

    Fine-grained Controllable Video Generation via Object Appearance and Context

    Authors: Hsin-Ping Huang, Yu-Chuan Su, Deqing Sun, Lu Jiang, Xuhui Jia, Yukun Zhu, Ming-Hsuan Yang

    Abstract: Text-to-video generation has shown promising results. However, by taking only natural languages as input, users often face difficulties in providing detailed information to precisely control the model's output. In this work, we propose fine-grained controllable video generation (FACTOR) to achieve detailed control. Specifically, FACTOR aims to control objects' appearances and context, including th…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Project page: https://hhsinping.github.io/factor

  38. arXiv:2311.17776  [pdf, other]

    cs.CV

    One-Shot Open Affordance Learning with Foundation Models

    Authors: Gen Li, Deqing Sun, Laura Sevilla-Lara, Varun Jampani

    Abstract: We introduce One-shot Open Affordance Learning (OOAL), where a model is trained with just one example per base object category, but is expected to identify novel objects and affordances. While vision-language models excel at recognizing novel objects and scenes, they often struggle to understand finer levels of granularity such as affordances. To handle this issue, we conduct a comprehensive analy…

    Submitted 29 November, 2023; originally announced November 2023.

  39. arXiv:2311.17034  [pdf, other]

    cs.CV

    Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence

    Authors: Junyi Zhang, Charles Herrmann, Junhwa Hur, Eric Chen, Varun Jampani, Deqing Sun, Ming-Hsuan Yang

    Abstract: While pre-trained large-scale vision models have shown significant promise for semantic correspondence, their features often struggle to grasp the geometry and orientation of instances. This paper identifies the importance of being geometry-aware for semantic correspondence and reveals a limitation of the features of current foundation models under simple post-processing. We show that incorporatin…

    Submitted 24 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 24, project page: https://telling-left-from-right.github.io/

  40. arXiv:2311.16081  [pdf, other]

    cs.CV cs.AI

    ViT-Lens: Towards Omni-modal Representations

    Authors: Weixian Lei, Yixiao Ge, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou

    Abstract: Aiming to advance AI agents, large foundation models significantly improve reasoning and instruction execution, yet the current focus on vision and language neglects the potential of perceiving diverse modalities in open-world environments. However, the success of data-driven vision and language models is costly or even infeasible to be reproduced for rare modalities. In this paper, we present ViT…

    Submitted 26 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: This work is a follow-up of arXiv:2308.10185. Accepted to CVPR2024

  41. arXiv:2311.12076  [pdf, other]

    cs.CV

    Towards Few-shot Out-of-Distribution Detection

    Authors: Jiuqing Dong, Yongbin Gao, Heng Zhou, Jun Cen, Yifan Yao, Sook Yoon, Park Dong Sun

    Abstract: Out-of-distribution (OOD) detection is critical for ensuring the reliability of open-world intelligent systems. Despite the notable advancements in existing OOD detection methodologies, our study identifies a significant performance drop under the scarcity of training samples. In this context, we introduce a novel few-shot OOD detection benchmark, carefully constructed to address this gap. Our emp…

    Submitted 30 January, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

  42. arXiv:2311.07198  [pdf, other]

    cs.CV

    MonoDiffusion: Self-Supervised Monocular Depth Estimation Using Diffusion Model

    Authors: Shuwei Shao, Zhongcai Pei, Weihai Chen, Dingchi Sun, Peter C. Y. Chen, Zhengguo Li

    Abstract: Over the past few years, self-supervised monocular depth estimation that does not depend on ground-truth during the training phase has received widespread attention. Most efforts focus on designing different types of network architectures and loss functions or handling edge cases, e.g., occlusion and dynamic objects. In this work, we introduce a novel self-supervised depth estimation framework, du…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: 10 pages, 8 figures

  43. arXiv:2310.18130  [pdf, other]

    cs.CL cs.HC

    DELPHI: Data for Evaluating LLMs' Performance in Handling Controversial Issues

    Authors: David Q. Sun, Artem Abzaliev, Hadas Kotek, Zidi Xiu, Christopher Klein, Jason D. Williams

    Abstract: Controversy is a reflection of our zeitgeist, and an important aspect of any discourse. The rise of large language models (LLMs) as conversational systems has increased public reliance on these systems for answers to their various questions. Consequently, it is crucial to systematically examine how these models respond to questions pertaining to ongoing debates. However, few such datasets exi…

    Submitted 7 November, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP Industry Track 2023

  44. arXiv:2310.17994  [pdf, other]

    cs.CV cs.GR

    ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image

    Authors: Kyle Sargent, Zizhang Li, Tanmay Shah, Charles Herrmann, Hong-Xing Yu, Yunzhi Zhang, Eric Ryan Chan, Dmitry Lagun, Li Fei-Fei, Deqing Sun, Jiajun Wu

    Abstract: We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis for in-the-wild scenes. While existing methods are designed for single objects with masked backgrounds, we propose new techniques to address challenges introduced by in-the-wild multi-object scenes with complex backgrounds. Specifically, we train a generative prior on a mixture of data sources that capture obje…

    Submitted 23 April, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: Accepted to CVPR 2024. 12 pages

  45. arXiv:2310.08580  [pdf, other]

    cs.CV cs.GR

    OmniControl: Control Any Joint at Any Time for Human Motion Generation

    Authors: Yiming Xie, Varun Jampani, Lei Zhong, Deqing Sun, Huaizu Jiang

    Abstract: We present a novel approach named OmniControl for incorporating flexible spatial control signals into a text-conditioned human motion generation model based on the diffusion process. Unlike previous methods that can only control the pelvis trajectory, OmniControl can incorporate flexible spatial control signals over different joints at different times with only one model. Specifically, we propose…

    Submitted 14 April, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: ICLR 2024. Project page: https://neu-vi.github.io/omnicontrol/

  46. arXiv:2310.05717  [pdf, other]

    cs.RO cs.AI cs.CV

    STOPNet: Multiview-based 6-DoF Suction Detection for Transparent Objects on Production Lines

    Authors: Yuxuan Kuang, Qin Han, Danshi Li, Qiyu Dai, Lian Ding, Dong Sun, Hanlin Zhao, He Wang

    Abstract: In this work, we present STOPNet, a framework for 6-DoF object suction detection on production lines, with a focus on but not limited to transparent objects, which is an important and challenging problem in robotic systems and modern industry. Current methods requiring depth input fail on transparent objects due to depth cameras' deficiency in sensing their geometry, while we proposed a novel fram…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Under Review. ICRA 2024 submission

  47. arXiv:2310.05551  [pdf, other]

    cs.CE cs.AI cs.PL

    PST: Improving Quantitative Trading via Program Sketch-based Tuning

    Authors: Zhiming Li, Junzhe Jiang, Yushi Cao, Aixin Cui, Bozhi Wu, Bo Li, Yang Liu, Dongning Sun

    Abstract: Deep reinforcement learning (DRL) has revolutionized quantitative finance by achieving decent performance without significant human expert knowledge. Despite its achievements, we observe that the current state-of-the-art DRL models are still ineffective in identifying the market trend, causing them to miss good trading opportunities or suffer from large drawdowns when encountering market crashes.…

    Submitted 24 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  48. arXiv:2310.05262  [pdf, other]

    cs.CV

    Structure-Preserving Instance Segmentation via Skeleton-Aware Distance Transform

    Authors: Zudi Lin, Donglai Wei, Aarush Gupta, Xingyu Liu, Deqing Sun, Hanspeter Pfister

    Abstract: Objects with complex structures pose significant challenges to existing instance segmentation methods that rely on boundary or affinity maps, which are vulnerable to small errors around contacting pixels that cause noticeable connectivity change. While the distance transform (DT) makes instance interiors and boundaries more distinguishable, it tends to overlook the intra-object connectivity for in…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: MICCAI 2023 (Oral Presentation)

  49. arXiv:2309.16071  [pdf, other]

    cs.SI

    Influence Pathway Discovery on Social Media

    Authors: Xinyi Liu, Ruijie Wang, Dachun Sun, Jinning Li, Christina Youn, You Lyu, Jianyuan Zhan, Dayou Wu, Xinhe Xu, Mingjun Liu, Xinshuo Lei, Zhihao Xu, Yutong Zhang, Zehao Li, Qikai Yang, Tarek Abdelzaher

    Abstract: This paper addresses influence pathway discovery, a key emerging problem in today's online media. We propose a discovery algorithm that leverages recently published work on unsupervised interpretable ideological embedding, a mapping of ideological beliefs (done in a self-supervised fashion) into interpretable low-dimensional spaces. Computing the ideological embedding at scale allows one to analyz…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: This paper is accepted by IEEE CIC as an invited vision paper

  50. arXiv:2309.13515  [pdf, other]

    cs.RO eess.SY

    Learning-based Inverse Perception Contracts and Applications

    Authors: Dawei Sun, Benjamin C. Yang, Sayan Mitra

    Abstract: Perception modules are integral in many modern autonomous systems, but their accuracy can be subject to the vagaries of the environment. In this paper, we propose a learning-based approach that can automatically characterize the error of a perception module from data and use this for safe control. The proposed approach constructs an inverse perception contract (IPC) which generates a set that cont…

    Submitted 3 March, 2024; v1 submitted 23 September, 2023; originally announced September 2023.