
Showing 1–50 of 249 results for author: Peng, S

Searching in archive cs.
  1. arXiv:2407.00500  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Intrinsic PAPR for Point-level 3D Scene Albedo and Shading Editing

    Authors: Alireza Moazeni, Shichong Peng, Ke Li

    Abstract: Recent advancements in neural rendering have excelled at novel view synthesis from multi-view RGB images. However, they often lack the capability to edit the shading or colour of the scene at a detailed point-level, while ensuring consistency across different viewpoints. In this work, we address the challenge of point-level 3D scene albedo and shading editing from multi-view RGB images, focusing o…

    Submitted 29 June, 2024; originally announced July 2024.

  2. arXiv:2406.17600  [pdf, other]

    cs.CL

    "Seeing the Big through the Small": Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?

    Authors: Beiduo Chen, Xinpeng Wang, Siyao Peng, Robert Litschko, Anna Korhonen, Barbara Plank

    Abstract: Human label variation (HLV) is a valuable source of information that arises when multiple human annotators provide different labels for valid reasons. In Natural Language Inference (NLI), earlier approaches to capturing HLV involve either collecting annotations from many crowd workers to represent human judgment distribution (HJD) or using expert linguists to provide detailed explanations for their c…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 22 pages, 9 figures

  3. arXiv:2406.16732  [pdf, other]

    cs.CL

    CLIMATELI: Evaluating Entity Linking on Climate Change Data

    Authors: Shijia Zhou, Siyao Peng, Barbara Plank

    Abstract: Climate Change (CC) is a pressing topic of global importance, attracting increasing attention across research fields, from social sciences to Natural Language Processing (NLP). CC is also discussed in various settings and communication platforms, from academic publications to social media forums. Understanding who and what is mentioned in such data is a first critical step to gaining new insights…

    Submitted 27 June, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

    Comments: 8 pages, accepted at ClimateNLP 2024 workshop @ ACL 2024

  4. arXiv:2406.12928  [pdf, other]

    cs.LG cs.AI cs.CL

    Evaluating the Generalization Ability of Quantized LLMs: Benchmark, Analysis, and Toolbox

    Authors: Yijun Liu, Yuan Meng, Fang Wu, Shenhao Peng, Hang Yao, Chaoyu Guan, Chen Tang, Xinzhu Ma, Zhi Wang, Wenwu Zhu

    Abstract: Large language models (LLMs) have exhibited exciting progress in multiple scenarios, but their huge computational demands hinder deployment in many real-world applications. As an effective means to reduce memory footprint and inference cost, quantization also faces challenges in performance degradation at low bit-widths. Understanding the impact of quantization on LLM capabilities, espec…

    Submitted 15 June, 2024; originally announced June 2024.

  5. arXiv:2406.12032  [pdf, other]

    cs.IR

    Balancing Embedding Spectrum for Recommendation

    Authors: Shaowen Peng, Kazunari Sugiyama, Xin Liu, Tsunenori Mine

    Abstract: Modern recommender systems heavily rely on high-quality representations learned from high-dimensional sparse data. While significant efforts have been invested in designing powerful algorithms for extracting user preferences, the factors contributing to good representations have remained relatively unexplored. In this work, we shed light on an issue in the existing pair-wise learning paradigm (i.e…

    Submitted 17 June, 2024; originally announced June 2024.

  6. arXiv:2406.08827  [pdf, other]

    cs.IR

    How Powerful is Graph Filtering for Recommendation

    Authors: Shaowen Peng, Xin Liu, Kazunari Sugiyama, Tsunenori Mine

    Abstract: It has been shown that the effectiveness of graph convolutional network (GCN) for recommendation is attributed to the spectral graph filtering. Most GCN-based methods consist of a graph filter or followed by a low-rank mapping optimized based on supervised training. However, we show two limitations suppressing the power of graph filtering: (1) Lack of generality. Due to the varied noise distributi…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to KDD'24

  7. arXiv:2406.05533  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    PAPR in Motion: Seamless Point-level 3D Scene Interpolation

    Authors: Shichong Peng, Yanshu Zhang, Ke Li

    Abstract: We propose the problem of point-level 3D scene interpolation, which aims to simultaneously reconstruct a 3D scene in two states from multiple views, synthesize smooth point-level interpolations between them, and render the scene from novel viewpoints, all without any supervision between the states. The primary challenge lies in achieving a smooth transition between states that may involve significan…

    Submitted 8 June, 2024; originally announced June 2024.

  8. arXiv:2406.03250  [pdf, other]

    cs.CV cs.AI

    Prompt-based Visual Alignment for Zero-shot Policy Transfer

    Authors: Haihan Gao, Rui Zhang, Qi Yi, Hantao Yao, Haochen Li, Jiaming Guo, Shaohui Peng, Yunkai Gao, QiCheng Wang, Xing Hu, Yuanbo Wen, Zihao Zhang, Zidong Du, Ling Li, Qi Guo, Yunji Chen

    Abstract: Overfitting has become one of the main obstacles to applications of reinforcement learning (RL). Existing methods do not provide explicit semantic constraints for the feature extractor, hindering the agent from learning a unified cross-domain representation and resulting in performance degradation on unseen domains. Besides, abundant data from multiple domains are needed. To address these issue…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by ICML2024

  9. arXiv:2406.01832  [pdf, other]

    cs.RO cs.AI cs.HC

    A Robust Filter for Marker-less Multi-person Tracking in Human-Robot Interaction Scenarios

    Authors: Enrico Martini, Harshil Parekh, Shaoting Peng, Nicola Bombieri, Nadia Figueroa

    Abstract: Pursuing natural and marker-less human-robot interaction (HRI) has been a long-standing robotics research focus, driven by the vision of seamless collaboration without physical markers. Marker-less approaches promise an improved user experience, but state-of-the-art methods struggle with the challenges posed by intrinsic errors in human pose estimation (HPE) and depth cameras. These errors can lead to is…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Published by and copyright protected by IEEE, 6 pages, 3 figures, 33rd IEEE International Conference on Robot & Human Interactive Communication (RO-MAN 2024)

  10. arXiv:2405.20141  [pdf, other]

    cs.CV

    OpenDAS: Domain Adaptation for Open-Vocabulary Segmentation

    Authors: Gonca Yilmaz, Songyou Peng, Francis Engelmann, Marc Pollefeys, Hermann Blum

    Abstract: The advent of Vision Language Models (VLMs) transformed image understanding from closed-set classifications to dynamic image-language interactions, enabling open-vocabulary segmentation. Despite this flexibility, VLMs often fall behind closed-set classifiers in accuracy due to their reliance on ambiguous image captions and lack of domain-specific knowledge. We, therefore, introduce a new task doma…

    Submitted 30 May, 2024; originally announced May 2024.

  11. arXiv:2405.19718  [pdf, other]

    cs.CV

    LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising

    Authors: Yuxing Duan, Shihan Peng, Lin Zhu, Wei Zhang, Yi Chang, Sheng Zhong, Luxin Yan

    Abstract: Event cameras have significant advantages in capturing dynamic scene information but are prone to noise interference, particularly in challenging conditions like low threshold and low illumination. However, most existing research focuses on gentle situations, hindering event camera applications in realistic complex scenarios. To tackle this limitation and advance the field, we construct a new pa…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024

  12. arXiv:2405.19295  [pdf, other]

    cs.CV

    3D Neural Edge Reconstruction

    Authors: Lei Li, Songyou Peng, Zehao Yu, Shaohui Liu, Rémi Pautrat, Xiaochuan Yin, Marc Pollefeys

    Abstract: Real-world objects and environments are predominantly composed of edge features, including straight lines and curves. Such edges are crucial elements for various applications, such as CAD modeling, surface meshing, lane mapping, etc. However, existing traditional methods only prioritize lines over curves for simplicity in geometric modeling. To this end, we introduce EMAP, a new method for learnin…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: Project page: https://neural-edge-map.github.io

  13. arXiv:2405.18715  [pdf, other]

    cs.CV

    NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild

    Authors: Weining Ren, Zihan Zhu, Boyang Sun, Jiaqi Chen, Marc Pollefeys, Songyou Peng

    Abstract: Neural Radiance Fields (NeRFs) have shown remarkable success in synthesizing photorealistic views from multi-view images of static scenes, but face challenges in dynamic, real-world environments with distractors like moving objects, shadows, and lighting changes. Existing methods manage controlled environments and low occlusion ratios but fall short in render quality, especially under high occlusi…

    Submitted 2 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: CVPR 2024, first two authors contributed equally. Project Page: https://rwn17.github.io/nerf-on-the-go/

  14. arXiv:2405.17374  [pdf, other]

    cs.LG

    Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

    Authors: ShengYun Peng, Pin-Yu Chen, Matthew Hull, Duen Horng Chau

    Abstract: Safety alignment is key to guiding the behaviors of large language models (LLMs) in line with human preferences and restricting harmful behaviors at inference time, but recent studies show that it can be easily compromised by finetuning with only a few adversarially designed training examples. We aim to measure the risks in finetuning LLMs through navigating the LLM safety landscape. We…

    Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  15. arXiv:2405.15414  [pdf, other]

    cs.AI

    Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification

    Authors: Yuxuan Guo, Shaohui Peng, Jiaming Guo, Di Huang, Xishan Zhang, Rui Zhang, Yifan Hao, Ling Li, Zikang Tian, Mingju Gao, Yutai Li, Yiming Gan, Shuai Liang, Zihao Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

    Abstract: Building open agents has always been the ultimate goal in AI research, and creative agents are even more enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., 'mine diamonds' in Minecraft). However, they encounter difficulties with creative tasks that have open goals and abstract criteria due to the inability to bridge the gap between them, thus lacking feedback for self…

    Submitted 24 May, 2024; originally announced May 2024.

  16. arXiv:2405.10255  [pdf, other]

    cs.CV cs.RO

    When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

    Authors: Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu

    Abstract: As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context lear…

    Submitted 16 May, 2024; originally announced May 2024.

  17. arXiv:2405.07784  [pdf, other]

    cs.CV

    Generating Human Motion in 3D Scenes from Text Descriptions

    Authors: Zhi Cen, Huaijin Pi, Sida Peng, Zehong Shen, Minghui Yang, Shuai Zhu, Hujun Bao, Xiaowei Zhou

    Abstract: Generating human motions from textual descriptions has gained growing research interest due to its wide range of applications. However, only a few works consider human-scene interactions together with text conditions, which are crucial for visual and physical realism. This paper focuses on the task of generating human motions in 3D indoor scenes given text descriptions of the human-scene interactio…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: Project page: https://zju3dv.github.io/text_scene_motion

  18. arXiv:2405.07702  [pdf, other]

    cs.CV cs.LG

    FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival

    Authors: Liangrui Pan, Yijun Peng, Yan Li, Yiyi Liang, Liwen Xu, Qingchun Liang, Shaoliang Peng

    Abstract: Integrating the different data modalities of cancer patients can significantly improve the predictive performance of patient survival. However, most existing methods ignore the simultaneous utilization of rich semantic features at different scales in pathology images. When collecting multimodal data and extracting features, there is a likelihood of encountering intra-modality missing data, introdu…

    Submitted 13 May, 2024; originally announced May 2024.

  19. arXiv:2405.03181  [pdf, other]

    cs.DC

    Collaborative Satellite Computing through Adaptive DNN Task Splitting and Offloading

    Authors: Shifeng Peng, Xuefeng Hou, Zhishu Shen, Qiushi Zheng, Jiong Jin, Atsushi Tagami, Jingling Yuan

    Abstract: Satellite computing has emerged as a promising technology for next-generation wireless networks. This innovative technology provides data processing capabilities, which facilitates the widespread implementation of artificial intelligence (AI)-based applications, especially for image processing tasks involving deep neural networks (DNNs). With the limited computing resources of an individual satellit…

    Submitted 20 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted by 29th IEEE Symposium on Computers and Communications (ISCC)

  20. arXiv:2405.01333  [pdf, other]

    cs.RO cs.CV

    NeRF in Robotics: A Survey

    Authors: Guangming Wang, Lei Pan, Songyou Peng, Shaohui Liu, Chenfeng Xu, Yanzi Miao, Wei Zhan, Masayoshi Tomizuka, Marc Pollefeys, Hesheng Wang

    Abstract: Meticulous 3D environment representations have been a longstanding goal in computer vision and robotics fields. The recent emergence of neural implicit representations has introduced radical innovation to this field as implicit representations enable numerous capabilities. Among these, the Neural Radiance Field (NeRF) has sparked a trend because of the huge representational advantages, such as sim…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 21 pages, 19 figures

  21. arXiv:2404.19594  [pdf, other]

    cs.RO eess.SY

    Reactive Temporal Logic-based Planning and Control for Interactive Robotic Tasks

    Authors: Farhad Nawaz, Shaoting Peng, Lars Lindemann, Nadia Figueroa, Nikolai Matni

    Abstract: Robots interacting with humans must be safe, reactive and adapt online to unforeseen environmental and task changes. Achieving these requirements concurrently is a challenge as interactive planners lack formal safety guarantees, while safe motion planners lack flexibility to adapt. To tackle this, we propose a modular control architecture that generates both safe and reactive motion plans for huma…

    Submitted 30 April, 2024; originally announced April 2024.

  22. arXiv:2404.17569  [pdf, other]

    cs.CV

    MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

    Authors: Shangzhan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

    Abstract: This paper aims to generate materials for 3D meshes from text descriptions. Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs as the appearance representation, which supports high-quality rendering and provides substantial flexibility in editing. Instead of relying on extensive paired data, i.e., 3D meshes with material graphs and…

    Submitted 25 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: SIGGRAPH 2024. Project page: https://zju3dv.github.io/MaPa

  23. arXiv:2404.16748  [pdf, other]

    cs.CV

    TELA: Text to Layer-wise 3D Clothed Human Generation

    Authors: Junting Dong, Qi Fang, Zehuan Huang, Xudong Xu, Jingbo Wang, Sida Peng, Bo Dai

    Abstract: This paper addresses the task of 3D clothed human generation from textual descriptions. Previous works usually encode the human body and clothes as a holistic model and generate the whole model in a single-stage optimization, which makes them struggle with clothing editing and lose fine-grained control over the whole generation process. To solve this, we propose a layer-wise clothed huma…

    Submitted 25 April, 2024; originally announced April 2024.

  24. arXiv:2404.16069  [pdf, other]

    cs.HC cs.AI

    Interactive Visual Learning for Stable Diffusion

    Authors: Seongmin Lee, Benjamin Hoover, Hendrik Strobelt, Zijie J. Wang, ShengYun Peng, Austin Wright, Kevin Li, Haekyu Park, Haoyang Yang, Polo Chau

    Abstract: Diffusion-based generative models' impressive ability to create convincing images has garnered global attention. However, their complex internal structures and operations often pose challenges for non-experts to grasp. We introduce Diffusion Explainer, the first interactive visualization tool designed to elucidate how Stable Diffusion transforms text prompts into images. It tightly integrates a vi…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 4 pages, 3 figures. arXiv admin note: substantial text overlap with arXiv:2305.03509

  25. arXiv:2404.11884  [pdf, other]

    cs.CV

    Seeing Motion at Nighttime with an Event Camera

    Authors: Haoyue Liu, Shihan Peng, Lin Zhu, Yi Chang, Hanyu Zhou, Luxin Yan

    Abstract: We focus on a very challenging task: imaging dynamic scenes at nighttime. Most previous methods rely on the low-light enhancement of a conventional RGB camera. However, they inevitably face a dilemma between the long exposure time of nighttime and the motion blur of dynamic scenes. Event cameras react to dynamic changes with higher temporal resolution (microsecond) and higher dynamic range (…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  26. arXiv:2404.11593  [pdf, other]

    cs.CV

    IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination

    Authors: Xi Chen, Sida Peng, Dongchen Yang, Yuan Liu, Bowen Pan, Chengfei Lv, Xiaowei Zhou

    Abstract: This paper aims to recover object materials from posed images captured under an unknown static lighting condition. Recent methods solve this task by optimizing material parameters through differentiable physically based rendering. However, due to the coupling between object geometry, materials, and environment lighting, there is inherent ambiguity during the inverse rendering process, preventing p…

    Submitted 22 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Project page: https://zju3dv.github.io/IntrinsicAnything

  27. arXiv:2404.07932  [pdf, other]

    cs.CV eess.IV

    FusionMamba: Efficient Image Fusion with State Space Model

    Authors: Siran Peng, Xiangyu Zhu, Haoyu Deng, Zhen Lei, Liang-Jian Deng

    Abstract: Image fusion aims to generate a high-resolution multi/hyper-spectral image by combining a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data. Current deep learning (DL)-based methods for image fusion primarily rely on CNNs or Transformers to extract features and merge different types of data. While CNNs are efficient, their receptive fiel…

    Submitted 10 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  28. arXiv:2404.05415  [pdf]

    cs.CL cs.AI

    Relation Extraction Using Large Language Models: A Case Study on Acupuncture Point Locations

    Authors: Yiming Li, Xueqing Peng, Jianfu Li, Xu Zuo, Suyuan Peng, Donghong Pei, Cui Tao, Hua Xu, Na Hong

    Abstract: In acupuncture therapy, the accurate location of acupoints is essential for its effectiveness. The advanced language understanding capabilities of large language models (LLMs) like Generative Pre-trained Transformers (GPT) present a significant opportunity for extracting relations related to acupoint locations from textual knowledge sources. This study aims to compare the performance of GPT with t…

    Submitted 14 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  29. arXiv:2404.04319  [pdf, other]

    cs.CV

    SpatialTracker: Tracking Any 2D Pixels in 3D Space

    Authors: Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou

    Abstract: Recovering dense and long-range pixel motion in videos is a challenging problem. Part of the difficulty arises from the 3D-to-2D projection process, leading to occlusions and discontinuities in the 2D motion domain. While 2D motion can be intricate, we posit that the underlying 3D motion can often be simple and low-dimensional. In this work, we propose to estimate point trajectories in 3D space to…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024 (selected as highlight paper). Project page: https://henry123-boy.github.io/SpaTracker/

  30. arXiv:2404.01361  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    LLM Attributor: Interactive Visual Attribution for LLM Generation

    Authors: Seongmin Lee, Zijie J. Wang, Aishwarya Chakravarthy, Alec Helbling, ShengYun Peng, Mansi Phute, Duen Horng Chau, Minsuk Kahng

    Abstract: While large language models (LLMs) have shown remarkable capability to generate convincing text across diverse domains, concerns around their potential risks have highlighted the importance of understanding the rationale behind text generation. We present LLM Attributor, a Python library that provides interactive visualizations for training data attribution of an LLM's text generation. Our library o…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 8 pages, 3 figures, For a video demo, see https://youtu.be/mIG2MDQKQxM

  31. arXiv:2404.00054  [pdf, other]

    cs.HC cs.GR cs.LG

    Choreographing the Digital Canvas: A Machine Learning Approach to Artistic Performance

    Authors: Siyuan Peng, Kate Ladenheim, Snehesh Shrestha, Cornelia Fermüller

    Abstract: This paper introduces the concept of a design tool for artistic performances based on attribute descriptions. To do so, we used a specific performance of falling actions. The platform integrates a novel machine-learning (ML) model with an interactive interface to generate and visualize artistic movements. Our approach's core is a cyclic Attribute-Conditioned Variational Autoencoder (AC-VAE) model…

    Submitted 25 March, 2024; originally announced April 2024.

  32. arXiv:2403.19374  [pdf, other]

    cs.ET eess.SY

    A noise-tolerant, resource-saving probabilistic binary neural network implemented by the SOT-MRAM compute-in-memory system

    Authors: Yu Gu, Puyang Huang, Tianhao Chen, Chenyi Fu, Aitian Chen, Shouzhong Peng, Xixiang Zhang, Xufeng Kou

    Abstract: We report a spin-orbit torque (SOT) magnetoresistive random-access memory (MRAM)-based probabilistic binary neural network (PBNN) for resource-saving and hardware noise-tolerant computing applications. In the presence of thermal fluctuation, the non-destructive SOT-driven magnetization switching characteristics lead to a random weight matrix with controllable probability distribution. In the meanwh…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 5 pages, 10 figures

    MSC Class: 94C60

    ACM Class: B.2.4; B.3.0

  33. arXiv:2403.17245  [pdf, other]

    cs.CL

    SPLICE: A Singleton-Enhanced PipeLIne for Coreference REsolution

    Authors: Yilun Zhu, Siyao Peng, Sameer Pradhan, Amir Zeldes

    Abstract: Singleton mentions, i.e., entities mentioned only once in a text, are important to how humans understand discourse from a theoretical perspective. However, previous attempts to incorporate their detection in end-to-end neural coreference resolution for English have been hampered by the lack of singleton mention spans in the OntoNotes benchmark. This paper addresses this limitation by combining predi…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  34. arXiv:2403.16112  [pdf, other]

    cs.CV cs.AI cs.LG

    Opportunities and challenges in the application of large artificial intelligence models in radiology

    Authors: Liangrui Pan, Zhenyu Zhao, Ying Lu, Kewei Tang, Liyong Fu, Qingchun Liang, Shaoliang Peng

    Abstract: Influenced by ChatGPT, large artificial intelligence (AI) models have seen a global upsurge in research and development. As people enjoy the convenience brought by these large AI models, more and more large models for specialized fields are being proposed, especially in radiology imaging. This article first introduces the development history of large models, techn…

    Submitted 24 March, 2024; originally announced March 2024.

  35. arXiv:2403.14621  [pdf, other]

    cs.CV

    GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

    Authors: Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein

    Abstract: We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project page: https://justimyhxu.github.io/projects/grm/ Code: https://github.com/justimyhxu/GRM

  36. arXiv:2403.13560  [pdf, other]

    cs.CL

    eRST: A Signaled Graph Theory of Discourse Relations and Organization

    Authors: Amir Zeldes, Tatsuya Aoyama, Yang Janet Liu, Siyao Peng, Debopam Das, Luke Gessler

    Abstract: In this article we present Enhanced Rhetorical Structure Theory (eRST), a new theoretical framework for computational discourse analysis, based on an expansion of Rhetorical Structure Theory (RST). The framework encompasses discourse relation graphs with tree-breaking, nonprojective and concurrent relations, as well as implicit and explicit signals which give explainable rationales to our analyses…

    Submitted 20 March, 2024; originally announced March 2024.

  37. arXiv:2403.12957  [pdf, other]

    cs.CV

    GVGEN: Text-to-3D Generation with Volumetric Representation

    Authors: Xianglong He, Junyi Chen, Sida Peng, Di Huang, Yangguang Li, Xiaoshui Huang, Chun Yuan, Wanli Ouyang, Tong He

    Abstract: In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities. To address these shortcomings, this paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two innovative techniques: (1) Structured Vo…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: project page: https://gvgen.github.io/

  38. arXiv:2403.12749  [pdf, other]

    cs.CL

    Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data

    Authors: Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova, Barbara Plank

    Abstract: Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects. This paper introduces the first dialectal NER dataset for German, BarNER, with 161K tokens annotated on Bavarian Wikipedia articles (bar-wiki) and tweets (bar-tweet), using a schema adapted from German CoNLL 2006 and GermEval. The Bavarian dialect differs fro…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024

  39. arXiv:2403.10293  [pdf, other]

    cs.CL

    MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank

    Authors: Verena Blaschke, Barbara Kovačić, Siyao Peng, Hinrich Schütze, Barbara Plank

    Abstract: Despite the success of the Universal Dependencies (UD) project exemplified by its impressive language breadth, there is still a lack of 'within-language breadth': most treebanks focus on standard languages. Even for German, the language with the most annotations in UD, so far no treebank exists for one of its language varieties spoken by over 10M people: Bavarian. To contribute to closing this gap…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024

  40. arXiv:2403.09593  [pdf, other]

    cs.CV

    Renovating Names in Open-Vocabulary Segmentation Benchmarks

    Authors: Haiwen Huang, Songyou Peng, Dan Zhang, Andreas Geiger

    Abstract: Names are essential to both human cognition and vision-language models. Open-vocabulary models utilize class names as text prompts to generalize to categories unseen during training. However, the precision of these names is often overlooked in existing datasets. In this paper, we address this underexplored problem by presenting a framework for "renovating" names in open-vocabulary segmentation ben…

    Submitted 24 May, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

  41. arXiv:2403.09290  [pdf, other]

    cs.CV cs.AI cs.LG

    SELECTOR: Heterogeneous graph network with convolutional masked autoencoder for multimodal robust prediction of cancer survival

    Authors: Liangrui Pan, Yijun Peng, Yan Li, Xiang Wang, Wenjuan Liu, Liwen Xu, Qingchun Liang, Shaoliang Peng

    Abstract: Accurately predicting the survival rate of cancer patients is crucial for aiding clinicians in planning appropriate treatment, reducing cancer-related medical expenses, and significantly enhancing patients' quality of life. Multimodal prediction of cancer patient survival offers a more comprehensive and precise approach. However, existing methods still grapple with challenges related to missing mu…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted on Computers in Biology and Medicine

  42. arXiv:2403.08231  [pdf, other]

    cs.RO

    Object Permanence Filter for Robust Tracking with Interactive Robots

    Authors: Shaoting Peng, Margaret X. Wang, Julie A. Shah, Nadia Figueroa

    Abstract: Object permanence, which refers to the concept that objects continue to exist even when they are no longer perceivable through the senses, is a crucial aspect of human cognitive development. In this work, we seek to incorporate this understanding into interactive robots by proposing a set of assumptions and rules to represent object permanence in multi-object, multi-agent interactive scenarios. We…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 2024 IEEE International Conference on Robotics and Automation (ICRA)

  43. arXiv:2403.08149  [pdf, other]

    cs.RO

    On the Feasibility of EEG-based Motor Intention Detection for Real-Time Robot Assistive Control

    Authors: Ho ** Choi, Satyajeet Das, Shaoting Peng, Ruzena Bajcsy, Nadia Figueroa

    Abstract: This paper explores the feasibility of employing EEG-based intention detection for real-time robot assistive control. We focus on predicting and distinguishing motor intentions of left/right arm movements by presenting: i) an offline data collection and training pipeline, used to train a classifier for left/right motion intention prediction, and ii) an online real-time prediction pipeline leveragi…

    Submitted 12 March, 2024; originally announced March 2024.

  44. arXiv:2403.07436  [pdf, other]

    cs.CV

    JSTR: Joint Spatio-Temporal Reasoning for Event-based Moving Object Detection

    Authors: Hanyu Zhou, Zhiwei Shi, Hao Dong, Shihan Peng, Yi Chang, Luxin Yan

    Abstract: Event-based moving object detection is a challenging task, where static background and moving object are mixed together. Typically, existing methods mainly align the background events to the same spatial coordinate system via motion compensation to distinguish the moving object. However, they neglect the potential spatial tailing effect of moving object events caused by excessive motion, which may…

    Submitted 12 March, 2024; originally announced March 2024.

  45. arXiv:2403.05902  [pdf, other]

    cs.CL

    MaiBaam Annotation Guidelines

    Authors: Verena Blaschke, Barbara Kovačić, Siyao Peng, Barbara Plank

    Abstract: This document provides the annotation guidelines for MaiBaam, a Bavarian corpus annotated with part-of-speech (POS) tags and syntactic dependencies. MaiBaam belongs to the Universal Dependencies (UD) project, and our annotations elaborate on the general and German UD version 2 guidelines. In this document, we detail how to preprocess and tokenize Bavarian data, provide an overview of the POS tags…

    Submitted 9 March, 2024; originally announced March 2024.

  46. arXiv:2403.04822  [pdf, other]

    cs.CV cs.LG

    UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining

    Authors: ShengYun Peng, Aishwarya Chakravarthy, Seongmin Lee, Xiao**g Wang, Rajarajeswari Balasubramaniyan, Duen Horng Chau

    Abstract: Tables convey factual and quantitative data with implicit conventions created by humans that are often challenging for machines to parse. Prior work on table recognition (TR) has mainly centered around complex task-specific combinations of available inputs and tools. We present UniTable, a training framework that unifies both the training paradigm and training objective of TR. Its training paradig…

    Submitted 27 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  47. arXiv:2403.04765  [pdf, other]

    cs.CV

    Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed

    Authors: Yifan Wang, Xingyi He, Sida Peng, Dongli Tan, Xiaowei Zhou

    Abstract: We present a novel method for efficiently producing semi-dense matches across images. Previous detector-free matcher LoFTR has shown remarkable matching capability in handling large-viewpoint change and texture-poor scenarios but suffers from low efficiency. We revisit its design choices and derive multiple improvements for both efficiency and accuracy. One key observation is that performing the t…

    Submitted 11 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: CVPR 2024; Project page: https://zju3dv.github.io/efficientloftr

  48. arXiv:2403.02708  [pdf, other]

    cs.SI

    Backfire Effect Reveals Early Controversy in Online Media

    Authors: Songtao Peng, Chenbo Fua, Han Han, Ye Wu, Kailun Zhu, Qi Xuan, Yong Min

    Abstract: The rapid development of online media has significantly facilitated the public's information consumption, knowledge acquisition, and opinion exchange. However, it has also led to more violent conflicts in online discussions. Therefore, controversy detection becomes important for computational and social sciences. Previous research on detection methods has primarily focused on larger datasets and m…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 17 pages, 6 figures

  49. arXiv:2403.01931  [pdf, other]

    cs.CL

    VariErr NLI: Separating Annotation Error from Human Label Variation

    Authors: Leon Weber-Genzel, Siyao Peng, Marie-Catherine de Marneffe, Barbara Plank

    Abstract: Human label variation arises when annotators assign different labels to the same item for valid reasons, while annotation errors occur when labels are assigned for invalid reasons. These two issues are prevalent in NLP benchmarks, yet existing research has studied them in isolation. To the best of our knowledge, there exists no prior work that focuses on teasing apart error from signal, especially…

    Submitted 6 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 14 pages, accepted at ACL 2024 main

  50. arXiv:2402.17985  [pdf, other]

    cs.LG cs.AI cs.CL

    FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization

    Authors: Yi Zhang, Fei Yang, Shuang Peng, Fangyu Wang, Aimin Pan

    Abstract: Large language models (LLMs) have demonstrated state-of-the-art performance across various tasks. However, the latency of inference and the large GPU memory consumption of LLMs restrict their deployment performance. Recently, there have been some efficient attempts to quantize LLMs, yet inference with large batch size or long sequence still has the issue of being compute-bound. Fine-grained quanti…

    Submitted 27 February, 2024; originally announced February 2024.