Showing 1–50 of 96 results for author: Cheng, T

Searching in archive cs.
  1. arXiv:2406.20076  [pdf, other]

    cs.CV

    EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model

    Authors: Yuxuan Zhang, Tianheng Cheng, Rui Hu, Lei Liu, Heng Liu, Longjin Ran, Xiaoxin Chen, Wenyu Liu, Xinggang Wang

    Abstract: Segment Anything Model (SAM) has attracted widespread attention for its superior interactive segmentation capabilities with visual prompts while lacking further exploration of text prompts. In this paper, we empirically investigate what text prompt encoders (e.g., CLIP or LLM) are good for adapting SAM for referring expression segmentation and introduce the Early Vision-language Fusion-based SAM (…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Preprint

  2. arXiv:2406.09611  [pdf, other]

    cs.HC

    Recy-ctronics: Designing Fully Recyclable Electronics With Varied Form Factors

    Authors: Tingyu Cheng, Zhihan Zhang, Han Huang, Yingting Gao, Wei Sun, Gregory D. Abowd, HyunJoo Oh, Josiah Hester

    Abstract: For today's electronics manufacturing process, the emphasis on stable functionality, durability, and fixed physical forms is designed to ensure long-term usability. However, this focus on robustness and permanence complicates the disassembly and recycling processes, leading to significant environmental repercussions. In this paper, we present three approaches that leverage easily recyclable materi…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2405.17921  [pdf]

    cs.AI cs.CY

    Towards Clinical AI Fairness: Filling Gaps in the Puzzle

    Authors: Mingxuan Liu, Yilin Ning, Salinelat Teixayavong, Xiaoxuan Liu, Mayli Mertens, Yuqing Shang, Xin Li, Di Miao, Jie Xu, Daniel Shu Wei Ting, Lionel Tim-Ee Cheng, Jasmine Chiat Ling Ong, Zhen Ling Teo, Ting Fang Tan, Narrendar RaviChandran, Fei Wang, Leo Anthony Celi, Marcus Eng Hock Ong, Nan Liu

    Abstract: The ethical integration of Artificial Intelligence (AI) in healthcare necessitates addressing fairness, a concept that is highly context-specific across medical fields. Extensive studies have been conducted to expand the technical components of AI fairness, while tremendous calls for AI fairness have been raised from healthcare. Despite this, a significant disconnect persists between technical adva…

    Submitted 28 May, 2024; originally announced May 2024.

  4. arXiv:2404.06425  [pdf, other]

    cs.CV

    ZeST: Zero-Shot Material Transfer from a Single Image

    Authors: Ta-Ying Cheng, Prafull Sharma, Andrew Markham, Niki Trigoni, Varun Jampani

    Abstract: We propose ZeST, a method for zero-shot material transfer to an object in the input image given a material exemplar image. ZeST leverages existing diffusion adapters to extract an implicit material representation from the exemplar image. This representation is used to transfer the material using a pre-trained inpainting diffusion model on the object in the input image using depth estimates as geometry…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Project Page: https://ttchengab.github.io/zest

  5. arXiv:2403.13438  [pdf, other]

    cs.CV

    SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors

    Authors: Chenyang Ma, Kai Lu, Ta-Ying Cheng, Niki Trigoni, Andrew Markham

    Abstract: Current state-of-the-art spatial reasoning-enhanced VLMs are trained to excel at spatial visual question answering (VQA). However, we believe that higher-level 3D-aware tasks, such as articulating dynamic scene changes and motion planning, require a fundamental and explicit 3D understanding beyond current spatial VQA datasets. In this work, we present SpatialPIN, a framework designed to enhance th…

    Submitted 6 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Project Page: https://dannymcy.github.io/zeroshot_task_hallucination/

  6. arXiv:2402.16028  [pdf, other]

    cs.CR

    FedFDP: Fairness-Aware Federated Learning with Differential Privacy

    Authors: Xinpeng Ling, Jie Fu, Kuncan Wang, Huifa Li, Tong Cheng, Zhili Chen

    Abstract: Federated learning (FL) is a new machine learning paradigm to overcome the challenge of data silos and has garnered significant attention. However, through our observations, a globally effective trained model may exhibit performance disparities across different clients. This implies that the models jointly trained by clients may lead to unfair outcomes. On the other hand, relevant studies indicate that the tr…

    Submitted 20 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  7. arXiv:2402.15504  [pdf, other]

    cs.CV cs.AI

    Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

    Authors: Chun-Hsiao Yeh, Ta-Ying Cheng, He-Yen Hsieh, Chuan-En Lin, Yi Ma, Andrew Markham, Niki Trigoni, H. T. Kung, Yubei Chen

    Abstract: Recent text-to-image diffusion models are able to learn and synthesize images containing novel, personalized concepts (e.g., their own pets or specific items) with just a few examples for training. This paper tackles two interconnected issues within this realm of personalizing text-to-image diffusion models. First, current personalization techniques fail to reliably extend to multiple concepts --…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: Preprint; Project Page: https://danielchyeh.github.io/Gen4Gen/

  8. arXiv:2402.08654  [pdf, other]

    cs.CV

    Learning Continuous 3D Words for Text-to-Image Generation

    Authors: Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix, Matthew Fisher, Radomir Mech, Andrew Markham, Niki Trigoni

    Abstract: Current controls over diffusion models (e.g., through text or ControlNet) for image generation fall short in recognizing abstract, continuous attributes like illumination direction or non-rigid shape change. In this paper, we present an approach for allowing users of text-to-image models to have fine-grained control of several attributes in an image. We do this by engineering special sets of input…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Project Page: https://ttchengab.github.io/continuous_3d_words

  9. arXiv:2402.04517  [pdf]

    cs.RO

    Automating the audit of electronic invoices with a soft robot

    Authors: Tian Jun Cheng, Chia Jung Chen, Yao Lin Ong, Yi Fang Yang, Guang Yih Sheu

    Abstract: Taiwan's Chi Mei Medical Center has completed four challenges mentioned in published robotic process automation (RPA) studies, including automating a dynamic process, designing feasible human-robot collaboration, incorporating other emerging technologies, and bringing positive business impacts. Its executives called a committee to implement electronic invoicing. This implementation includes the…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 11 pages, 6 figures, 1 table

  10. arXiv:2402.01297  [pdf, other]

    cs.LG stat.ML

    Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum

    Authors: Tin Sum Cheng, Aurelien Lucchi, Anastasis Kratsios, David Belius

    Abstract: We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression (KRR) in the over-parameterized regime for a fixed input dimension. For kernels with polynomial spectral decay, we recover the bound from previous work; for exponential decay, our bound is non-trivial and novel. Our contribution is…

    Submitted 29 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  11. arXiv:2401.17270  [pdf, other]

    cs.CV

    YOLO-World: Real-Time Open-Vocabulary Object Detection

    Authors: Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan

    Abstract: The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling an…

    Submitted 22 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Work still in progress. Code & models are available at: https://github.com/AILab-CVC/YOLO-World

  12. arXiv:2401.09126  [pdf, other]

    cs.CV cs.GR

    Objects With Lighting: A Real-World Dataset for Evaluating Reconstruction and Rendering for Object Relighting

    Authors: Benjamin Ummenhofer, Sanskar Agrawal, Rene Sepulveda, Yixing Lao, Kai Zhang, Tianhang Cheng, Stephan Richter, Shenlong Wang, German Ros

    Abstract: Reconstructing an object from photos and placing it virtually in a new environment goes beyond the standard novel view synthesis task, as the appearance of the object has to adapt not only to the novel viewpoint but also to the new lighting conditions. Yet evaluations of inverse rendering methods rely on novel view synthesis data or simplistic synthetic datasets for quantitative analysis. This w…

    Submitted 13 April, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: Accepted at 3DV 2024, Oral presentation. For the project page see https://github.com/isl-org/objects-with-lighting

  13. arXiv:2401.05236  [pdf, other]

    cs.CV

    Structure from Duplicates: Neural Inverse Graphics from a Pile of Objects

    Authors: Tianhang Cheng, Wei-Chiu Ma, Kaiyu Guan, Antonio Torralba, Shenlong Wang

    Abstract: Our world is full of identical objects (e.g., cans of coke, cars of the same model). These duplicates, when seen together, provide additional and strong cues for us to effectively reason about 3D. Inspired by this observation, we introduce Structure from Duplicates (SfD), a novel inverse graphics framework that reconstructs geometry, material, and illumination from a single image containing multi…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Code: https://github.com/Tianhang-Cheng/SfD

  14. arXiv:2312.08917  [pdf, other]

    cs.CV cs.AI

    An Incremental Unified Framework for Small Defect Inspection

    Authors: Jiaqi Tang, Hao Lu, Xiaogang Xu, Ruizheng Wu, Sixing Hu, Tong Zhang, Tsz Wa Cheng, Ming Ge, Ying-Cong Chen, Fugee Tsung

    Abstract: Artificial Intelligence (AI)-driven defect inspection is pivotal in industrial manufacturing. Yet, many methods, tailored to specific pipelines, grapple with diverse product portfolios and evolving processes. Addressing this, we present the Incremental Unified Framework (IUF), which can reduce the feature conflict problem when continuously integrating new objects in the pipeline, making it advanta…

    Submitted 24 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  15. arXiv:2312.06053  [pdf, other]

    cs.CL cs.LG

    IEKG: A Commonsense Knowledge Graph for Idiomatic Expressions

    Authors: Ziheng Zeng, Kellen Tan Cheng, Srihari Venkat Nanniyur, Jianing Zhou, Suma Bhat

    Abstract: Idiomatic expression (IE) processing and comprehension have challenged pre-trained language models (PTLMs) because their meanings are non-compositional. Unlike prior works that enable IE comprehension through fine-tuning PTLMs with sentences containing IEs, in this work, we construct IEKG, a commonsense knowledge graph for figurative interpretations of IEs. This extends the established ATOMIC2020…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

  16. arXiv:2312.04814  [pdf, other]

    cs.GR

    A Unified Particle-Based Solver for Non-Newtonian Behaviors Simulation

    Authors: Chunlei Li, Yang Gao, Jiayi He, Tianwei Cheng, Shuai Li, Aimin Hao, Hong Qin

    Abstract: In this paper, we present a unified framework to simulate non-Newtonian behaviors. We combine viscous and elasto-plastic stress into a unified particle solver to achieve various non-Newtonian behaviors ranging from fluid-like to solid-like. Our constitutive model is based on a Generalized Maxwell model, which incorporates viscosity, elasticity and plasticity in one non-linear framework by a unifie…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 12 pages

  17. arXiv:2310.19188  [pdf, other]

    cs.CV

    3DMiner: Discovering Shapes from Large-Scale Unannotated Image Datasets

    Authors: Ta-Ying Cheng, Matheus Gadelha, Soren Pirk, Thibault Groueix, Radomir Mech, Andrew Markham, Niki Trigoni

    Abstract: We present 3DMiner -- a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets. Unlike other unsupervised 3D reconstruction methods, we assume that, within a large-enough dataset, there must exist images of objects with similar shapes but varying backgrounds, textures, and viewpoints. Our approach leverages the recent advances in learning self-supervised image repres…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: In ICCV 2023

  18. arXiv:2310.17369  [pdf, other]

    cs.CL

    Language and Mental Health: Measures of Emotion Dynamics from Text as Linguistic Biosocial Markers

    Authors: Daniela Teodorescu, Tiffany Cheng, Alona Fyshe, Saif M. Mohammad

    Abstract: Research in psychopathology has shown that, at an aggregate level, the patterns of emotional change over time -- emotion dynamics -- are indicators of one's mental health. One's patterns of emotion change have traditionally been determined through self-reports of emotions; however, there are known issues with accuracy, bias, and ease of data collection. Recent approaches to determining emotion dyn…

    Submitted 4 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: 9 pages, 5 figures

  19. arXiv:2310.00987  [pdf, other]

    cs.LG stat.ML

    A Theoretical Analysis of the Test Error of Finite-Rank Kernel Ridge Regression

    Authors: Tin Sum Cheng, Aurelien Lucchi, Ivan Dokmanić, Anastasis Kratsios, David Belius

    Abstract: Existing statistical learning guarantees for general kernel regressors often yield loose bounds when used with finite-rank kernels. Yet, finite-rank kernels naturally appear in several machine learning problems, e.g., when fine-tuning a pre-trained deep neural network's last layer to adapt it to a novel task when performing transfer learning. We address this gap for finite-rank kernel ridge regres…

    Submitted 3 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

  20. arXiv:2309.15197  [pdf, other]

    cs.HC cs.CY cs.SI

    A Tale of Two Cultures: Comparing Interpersonal Information Disclosure Norms on Twitter

    Authors: Mainack Mondal, Anju Punuru, Tyng-Wen Scott Cheng, Kenneth Vargas, Chaz Gundry, Nathan S Driggs, Noah Schill, Nathaniel Carlson, Josh Bedwell, Jaden Q Lorenc, Isha Ghosh, Yao Li, Nancy Fulda, Xinru Page

    Abstract: We present an exploration of cultural norms surrounding online disclosure of information about one's interpersonal relationships (such as information about family members, colleagues, friends, or lovers) on Twitter. The literature identifies the cultural dimension of individualism versus collectivism as being a major determinant of offline communication differences in terms of emotion, topic, and…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: This work will be presented at the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW 2023). This paper will also be published in The Proceedings of the ACM on Human Computer Interaction

  21. arXiv:2309.08096  [pdf, other]

    cs.RO

    GelSplitter: Tactile Reconstruction from Near Infrared and Visible Images

    Authors: Yuankai Lin, Yulin Zhou, Kaiji Huang, Qi Zhong, Tao Cheng, Hua Yang, Zhouping Yin

    Abstract: The GelSight-like visual tactile (VT) sensor has gained popularity as a high-resolution tactile sensing technology for robots, capable of measuring touch geometry using a single RGB camera. However, the development of multi-modal perception for VT sensors remains a challenge, limited by the mono camera. In this paper, we propose the GelSplitter, a new framework to approach the multi-modal VT sensor w…

    Submitted 14 September, 2023; originally announced September 2023.

  22. arXiv:2308.15197  [pdf, other]

    cs.AI cs.SI physics.soc-ph

    Where Would I Go Next? Large Language Models as Human Mobility Predictors

    Authors: Xinglei Wang, Meng Fang, Zichao Zeng, Tao Cheng

    Abstract: Accurate human mobility prediction underpins many important applications across a variety of domains, including epidemic modelling, transport planning, and emergency responses. Due to the sparsity of mobility data and the stochastic nature of people's daily activities, achieving precise predictions of people's locations remains a challenge. While recently developed large language models (LLMs) hav…

    Submitted 9 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

    Comments: Major changes: Used the entire FSQ-NYC dataset (table 1). Used Geolife for ablation study (figure 5). Incorporated time-unknown prediction performance (table 2), robustness testing (section 5.6), and ethical statement (appendix). Reformatted the paper using double column template

  23. arXiv:2308.03434  [pdf, ps, other]

    math.CO cs.DM

    Tyshkevich's Graph Decomposition and the Distinguishing Numbers of Unigraphs

    Authors: Christine T. Cheng

    Abstract: A $c$-labeling $\phi: V(G) \rightarrow \{1, 2, \ldots, c\}$ of graph $G$ is distinguishing if, for every non-trivial automorphism $\pi$ of $G$, there is some vertex $v$ so that $\phi(v) \neq \phi(\pi(v))$. The distinguishing number of $G$, $D(G)$, is the smallest $c$ such that $G$ has a distinguishing $c$-labeling. We consider a compact version of Tyshkevich's graph decomposition theorem where trivial compo…

    Submitted 26 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: 22 pages plus an appendix with 8 pages

  24. arXiv:2306.15670  [pdf, other]

    cs.CV cs.RO

    Symphonize 3D Semantic Scene Completion with Contextual Instance Queries

    Authors: Haoyi Jiang, Tianheng Cheng, Naiyu Gao, Haoyang Zhang, Tianwei Lin, Wenyu Liu, Xinggang Wang

    Abstract: 3D Semantic Scene Completion (SSC) has emerged as a nascent and pivotal undertaking in autonomous driving, aiming to predict voxel occupancy within volumetric scenes. However, prevailing methodologies primarily focus on voxel-wise feature aggregation, while neglecting instance semantics and scene context. In this paper, we present a novel paradigm termed Symphonies (Scene-from-Insts), that delves…

    Submitted 22 November, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

    Comments: Technical report. Code and models at: https://github.com/hustvl/Symphonies

  25. arXiv:2306.14649  [pdf, other]

    cs.NE

    CIMulator: A Comprehensive Simulation Platform for Computing-In-Memory Circuit Macros with Low Bit-Width and Real Memory Materials

    Authors: Hoang-Hiep Le, Md. Aftab Baig, Wei-Chen Hong, Cheng-Hsien Tsai, Cheng-Jui Yeh, Fu-Xiang Liang, I-Ting Huang, Wei-Tzu Tsai, Ting-Yin Cheng, Sourav De, Nan-Yow Chen, Wen-Jay Lee, Ing-Chao Lin, Da-Wei Chang, Darsen D. Lu

    Abstract: This paper presents a simulation platform, namely CIMulator, for quantifying the efficacy of various synaptic devices in neuromorphic accelerators for different neural network architectures. Nonvolatile memory devices, such as resistive random-access memory, ferroelectric field-effect transistor, and volatile static random-access memory devices, can be selected as synaptic devices. A multilayer pe…

    Submitted 26 June, 2023; originally announced June 2023.

  26. arXiv:2306.13653  [pdf, other]

    cs.CV

    ProRes: Exploring Degradation-aware Visual Prompt for Universal Image Restoration

    Authors: Jiaqi Ma, Tianheng Cheng, Guoli Wang, Qian Zhang, Xinggang Wang, Lefei Zhang

    Abstract: Image restoration aims to reconstruct degraded images, e.g., denoising or deblurring. Existing works focus on designing task-specific methods and there are inadequate attempts at universal methods. However, simply unifying multiple tasks into one universal architecture suffers from uncontrollable and undesired predictions. To address those issues, we explore prompt learning in universal architectu…

    Submitted 23 June, 2023; originally announced June 2023.

  27. arXiv:2306.05584  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Multi-body SE(3) Equivariance for Unsupervised Rigid Segmentation and Motion Estimation

    Authors: Jia-Xing Zhong, Ta-Ying Cheng, Yuhang He, Kai Lu, Kaichen Zhou, Andrew Markham, Niki Trigoni

    Abstract: A truly generalizable approach to rigid segmentation and motion estimation is fundamental to 3D understanding of articulated objects and moving scenes. In view of the closely intertwined relationship between segmentation and motion estimates, we present an SE(3) equivariant architecture and a training strategy to tackle this task in an unsupervised manner. Our architecture is composed of two inter…

    Submitted 31 October, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: To appear at NeurIPS 2023

  28. arXiv:2304.13493  [pdf]

    cs.CY cs.AI

    Towards clinical AI fairness: A translational perspective

    Authors: Mingxuan Liu, Yilin Ning, Salinelat Teixayavong, Mayli Mertens, Jie Xu, Daniel Shu Wei Ting, Lionel Tim-Ee Cheng, Jasmine Chiat Ling Ong, Zhen Ling Teo, Ting Fang Tan, Ravi Chandran Narrendar, Fei Wang, Leo Anthony Celi, Marcus Eng Hock Ong, Nan Liu

    Abstract: Artificial intelligence (AI) has demonstrated the ability to extract insights from data, but the issue of fairness remains a concern in high-stakes fields such as healthcare. Despite extensive discussion and efforts in algorithm development, AI fairness and clinical concerns have not been adequately addressed. In this paper, we discuss the misalignment between technical and clinical perspectives o…

    Submitted 26 April, 2023; originally announced April 2023.

  29. arXiv:2304.13289  [pdf, other]

    cs.LG cs.NE

    Membrane Potential Distribution Adjustment and Parametric Surrogate Gradient in Spiking Neural Networks

    Authors: Siqi Wang, Tee Hiang Cheng, Meng-Hiot Lim

    Abstract: As an emerging network model, spiking neural networks (SNNs) have attracted significant research attention in recent years. However, the energy-efficient binary spikes do not work well with gradient descent-based training approaches. The surrogate gradient (SG) strategy is investigated and applied to circumvent this issue and train SNNs from scratch. Due to the lack of well-recognized SG selection rul…

    Submitted 26 April, 2023; originally announced April 2023.

    Comments: 10 pages, 8 figures

  30. arXiv:2304.09807  [pdf, other]

    cs.CV

    VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene

    Authors: Shaoyu Chen, Yunchi Zhang, Bencheng Liao, Jiafeng Xie, Tianheng Cheng, Wei Sui, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

    Abstract: High-definition (HD) maps serve as essential infrastructure for autonomous driving. In this work, we build a systematic vectorized map annotation framework (termed VMA) for efficiently generating HD maps of large-scale driving scenes. We design a divide-and-conquer annotation scheme to solve the spatial extensibility problem of HD map generation, and abstract map elements with a variety of geo…

    Submitted 27 August, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

    Comments: https://github.com/hustvl/VMA

  31. arXiv:2304.03428  [pdf, other]

    cs.CV

    TinyDet: Accurate Small Object Detection in Lightweight Generic Detectors

    Authors: Shaoyu Chen, Tianheng Cheng, Jiemin Fang, Qian Zhang, Yuan Li, Wenyu Liu, Xinggang Wang

    Abstract: Small object detection requires the detection head to scan a large number of positions on image feature maps, which is extremely hard for computation- and energy-efficient lightweight generic detectors. To accurately detect small objects with limited computation, we propose a two-stage lightweight detection framework with extremely low computation complexity, termed TinyDet. It enables high-res…

    Submitted 6 April, 2023; originally announced April 2023.

  32. arXiv:2303.17594  [pdf, other]

    cs.CV

    MobileInst: Video Instance Segmentation on the Mobile

    Authors: Renhong Zhang, Tianheng Cheng, Shusheng Yang, Haoyi Jiang, Shuai Zhang, Jiancheng Lyu, Xin Li, Xiaowen Ying, Dashan Gao, Wenyu Liu, Xinggang Wang

    Abstract: Video instance segmentation on mobile devices is an important yet very challenging edge AI problem. It mainly suffers from (1) heavy computation and memory costs for frame-by-frame pixel-level instance perception and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile…

    Submitted 18 December, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: Accepted by AAAI 2024 Main Track; Code will be released

  33. arXiv:2303.08815  [pdf, other]

    cs.CV cs.RO

    Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction

    Authors: Bencheng Liao, Shaoyu Chen, Bo Jiang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang

    Abstract: Online lane graph construction is a promising but challenging task in autonomous driving. Previous methods usually model the lane graph at the pixel or piece level, and recover the lane graph by pixel-wise or piece-wise connection, which breaks down the continuity of the lane. Human drivers focus on and drive along the continuous and complete paths instead of considering lane pieces. Autonomous ve…

    Submitted 17 December, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

  34. Beamforming and Device Selection Design in Federated Learning with Over-the-air Aggregation

    Authors: Faeze Moradi Kalarde, Min Dong, Ben Liang, Yahia A. Eldemerdash Ahmed, Ho Ting Cheng

    Abstract: Federated learning (FL) with over-the-air computation can efficiently utilize the communication bandwidth but is susceptible to analog aggregation error. Excluding those devices with weak channel conditions can reduce the aggregation error, but it also limits the amount of local training data for FL, which can reduce the training convergence rate. In this work, we jointly design uplink receiver be…

    Submitted 6 March, 2024; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: 12 pages, 8 figures

  35. arXiv:2302.00695  [pdf, other]

    cs.LG hep-ex hep-ph stat.ML

    Versatile Energy-Based Probabilistic Models for High Energy Physics

    Authors: Taoli Cheng, Aaron Courville

    Abstract: As a classical generative modeling approach, energy-based models have the natural advantage of flexibility in the form of the energy function. Recently, energy-based models have achieved great success in modeling high-dimensional data in computer vision and natural language processing. In line with these advancements, we build a multi-purpose energy-based probabilistic model for High Energy Physic…

    Submitted 18 January, 2024; v1 submitted 1 February, 2023; originally announced February 2023.

    Comments: 17 pages, 9 figures. NeurIPS 2023 camera ready

  36. arXiv:2212.02181  [pdf, other]

    cs.CV cs.RO

    Perceive, Interact, Predict: Learning Dynamic and Static Clues for End-to-End Motion Prediction

    Authors: Bo Jiang, Shaoyu Chen, Xinggang Wang, Bencheng Liao, Tianheng Cheng, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang

    Abstract: Motion prediction is highly relevant to the perception of dynamic objects and static map elements in the scenarios of autonomous driving. In this work, we propose PIP, the first end-to-end Transformer-based framework which jointly and interactively performs online mapping, object detection and motion prediction. PIP leverages map queries, agent queries and mode queries to encode the instance-wise…

    Submitted 5 December, 2022; originally announced December 2022.

  37. arXiv:2210.13441  [pdf, other]

    stat.ML cs.LG hep-ex hep-ph physics.data-an

    Bridging Machine Learning and Sciences: Opportunities and Challenges

    Authors: Taoli Cheng

    Abstract: The application of machine learning in sciences has seen exciting advances in recent years. As a widely applicable technique, anomaly detection has been long studied in the machine learning community. Especially, deep neural nets-based out-of-distribution detection has made great progress for high-dimensional data. Recently, these techniques have been showing their potential in scientific discipli…

    Submitted 2 November, 2023; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: 8 pages, 3 figures

  38. arXiv:2210.05174  [pdf, other]

    cs.CV

    BoxTeacher: Exploring High-Quality Pseudo Labels for Weakly Supervised Instance Segmentation

    Authors: Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Qian Zhang, Wenyu Liu

    Abstract: Labeling objects with pixel-wise segmentation requires a huge amount of human labor compared to bounding boxes. Most existing methods for weakly supervised instance segmentation focus on designing heuristic losses with priors from bounding boxes. However, we find that box-supervised methods can produce some fine segmentation masks and we wonder whether the detectors could learn from these fine masks…

    Submitted 17 March, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted to CVPR 2023. Code and models: https://github.com/hustvl/BoxTeacher

  39. arXiv:2209.10307  [pdf, other]

    cs.CV

    Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions

    Authors: Dong Zhang, Yi Lin, Hao Chen, Zhuotao Tian, Xin Yang, Jinhui Tang, Kwang Ting Cheng

    Abstract: Over the past few years, the rapid development of deep learning technologies for computer vision has significantly improved the performance of medical image segmentation (MedISeg). However, the diverse implementation strategies of various models have led to an extremely complex MedISeg system, resulting in a potential problem of unfair result comparisons. In this paper, we collect a series of MedI…

    Submitted 8 May, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: Under submission

  40. arXiv:2209.02604  [pdf, other]

    cs.MM cs.AI cs.CV cs.SD eess.AS

    Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module

    Authors: Yihe Liu, Ziqi Yuan, Huisheng Mao, Zhiyun Liang, Wanqiuyue Yang, Yuanzhe Qiu, Tie Cheng, Xiaoteng Li, Hua Xu, Kai Gao

    Abstract: Multimodal sentiment analysis (MSA), which aims to improve text-based sentiment analysis with associated acoustic and visual modalities, is an emerging research area due to its potential applications in Human-Computer Interaction (HCI). However, existing research observes that the acoustic and visual modalities contribute much less than the textual modality, a phenomenon termed text-predominant. Un…

    Submitted 21 August, 2022; originally announced September 2022.

    Comments: 16 pages, 7 figures, accepted by ICMI 2022

  41. arXiv:2208.14437  [pdf, other]

    cs.CV cs.RO

    MapTR: Structured Modeling and Learning for Online Vectorized HD Map Construction

    Authors: Bencheng Liao, Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Wenyu Liu, Chang Huang

    Abstract: High-definition (HD) map provides abundant and precise environmental information of the driving scene, serving as a fundamental and indispensable component for planning in autonomous driving system. We present MapTR, a structured end-to-end Transformer for efficient online vectorized HD map construction. We propose a unified permutation-equivalent modeling approach, i.e., modeling map element as a…

    Submitted 29 January, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: Accepted to ICLR 2023 as Spotlight Presentation. Code & demos: https://github.com/hustvl/MapTR

  42. arXiv:2207.02255  [pdf, other]

    cs.CV

    OSFormer: One-Stage Camouflaged Instance Segmentation with Transformers

    Authors: Jialun Pei, Tianyang Cheng, Deng-Ping Fan, He Tang, Chuanbo Chen, Luc Van Gool

    Abstract: We present OSFormer, the first one-stage transformer framework for camouflaged instance segmentation (CIS). OSFormer is based on two key designs. First, we design a location-sensing transformer (LST) to obtain the location label and instance-aware parameters by introducing the location-guided queries and the blend-convolution feedforward network. Second, we develop a coarse-to-fine fusion (CFF) to…

    Submitted 2 August, 2022; v1 submitted 5 July, 2022; originally announced July 2022.

    Comments: This paper has been accepted by ECCV 2022

  43. arXiv:2207.01878  [pdf, other]

    cs.CV

    Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation

    Authors: Zhi Liu, Shaoyu Chen, Xiaojie Guo, Xinggang Wang, Tianheng Cheng, Hongmei Zhu, Qian Zhang, Wenyu Liu, Yi Zhang

    Abstract: In this work, we propose PolarBEV for vision-based uneven BEV representation learning. To adapt to the foreshortening effect of camera imaging, we rasterize the BEV space both angularly and radially, and introduce polar embedding decomposition to model the associations among polar grids. Polar grids are rearranged to an array-like regular representation for efficient processing. Besides, to determ…

    Submitted 5 July, 2022; originally announced July 2022.

  44. arXiv:2206.10965  [pdf, other]

    cs.CV

    Polar Parametrization for Vision-based Surround-View 3D Detection

    Authors: Shaoyu Chen, Xinggang Wang, Tianheng Cheng, Qian Zhang, Chang Huang, Wenyu Liu

    Abstract: 3D detection based on surround-view camera system is a critical technique in autopilot. In this work, we present Polar Parametrization for 3D detection, which reformulates position parametrization, velocity decomposition, perception range, label assignment and loss function in polar coordinate system. Polar Parametrization establishes explicit associations between image patterns and prediction tar…

    Submitted 22 June, 2022; originally announced June 2022.

  45. arXiv:2206.06258  [pdf, other]

    cs.CV

    Featurized Query R-CNN

    Authors: Wenqiang Zhang, Tianheng Cheng, Xinggang Wang, Shaoyu Chen, Qian Zhang, Wenyu Liu

    Abstract: The query mechanism introduced in the DETR method is changing the paradigm of object detection and recently there are many query-based methods have obtained strong object detection performance. However, the current query-based detection pipelines suffer from the following two issues. Firstly, multi-stage decoders are required to optimize the randomly initialized object queries, incurring a large c…

    Submitted 20 June, 2022; v1 submitted 13 June, 2022; originally announced June 2022.

    Comments: Tech Report

  46. arXiv:2206.04584  [pdf, other]

    cs.CV

    Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer

    Authors: Shaoyu Chen, Tianheng Cheng, Xinggang Wang, Wenming Meng, Qian Zhang, Wenyu Liu

    Abstract: Learning Bird's Eye View (BEV) representation from surrounding-view cameras is of great importance for autonomous driving. In this work, we propose a Geometry-guided Kernel Transformer (GKT), a novel 2D-to-BEV representation learning mechanism. GKT leverages the geometric priors to guide the transformer to focus on discriminative regions and unfolds kernel features to generate BEV representation.…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: Tech report. Work in progress

  47. High Performance Consensus without Duplication: Multi-pipeline Hotstuff

    Authors: Taining Cheng

    Abstract: The state-of-the-art HotStuff operates an efficient pipeline in which a stable leader drives decisions with linear communication and two round-trips of message. However, the unifying proposing-voting pattern is not sufficient to improve the bandwidth and concurrency performance of the modern system. In addition, the delay corresponding to two rounds of message to produce a certified proposal in th…

    Submitted 7 July, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

  48. arXiv:2204.11457  [pdf, other]

    cs.CL

    Islander: A Real-Time News Monitoring and Analysis System

    Authors: Chao-Wei Huang, Kai-Chou Yang, Zi-Yuan Chen, Hao-Chien Cheng, Po-Yu Wu, Yu-Yang Huang, Chung-Kai Hsieh, Geng-Zhi Wildsky Fann, Ting-Yin Cheng, Ethan Tu, Yun-Nung Chen

    Abstract: With thousands of news articles from hundreds of sources distributed and shared every day, news consumption and information acquisition have been increasingly difficult for readers. Additionally, the content of news articles is becoming catchy or even inciting to attract readership, harming the accuracy of news reporting. We present Islander, an online news analyzing system. The system allows user…

    Submitted 25 April, 2022; originally announced April 2022.

  49. arXiv:2204.04217  [pdf]

    eess.IV cs.AI cs.CV

    Feature-enhanced Adversarial Semi-supervised Semantic Segmentation Network for Pulmonary Embolism Annotation

    Authors: Ting-Wei Cheng, Jerry Chang, Ching-Chun Huang, Chin Kuo, Yun-Chien Cheng

    Abstract: This study established a feature-enhanced adversarial semi-supervised semantic segmentation model to automatically annotate pulmonary embolism lesion areas in computed tomography pulmonary angiogram (CTPA) images. In current studies, all of the PE CTPA image segmentation methods are trained by supervised learning. However, the supervised learning models need to be retrained and the images need to…

    Submitted 8 April, 2022; originally announced April 2022.

  50. arXiv:2203.16001  [pdf, other]

    cs.CV cs.LG cs.RO

    Meta-Sampler: Almost-Universal yet Task-Oriented Sampling for Point Clouds

    Authors: Ta-Ying Cheng, Qingyong Hu, Qian Xie, Niki Trigoni, Andrew Markham

    Abstract: Sampling is a key operation in point-cloud task and acts to increase computational efficiency and tractability by discarding redundant points. Universal sampling algorithms (e.g., Farthest Point Sampling) work without modification across different tasks, models, and datasets, but by their very nature are agnostic about the downstream task/model. As such, they have no implicit knowledge about which…

    Submitted 29 March, 2022; originally announced March 2022.