Showing 1–50 of 213 results for author: Ling, H

Searching in archive cs.
  1. arXiv:2406.10324  [pdf, other]

    cs.CV cs.LG

    L4GM: Large 4D Gaussian Reconstruction Model

    Authors: Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim, Huan Ling

    Abstract: We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/l4gm

  2. arXiv:2406.02040  [pdf, other]

    cs.LG cs.AI

    DFA-GNN: Forward Learning of Graph Neural Networks by Direct Feedback Alignment

    Authors: Gongpei Zhao, Tao Wang, Congyan Lang, Yi **, Yidong Li, Haibin Ling

    Abstract: Graph neural networks are recognized for their strong performance across various applications, with the backpropagation algorithm playing a central role in the development of most GNN models. However, despite its effectiveness, BP has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks. Whil…

    Submitted 4 June, 2024; originally announced June 2024.

  3. arXiv:2405.16940  [pdf, other]

    cs.CV

    Adversarial Attacks on Both Face Recognition and Face Anti-spoofing Models

    Authors: Fengfan Zhou, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Lizhuang Ma, Hefei Ling

    Abstract: Adversarial attacks on Face Recognition (FR) systems have proven highly effective in compromising pure FR models, yet adversarial examples may be ineffective against complete FR systems, as Face Anti-Spoofing (FAS) models are often incorporated and can detect a significant number of them. To address this under-explored and essential problem, we propose a novel setting of adversarially attacking both…

    Submitted 27 May, 2024; originally announced May 2024.

  4. arXiv:2405.16226  [pdf, other]

    cs.CV cs.LG

    Detecting Adversarial Data via Perturbation Forgery

    Authors: Qian Wang, Chen Li, Yuchen Luo, Hefei Ling, ** Li, Jiazhong Chen, Shijuan Huang, Ning Yu

    Abstract: As a defense strategy against adversarial attacks, adversarial detection aims to identify and filter out adversarial data from the data flow based on discrepancies in distribution and noise patterns between natural and adversarial data. Although previous detection methods achieve high performance in detecting gradient-based adversarial attacks, new attacks based on generative models with imbalance…

    Submitted 25 May, 2024; originally announced May 2024.

  5. arXiv:2405.15995  [pdf, other]

    cs.CV

    Efficient Temporal Action Segmentation via Boundary-aware Query Voting

    Authors: Peiyao Wang, Yuewei Lin, Erik Blasch, Jie Wei, Haibin Ling

    Abstract: Although the performance of Temporal Action Segmentation (TAS) has improved in recent years, achieving promising results often comes with a high computational cost due to dense inputs, complex model structures, and resource-intensive post-processing requirements. To improve the efficiency while keeping the performance, we present a novel perspective centered on per-segment classification. By harne…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 17 pages, 8 figures, 11 tables

  6. arXiv:2405.05170  [pdf, other]

    cs.MM cs.CV eess.IV

    Picking watermarks from noise (PWFN): an improved robust watermarking model against intensive distortions

    Authors: Si**g Xie, Chengxin Zhao, Nan Sun, Wei Li, Hefei Ling

    Abstract: Digital watermarking is the process of embedding secret information by altering images in an undetectable way to the human eye. To increase the robustness of the model, many deep learning-based watermarking methods use the encoder-noise-decoder architecture by adding different noises to the noise layer. The decoder then extracts the watermarked information from the distorted image. However, this m…

    Submitted 17 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  7. arXiv:2405.03458  [pdf, other]

    cs.CV

    SSyncOA: Self-synchronizing Object-aligned Watermarking to Resist Cropping-paste Attacks

    Authors: Chengxin Zhao, Hefei Ling, Si**g Xie, Han Fang, Yaokun Fang, Nan Sun

    Abstract: Modern image processing tools have made it easy for attackers to crop the region or object of interest in images and paste it into other images. The challenge this cropping-paste attack poses to the watermarking technology is that it breaks the synchronization of the image watermark, introducing multiple superimposed desynchronization distortions, such as rotation, scaling, and translation. Howeve…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 7 pages, 5 figures (accepted by ICME 2024)

  8. arXiv:2405.03436  [pdf, other]

    cs.CV cs.MM

    DBDH: A Dual-Branch Dual-Head Neural Network for Invisible Embedded Regions Localization

    Authors: Chengxin Zhao, Hefei Ling, Si**g Xie, Nan Sun, Zongyi Li, Yuxuan Shi, Jiazhong Chen

    Abstract: Embedding invisible hyperlinks or hidden codes in images to replace QR codes has become a hot topic recently. This technology requires first localizing the embedded region in the captured photos before decoding. Existing methods that train models to find the invisible embedded region struggle to obtain accurate localization results, leading to degraded decoding accuracy. This limitation is primari…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 7 pages, 6 figures (accepted by IJCNN 2024)

  9. arXiv:2404.17484  [pdf, other]

    cs.CV eess.IV

    Sparse Reconstruction of Optical Doppler Tomography Based on State Space Model

    Authors: Zhenghong Li, Jiaxiang Ren, Wensheng Cheng, Congwu Du, Yingtian Pan, Haibin Ling

    Abstract: Optical Doppler Tomography (ODT) is a blood flow imaging technique popularly used in bioengineering applications. The fundamental unit of ODT is the 1D frequency response along the A-line (depth), named raw A-scan. A 2D ODT image (B-scan) is obtained by first sensing raw A-scans along the B-line (width), and then constructing the B-scan from these raw A-scans via magnitude-phase analysis and post-…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 19 pages, 5 figures

  10. arXiv:2404.14248  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi **, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, **g Lin, Alan Yuille, Ben Shao, ** Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  11. arXiv:2403.17155  [pdf, other]

    cs.CL cs.CR

    Task-Agnostic Detector for Insertion-Based Backdoor Attacks

    Authors: Weimin Lyu, Xiao Lin, Songzhu Zheng, Lu Pang, Haibin Ling, Susmit Jha, Chao Chen

    Abstract: Textual backdoor attacks pose significant security threats. Current detection approaches, typically relying on intermediate feature representation or reconstructing potential triggers, are task-specific and less effective beyond sentence classification, struggling with tasks like question answering and named entity recognition. We introduce TABDet (Task-Agnostic Backdoor Detector), a pioneering ta…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Findings of NAACL 2024

  12. arXiv:2403.14559  [pdf, other]

    cs.CV

    Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation

    Authors: Ruyi Lian, Haibin Ling

    Abstract: Localizing predefined 3D keypoints in a 2D image is an effective way to establish 3D-2D correspondences for 6DoF object pose estimation. However, unreliable localization results of invisible keypoints degrade the quality of correspondences. In this paper, we address this issue by localizing the important keypoints in terms of visibility. Since keypoint visibility information is currently missing i…

    Submitted 21 March, 2024; originally announced March 2024.

  13. arXiv:2403.13040  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Physics-Guided Neural Networks for Intraventricular Vector Flow Mapping

    Authors: Hang Jung Ling, Salomé Bru, Julia Puig, Florian Vixège, Simon Mendez, Franck Nicoud, Pierre-Yves Courand, Olivier Bernard, Damien Garcia

    Abstract: Intraventricular vector flow mapping (iVFM) seeks to enhance and quantify color Doppler in cardiac imaging. In this study, we propose novel alternatives to the traditional iVFM optimization scheme by utilizing physics-informed neural networks (PINNs) and a physics-guided nnU-Net-based supervised approach. When evaluated on simulated color Doppler images derived from a patient-specific computationa…

    Submitted 27 June, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 12 pages, accepted for publication in IEEE TUFFC; camera ready corrections, corrected acknowledgments

  14. arXiv:2403.05231  [pdf, other]

    cs.CV

    Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

    Authors: Liting Lin, Heng Fan, Zhipeng Zhang, Yaowei Wang, Yong Xu, Haibin Ling

    Abstract: Motivated by the Parameter-Efficient Fine-Tuning (PEFT) in large language models, we propose LoRAT, a method that unveils the power of larger Vision Transformers (ViT) for tracking within laboratory-level resources. The essence of our work lies in adapting LoRA, a technique that fine-tunes a small subset of model parameters without adding inference latency, to the domain of visual tracking. Howeve…

    Submitted 8 March, 2024; originally announced March 2024.

  15. arXiv:2403.01777  [pdf, other]

    cs.CL cs.CV

    NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models

    Authors: Lizhou Fan, Wenyue Hua, Xiang Li, Kaijie Zhu, Mingyu **, Lingyao Li, Haoyang Ling, **kui Chi, **dong Wang, Xin Ma, Yongfeng Zhang

    Abstract: Understanding the reasoning capabilities of Multimodal Large Language Models (MLLMs) is an important area of research. In this study, we introduce a dynamic benchmark, NPHardEval4V, aimed at addressing the existing gaps in evaluating the pure reasoning abilities of MLLMs. Our benchmark aims to provide a venue to disentangle the effect of various factors such as image recognition and instruction fo…

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: 16 pages, 10 figures, 2 tables

  16. Envy-Free House Allocation with Minimum Subsidy

    Authors: Davin Choo, Yan Hao Ling, Warut Suksompong, Nicholas Teh, Jian Zhang

    Abstract: House allocation refers to the problem where $m$ houses are to be allocated to $n$ agents so that each agent receives one house. Since an envy-free house allocation does not always exist, we consider finding such an allocation in the presence of subsidy. We show that computing an envy-free allocation with minimum subsidy is NP-hard in general, but can be done efficiently if $m$ differs from $n$ by…

    Submitted 2 March, 2024; originally announced March 2024.

    Journal ref: Operations Research Letters, 54:107103 (2024)

  17. arXiv:2402.16586  [pdf, other]

    cs.CV

    Improving the JPEG-resistance of Adversarial Attacks on Face Recognition by Interpolation Smoothing

    Authors: Kefu Guo, Fengfan Zhou, Hefei Ling, ** Li, Hui Liu

    Abstract: JPEG compression can significantly impair the performance of adversarial face examples, which previous adversarial attacks on face recognition (FR) have not adequately addressed. Considering this challenge, we propose a novel adversarial attack on FR that aims to improve the resistance of adversarial examples against JPEG compression. Specifically, during the iterative process of generating advers…

    Submitted 26 February, 2024; originally announced February 2024.

  18. BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision

    Authors: Xin Zhao, Shiyu Hu, Yipei Wang, **g Zhang, Yimin Hu, Rongshuai Liu, Haibin Ling, Yin Li, Renshu Li, Kun Liu, Jiadong Li

    Abstract: Single object tracking (SOT) is a fundamental problem in computer vision, with a wide range of applications, including autonomous driving, augmented reality, and robot navigation. The robustness of SOT faces two main challenges: tiny target and fast motion. These challenges are especially manifested in videos captured by unmanned aerial vehicles (UAV), where the target is usually far away from the…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: This paper is published in IJCV (refer to DOI). Please cite the published IJCV

    Journal ref: Int J Comput Vis (2023)

  19. arXiv:2401.17527  [pdf, other]

    cs.AI

    Learning to Stop Cut Generation for Efficient Mixed-Integer Linear Programming

    Authors: Haotian Ling, Zhihai Wang, Jie Wang

    Abstract: Cutting planes (cuts) play an important role in solving mixed-integer linear programs (MILPs), as they significantly tighten the dual bounds and improve the solving performance. A key problem for cuts is when to stop cut generation, which is important for the efficiency of solving MILPs. However, many modern MILP solvers employ hard-coded heuristics to tackle this problem, which tends to neglect…

    Submitted 2 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  20. arXiv:2401.13503  [pdf, other]

    cs.CV

    Learning Representations for Clustering via Partial Information Discrimination and Cross-Level Interaction

    Authors: Hai-Xin Zhang, Dong Huang, Hua-Bao Ling, Guang-Yu Zhang, Wei-jun Sun, Zi-hao Wen

    Abstract: In this paper, we present a novel deep image clustering approach termed PICI, which enforces the partial information discrimination and the cross-level interaction in a joint learning framework. In particular, we leverage a Transformer encoder as the backbone, through which the masked image modeling with two paralleled augmented views is formulated. After deriving the class tokens from the masked…

    Submitted 24 January, 2024; originally announced January 2024.

  21. arXiv:2401.08903  [pdf, other]

    cs.CV cs.LG

    Rethinking Impersonation and Dodging Attacks on Face Recognition Systems

    Authors: Fengfan Zhou, Qianyu Zhou, Bangjie Yin, Hui Zheng, Xuequan Lu, Lizhuang Ma, Hefei Ling

    Abstract: Face Recognition (FR) systems can be easily deceived by adversarial examples that manipulate benign face images through imperceptible perturbations. Adversarial attacks on FR encompass two types: impersonation (targeted) attacks and dodging (untargeted) attacks. Previous methods often achieve a successful impersonation attack on FR; however, it does not necessarily guarantee a successful dodging a…

    Submitted 25 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  22. arXiv:2312.14890  [pdf, other]

    cs.AI cs.CC cs.CL cs.LG

    NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes

    Authors: Lizhou Fan, Wenyue Hua, Lingyao Li, Haoyang Ling, Yongfeng Zhang

    Abstract: Complex reasoning ability is one of the most important features of current LLMs, which has also been leveraged to play an integral role in complex decision-making tasks. Therefore, the investigation into the reasoning capabilities of Large Language Models (LLMs) is critical: numerous benchmarks have been established to assess the reasoning abilities of LLMs. However, current benchmarks are inadequ…

    Submitted 12 February, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: 23 pages, 7 figures, 2 tables

  23. arXiv:2312.13763  [pdf, other]

    cs.CV cs.LG

    Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models

    Authors: Huan Ling, Seung Wook Kim, Antonio Torralba, Sanja Fidler, Karsten Kreis

    Abstract: Text-guided diffusion models have revolutionized image and video generation and have also been successfully used for optimization-based 3D object synthesis. Here, we instead focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects using score distillation methods with an additional temporal dimension. Compared to previous work, we pursue a novel compositional gener…

    Submitted 3 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/AlignYourGaussians/

  24. arXiv:2312.09481  [pdf, other]

    cs.CV cs.CR cs.LG

    Continual Adversarial Defense

    Authors: Qian Wang, Yaoyao Liu, Hefei Ling, Yingwei Li, Qihao Liu, ** Li, Jiazhong Chen, Alan Yuille, Ning Yu

    Abstract: In response to the rapidly evolving nature of adversarial attacks against visual classifiers on a monthly basis, numerous defenses have been proposed to generalize against as many known attacks as possible. However, designing a defense method that generalizes to all types of attacks is not realistic because the environment in which defense systems operate is dynamic and comprises various unique at…

    Submitted 13 March, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  25. arXiv:2311.14251  [pdf, ps, other]

    cs.IT

    Optimal 1-bit Error Exponent for 2-hop Relaying with Binary-Input Channels

    Authors: Yan Hao Ling, Jonathan Scarlett

    Abstract: In this paper, we study the problem of relaying a single bit over a tandem of binary-input channels, with the goal of attaining the highest possible error exponent in the exponentially decaying error probability. Our previous work gave an exact characterization of the best possible error exponent in various special cases, including when the two channels are identical, but the general case was left…

    Submitted 6 June, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: IEEE Transactions on Information Theory

  26. arXiv:2311.13409  [pdf, other]

    cs.CV cs.MM

    CompenHR: Efficient Full Compensation for High-resolution Projector

    Authors: Yuxi Wang, Haibin Ling, Bingyao Huang

    Abstract: Full projector compensation is a practical task of projector-camera systems. It aims to find a projector input image, named compensation image, such that when projected it cancels the geometric and photometric distortions due to the physical environment and hardware. State-of-the-art methods use deep learning to address this problem and show promising performance for low-resolution setups. However…

    Submitted 28 November, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

  27. arXiv:2311.11282  [pdf]

    cs.CY cs.HC cs.SI

    Individual misinformation tagging reinforces echo chambers; Collective tagging does not

    Authors: Junsol Kim, Zhao Wang, Haohan Shi, Hsin-Keng Ling, James Evans

    Abstract: Fears about the destabilizing impact of misinformation online have motivated individuals and platforms to respond. Individuals have become empowered to challenge others' online claims with fact-checks in pursuit of a healthier information ecosystem and to break down echo chambers of self-reinforcing opinion. Using Twitter data, here we show the consequences of individual misinformation tagging: ta…

    Submitted 9 June, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

    Comments: 68 pages

  28. arXiv:2311.04391  [pdf, other]

    cs.CV

    3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features

    Authors: Chenfeng Xu, Huan Ling, Sanja Fidler, Or Litany

    Abstract: We present 3DiffTection, a state-of-the-art method for 3D object detection from single images, leveraging features from a 3D-aware diffusion model. Annotating large-scale image data for 3D detection is resource-intensive and time-consuming. Recently, pretrained large image diffusion models have become prominent as effective feature extractors for 2D perception tasks. However, these features are in…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/3difftection/

  29. arXiv:2311.04246  [pdf, other]

    cs.CV

    ADFactory: An Effective Framework for Generalizing Optical Flow with Nerf

    Authors: Han Ling

    Abstract: A significant challenge facing current optical flow methods is the difficulty in generalizing them well to the real world. This is mainly due to the high cost of hand-crafted datasets, and existing self-supervised methods are limited by indirect loss and occlusions, resulting in fuzzy outcomes. To address this challenge, we introduce a novel optical flow training framework: automatic data factory…

    Submitted 14 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: 8 pages

  30. arXiv:2310.17801  [pdf]

    cs.CV

    Image Prior and Posterior Conditional Probability Representation for Efficient Damage Assessment

    Authors: Jie Wei, Weicong Feng, Erik Blasch, Erika Ardiles-Cruz, Haibin Ling

    Abstract: It is important to quantify Damage Assessment (DA) for Human Assistance and Disaster Response (HADR) applications. In this paper, to achieve efficient and scalable DA in HADR, an image prior and posterior conditional probability (IP2CP) is developed as an effective computational imaging representation. Equipped with the IP2CP representation, the matching pre- and post-disaster images are effective…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: 6 pages, 2 figures

    MSC Class: I.4.6; I.5.3

  31. arXiv:2310.14480  [pdf, other]

    cs.LG

    Attention-Enhancing Backdoor Attacks Against BERT-based Models

    Authors: Weimin Lyu, Songzhu Zheng, Lu Pang, Haibin Ling, Chao Chen

    Abstract: Recent studies have revealed that backdoor attacks can threaten the safety of natural language processing (NLP) models. Investigating the strategies of backdoor attacks will help to understand the model's vulnerability. Most existing textual backdoor attacks focus on generating stealthy triggers or modifying model weights. In this paper, we directly target the interior structure of neural…

    Submitted 24 October, 2023; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Findings of EMNLP 2023

  32. Salient Object Detection in Optical Remote Sensing Images Driven by Transformer

    Authors: Gongyang Li, Zhen Bai, Zhi Liu, Xinpeng Zhang, Haibin Ling

    Abstract: Existing methods for Salient Object Detection in Optical Remote Sensing Images (ORSI-SOD) mainly adopt Convolutional Neural Networks (CNNs) as the backbone, such as VGG and ResNet. Since CNNs can only extract features within certain receptive fields, most ORSI-SOD methods generally follow the local-to-contextual paradigm. In this paper, we propose a novel Global Extraction Local Exploration Networ…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: 13 pages, 6 figures, Accepted by IEEE Transactions on Image Processing 2023

  33. arXiv:2309.07330  [pdf, other]

    cs.CV

    Automated Assessment of Critical View of Safety in Laparoscopic Cholecystectomy

    Authors: Yunfan Li, Himanshu Gupta, Haibin Ling, IV Ramakrishnan, Prateek Prasanna, Georgios Georgakis, Aaron Sasson

    Abstract: Cholecystectomy (gallbladder removal) is one of the most common procedures in the US, with more than 1.2M procedures annually. Compared with classical open cholecystectomy, laparoscopic cholecystectomy (LC) is associated with a significantly shorter recovery period, and hence is the preferred method. However, LC is also associated with an increase in bile duct injuries (BDIs), resulting in significa…

    Submitted 13 September, 2023; originally announced September 2023.

  34. arXiv:2309.06701  [pdf, ps, other]

    cs.CV cs.RO

    Transparent Object Tracking with Enhanced Fusion Module

    Authors: Kalyan Garigapati, Erik Blasch, Jie Wei, Haibin Ling

    Abstract: Accurate tracking of transparent objects, such as glasses, plays a critical role in many robotic tasks such as robot-assisted living. Due to the adaptive and often reflective texture of such objects, traditional tracking algorithms that rely on general-purpose learned features suffer from reduced performance. Recent research has proposed to instill transparency awareness into existing general obje…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: IEEE IROS 2023

  35. arXiv:2309.04063  [pdf, other]

    cs.CV

    INSURE: An Information Theory Inspired Disentanglement and Purification Model for Domain Generalization

    Authors: Xi Yu, Huan-Hsin Tseng, Shinjae Yoo, Haibin Ling, Yuewei Lin

    Abstract: Domain Generalization (DG) aims to learn a generalizable model on the unseen target domain by only training on the multiple observed source domains. Although a variety of DG methods have focused on extracting domain-invariant features, the domain-specific class-relevant features have attracted attention and been argued to benefit generalization to the unseen target domain. To take into account the…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: 10 pages, 4 figures

  36. Improving Visual Quality and Transferability of Adversarial Attacks on Face Recognition Simultaneously with Adversarial Restoration

    Authors: Fengfan Zhou, Hefei Ling, Yuxuan Shi, Jiazhong Chen, ** Li

    Abstract: Adversarial face examples possess two critical properties: Visual Quality and Transferability. However, existing approaches rarely address these properties simultaneously, leading to subpar results. To address this issue, we propose a novel adversarial attack technique known as Adversarial Restoration (AdvRestore), which enhances both visual quality and transferability of adversarial face examples…

    Submitted 18 March, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

    Comments: ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  37. arXiv:2307.10046  [pdf, other]

    cs.CV

    Divert More Attention to Vision-Language Object Tracking

    Authors: Mingzhe Guo, Zhipeng Zhang, Li** **g, Haibin Ling, Heng Fan

    Abstract: Multimodal vision-language (VL) learning has noticeably pushed the tendency toward generic intelligence owing to emerging large foundation models. However, tracking, as a fundamental vision problem, surprisingly enjoys less bonus from recent flourishing VL learning. We argue that the reasons are two-fold: the lack of large-scale vision-language annotated videos and ineffective vision-language inte…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: 16 pages, 9 figures

  38. arXiv:2307.08423  [pdf, other]

    cs.LG physics.comp-ph

    Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

    Authors: Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Haiyang Yu, YuQing Xie, Xiang Fu, Alex Strasser, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, et al. (38 additional authors not shown)

    Abstract: Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Sc…

    Submitted 15 November, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

  39. arXiv:2307.07487  [pdf, other]

    cs.CV cs.LG

    DreamTeacher: Pretraining Image Backbones with Deep Generative Models

    Authors: Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler

    Abstract: In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones. We propose to distill knowledge from a trained generative model into standard image backbones that have been well engineered for specific perception tasks. We investigate two types of knowledge distillation: 1) distilling…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Project page: https://research.nvidia.com/labs/toronto-ai/DreamTeacher/

  40. arXiv:2307.06527  [pdf, other]

    cs.CV

    Free-Form Composition Networks for Egocentric Action Recognition

    Authors: Haoran Wang, Qinghua Cheng, Baosheng Yu, Yibing Zhan, Dapeng Tao, Liang Ding, Haibin Ling

    Abstract: Egocentric action recognition is gaining significant attention in the field of human action recognition. In this paper, we address the data scarcity issue in egocentric action recognition from a compositional generalization perspective. To tackle this problem, we propose a free-form composition network (FFCN) that can simultaneously learn disentangled verb, preposition, and noun representations, and t…

    Submitted 14 October, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

  41. arXiv:2306.13695  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Phase Unwrapping of Color Doppler Echocardiography using Deep Learning

    Authors: Hang Jung Ling, Olivier Bernard, Nicolas Ducros, Damien Garcia

    Abstract: Color Doppler echocardiography is a widely used non-invasive imaging modality that provides real-time information about the intracardiac blood flow. In an apical long-axis view of the left ventricle, color Doppler is subject to phase wrapping, or aliasing, especially during cardiac filling and ejection. When setting up quantitative methods based on color Doppler, it is necessary to correct this wr…

    Submitted 5 July, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: 11 pages, accepted for publication in IEEE TUFFC, modified graphical abstract

  42. arXiv:2306.06788  [pdf, other]

    cs.LG cs.AI

    Graph Mixup with Soft Alignments

    Authors: Hongyi Ling, Zhimeng Jiang, Meng Liu, Shuiwang Ji, Na Zou

    Abstract: We study graph data augmentation by mixup, which has been used successfully on images. A key operation of mixup is to compute a convex combination of a pair of inputs. This operation is straightforward for grid-like data, such as images, but challenging for graph data. The key difficulty lies in the fact that different graphs typically have different numbers of nodes, and thus there lacks a node-l…

    Submitted 11 June, 2023; originally announced June 2023.

  43. arXiv:2305.01997  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Extraction of volumetric indices from echocardiography: which deep learning solution for clinical use?

    Authors: Hang Jung Ling, Nathan Painchaud, Pierre-Yves Courand, Pierre-Marc Jodoin, Damien Garcia, Olivier Bernard

    Abstract: Deep learning-based methods have spearheaded the automatic analysis of echocardiographic images, taking advantage of the publication of multiple open access datasets annotated by experts (CAMUS being one of the largest public databases). However, these models are still considered unreliable by clinicians due to unresolved issues concerning i) the temporal consistency of their predictions, and ii)…

    Submitted 8 May, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: 10 pages, accepted for FIMH 2023; camera ready corrections, corrected acknowledgments

  44. arXiv:2304.13520  [pdf]

    cs.NE q-bio.PE

    An Artificial Life Simulation Library Based on Genetic Algorithm, 3-Character Genetic Code and Biological Hierarchy

    Authors: Maurice HT Ling

    Abstract: Genetic algorithm (GA) is inspired by biological evolution of genetic organisms by optimizing the genotypic combinations encoded within each individual with the help of evolutionary operators, suggesting that GA may be a suitable model for studying real-life evolutionary processes. This paper describes the design of a Python library for artificial life simulation, Digital Organism Simulation Envir…

    Submitted 18 February, 2023; originally announced April 2023.

    Journal ref: The Python Papers 7: 5 (2012)

  45. arXiv:2304.12411  [pdf]

    cs.CL

    ChatGPT (Feb 13 Version) is a Chinese Room

    Authors: Maurice HT Ling

    Abstract: ChatGPT has gained both positive and negative publicity after reports suggesting that it is able to pass various professional and licensing examinations. This suggests that ChatGPT may pass the Turing Test in the near future. However, a computer program that passes the Turing Test may be either a Chinese Room or artificially conscious. Hence, the question of whether the current state of Chat…

    Submitted 18 February, 2023; originally announced April 2023.

    Comments: 19 pages, 14 figures

  46. arXiv:2304.11359  [pdf, other]

    cs.CV cs.AI

    Detecting Adversarial Faces Using Only Real Face Self-Perturbations

    Authors: Qian Wang, Yongqin Xian, Hefei Ling, **yuan Zhang, Xiaorui Lin, ** Li, Jiazhong Chen, Ning Yu

    Abstract: Adversarial attacks aim to disturb the functionality of a target system by adding specific noise to the input samples, bringing potential threats to security and robustness when applied to facial recognition systems. Although existing defense techniques achieve high accuracy in detecting some specific adversarial faces (adv-faces), new attack methods especially GAN-based attacks with completely di…

    Submitted 3 May, 2023; v1 submitted 22 April, 2023; originally announced April 2023.

    Comments: IJCAI 2023

  47. arXiv:2304.08818  [pdf, other]

    cs.CV cs.LG

    Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models

    Authors: Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler, Karsten Kreis

    Abstract: Latent Diffusion Models (LDMs) enable high-quality image synthesis while avoiding excessive compute demands by training a diffusion model in a compressed lower-dimensional latent space. Here, we apply the LDM paradigm to high-resolution video generation, a particularly resource-intensive task. We first pre-train an LDM on images only; then, we turn the image generator into a video generator by int…

    Submitted 27 December, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2023. Project page: https://research.nvidia.com/labs/toronto-ai/VideoLDM/

  48. arXiv:2304.02226  [pdf, ps, other]

    cs.IT

    Maxflow-Based Bounds for Low-Rate Information Propagation over Noisy Networks

    Authors: Yan Hao Ling, Jonathan Scarlett

    Abstract: We study error exponents for the problem of low-rate communication over a directed graph, where each edge in the graph represents a noisy communication channel, and there is a single source and destination. We derive maxflow-based achievability and converse bounds on the error exponent that match when there are two messages and all channels satisfy a symmetry condition called pairwise reversibilit…

    Submitted 5 April, 2023; originally announced April 2023.

  49. arXiv:2304.02110  [pdf, other]

    cs.CV

    DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation

    Authors: Peiyao Wang, Haibin Ling

    Abstract: Fully supervised action segmentation works on frame-wise action recognition with dense annotations and often suffers from the over-segmentation issue. Existing works have proposed a variety of solutions such as boundary-aware networks, multi-stage refinement, and temporal smoothness losses. However, most of them take advantage of frame-wise supervision, which cannot effectively tackle the evaluati…

    Submitted 4 April, 2023; originally announced April 2023.

  50. arXiv:2303.16874  [pdf, other]

    cs.CV

    CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network

    Authors: Ruyi Lian, Haibin Ling

    Abstract: Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task. Recent studies have shown the great potential of dense correspondence-based solutions, yet improvements are still needed to reach practical deployment. In this paper, we propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects. Firstly, CheckerPose densely…

    Submitted 13 August, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

    Comments: Accepted by ICCV 2023