Showing 1–50 of 54 results for author: Doermann, D

Searching in archive cs.
  1. arXiv:2406.11327

    cs.CV

    ClawMachine: Fetching Visual Tokens as An Entity for Referring and Grounding

    Authors: Tianren Ma, Lingxi Xie, Yunjie Tian, Boyu Yang, Yuan Zhang, David Doermann, Qixiang Ye

    Abstract: An essential topic for multimodal large language models (MLLMs) is aligning vision and language concepts at a finer level. In particular, we devote efforts to encoding visual referential information for tasks such as referring and grounding. Existing methods, including proxy encoding and geometry encoding, incorporate additional syntax to encode the object's location, bringing extra burdens in tra…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Project page: https://github.com/martian422/ClawMachine

  2. arXiv:2406.06890

    cs.CV

    Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation

    Authors: Yuanhao Zhai, Kevin Lin, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Chung-Ching Lin, David Doermann, Junsong Yuan, Lijuan Wang

    Abstract: Image diffusion distillation achieves high-fidelity generation with very few sampling steps. However, applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to the limited visual quality in public video datasets. This affects the performance of both teacher and student video diffusion models. Our study aims to improve video diffusion distillation wh…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Project page: https://yhzhai.github.io/mcm/

  3. arXiv:2406.00258

    cs.CV cs.AI

    Artemis: Towards Referential Understanding in Complex Videos

    Authors: Jihao Qiu, Yuan Zhang, Xi Tang, Lingxi Xie, Tianren Ma, Pengyu Yan, David Doermann, Qixiang Ye, Yunjie Tian

    Abstract: Videos carry rich visual information including object description, action, interaction, etc., but the existing multimodal large language models (MLLMs) fell short in referential understanding scenarios such as video-based referring. In this paper, we present Artemis, an MLLM that pushes video-based referential understanding to a finer level. Given a video, Artemis receives a natural-language quest…

    Submitted 31 May, 2024; originally announced June 2024.

    Comments: 19 pages, 14 figures. Code and data are available at https://github.com/qiujihao19/Artemis

  4. arXiv:2404.16833

    cs.CV cs.AI cs.LG

    Leaf-Based Plant Disease Detection and Explainable AI

    Authors: Saurav Sagar, Mohammed Javed, David S Doermann

    Abstract: The agricultural sector plays an essential role in the economic growth of a country. Specifically, in an Indian context, it is the critical source of livelihood for millions of people living in rural areas. Plant Disease is one of the significant factors affecting the agricultural sector. Plants get infected with diseases for various reasons, including synthetic fertilizers, archaic practices, env…

    Submitted 16 December, 2023; originally announced April 2024.

    Comments: To appear in a Journal/Conference

  5. arXiv:2403.00209

    cs.CV

    ChartReformer: Natural Language-Driven Chart Image Editing

    Authors: Pengyu Yan, Mahesh Bhosale, Jay Lal, Bikhyat Adhikari, David Doermann

    Abstract: Chart visualizations are essential for data interpretation and communication; however, most charts are only accessible in image format and lack the corresponding data tables and supplementary information, making it difficult to alter their appearance for different application scenarios. To eliminate the need for original underlying data and information to perform chart editing, we propose ChartRef…

    Submitted 1 May, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: Published in ICDAR 2024. Code and model are available at https://github.com/pengyu965/ChartReformer

  6. arXiv:2312.14478

    cs.LG

    Federated Learning via Input-Output Collaborative Distillation

    Authors: Xuan Gong, Shanglin Li, Yuxiang Bao, Barry Yao, Yawen Huang, Ziyan Wu, Baochang Zhang, Yefeng Zheng, David Doermann

    Abstract: Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data. Existing FL methods either iteratively share local model parameters or deploy co-distillation. However, the former is highly susceptible to private data leakage, and the latter design relies on the prerequisites of task-releva…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI 2024
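
    The parameter-sharing baseline this abstract contrasts against can be pictured, under heavy simplification, as a FedAvg-style weighted average of client parameters. The sketch below illustrates only that baseline, not the input-output collaborative distillation the paper proposes; `fed_avg` and the flat list-of-floats parameter format are hypothetical simplifications.

```python
from typing import List


def fed_avg(client_params: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Hypothetical FedAvg-style aggregation: average each parameter across
    clients, weighting every client by its number of local training samples."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one size per client parameter vector")
    total = sum(client_sizes)
    dim = len(client_params[0])
    global_params = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for i, p in enumerate(params):
            # Each client contributes in proportion to its share of the data.
            global_params[i] += p * n / total
    return global_params


# Two clients: the second holds three times as much data, so it dominates.
print(fed_avg([[1.0, 0.0], [5.0, 4.0]], [100, 300]))  # → [4.0, 3.0]
```

    Because every round ships `client_params` to the server, this is the step the abstract describes as highly susceptible to private data leakage.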

  7. arXiv:2311.10234

    cs.CV cs.AI

    The Analysis and Extraction of Structure from Organizational Charts

    Authors: Nikhil Manali, David Doermann, Mahesh Desai

    Abstract: Organizational charts, also known as org charts, are critical representations of an organization's structure and the hierarchical relationships between its components and positions. However, manually extracting information from org charts can be error-prone and time-consuming. To solve this, we present an automated and end-to-end approach that uses computer vision, deep learning, and natural langu…

    Submitted 16 November, 2023; originally announced November 2023.

  8. arXiv:2310.14469

    cs.CV

    Player Re-Identification Using Body Part Appearences

    Authors: Mahesh Bhosale, Abhishek Kumar, David Doermann

    Abstract: We propose a neural network architecture that learns body part appearances for soccer player re-identification. Our model consists of a two-stream network (one stream for appearance map extraction and the other for body part map extraction) and a bilinear-pooling layer that generates and spatially pools the body part map. Each local feature of the body part map is obtained by a bilinear mapping of…

    Submitted 22 October, 2023; originally announced October 2023.
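
    The abstract's bilinear-pooling layer is truncated before its exact mapping is given. As a generic sketch of bilinear pooling, assuming per-location appearance and body-part feature vectors of fixed length and plain sum-pooling, each location contributes the outer product of its two features; `bilinear_pool` is a hypothetical helper, not the authors' layer.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def bilinear_pool(appearance: List[Vector], parts: List[Vector]) -> Matrix:
    """Generic bilinear pooling (hypothetical helper): at every spatial location,
    take the outer product of the appearance feature and the part feature,
    then sum-pool the resulting matrices over all locations."""
    if not appearance or len(appearance) != len(parts):
        raise ValueError("both streams must cover the same, non-empty set of locations")
    c_a, c_p = len(appearance[0]), len(parts[0])
    pooled = [[0.0] * c_p for _ in range(c_a)]
    for a, p in zip(appearance, parts):
        for i in range(c_a):
            for j in range(c_p):
                # Accumulate the (i, j) entry of the outer product of a and p.
                pooled[i][j] += a[i] * p[j]
    return pooled


# Two locations, 2-channel appearance features, 2 body-part channels.
print(bilinear_pool([[1.0, 2.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 3.0]]))
# → [[1.0, 0.0], [2.0, 3.0]]
```

    Entry (i, j) of the pooled matrix measures how strongly appearance channel i co-occurs with body-part channel j across the image.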

  9. arXiv:2309.01265

    cs.CV

    SOAR: Scene-debiasing Open-set Action Recognition

    Authors: Yuanhao Zhai, Ziyi Liu, Zhenyu Wu, Yi Wu, Chunluan Zhou, David Doermann, Junsong Yuan, Gang Hua

    Abstract: Deep learning models have a risk of utilizing spurious clues to make predictions, such as recognizing actions based on the background scene. This issue can severely degrade the open-set action recognition performance when the testing samples have different scene distributions from the training samples. To mitigate this problem, we propose a novel method, called Scene-debiasing Open-set Action Reco…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023, code: https://github.com/yhZhai/SOAR

  10. arXiv:2309.01246

    cs.CV

    Towards Generic Image Manipulation Detection with Weakly-Supervised Self-Consistency Learning

    Authors: Yuanhao Zhai, Tianyu Luan, David Doermann, Junsong Yuan

    Abstract: As advanced image manipulation techniques emerge, detecting the manipulation becomes increasingly important. Despite the success of recent learning-based approaches for image manipulation detection, they typically require expensive pixel-level annotations to train, while exhibiting degraded performance when testing on images that are differently manipulated compared with training images. To addres…

    Submitted 3 September, 2023; originally announced September 2023.

    Comments: Accepted to ICCV 2023, code: https://github.com/yhZhai/WSCL

  11. arXiv:2308.09611

    cs.CV

    Language-guided Human Motion Synthesis with Atomic Actions

    Authors: Yuanhao Zhai, Mingzhen Huang, Tianyu Luan, Lu Dong, Ifeoma Nwogu, Siwei Lyu, David Doermann, Junsong Yuan

    Abstract: Language-guided human motion synthesis has been a challenging task due to the inherent complexity and diversity of human behaviors. Previous methods face limitations in generalization to novel actions, often resulting in unrealistic or incoherent motion sequences. In this paper, we propose ATOM (ATomic mOtion Modeling) to mitigate this problem, by decomposing actions into atomic actions, and emplo…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM MM 2023, code: https://github.com/yhZhai/ATOM

  12. arXiv:2308.01971

    cs.CV cs.AI

    SpaDen : Sparse and Dense Keypoint Estimation for Real-World Chart Understanding

    Authors: Saleem Ahmed, Pengyu Yan, David Doermann, Srirangaraj Setlur, Venu Govindaraju

    Abstract: We introduce a novel bottom-up approach for the extraction of chart data. Our model utilizes images of charts as inputs and learns to detect keypoints (KP), which are used to reconstruct the components within the plot area. Our novelty lies in detecting a fusion of continuous and discrete KP as predicted heatmaps. A combination of sparse and dense per-pixel objectives coupled with a uni-modal self…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: Accepted ORAL at ICDAR 23

  13. arXiv:2307.07691

    cs.CV cs.AI

    A Survey on Change Detection Techniques in Document Images

    Authors: Abhinandan Kumar Pun, Mohammed Javed, David S. Doermann

    Abstract: The problem of change detection in images finds application in different domains like diagnosis of diseases in the medical field, detecting growth patterns of cities through remote sensing, and finding changes in legal documents and contracts. However, this paper presents a survey on core techniques and rules to detect changes in different versions of a document image. Our discussions on change de…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Submitted to International Conference on Computer Vision and Machine Intelligence (CVMI) 2023

  14. arXiv:2307.05694

    cs.IR cs.CV cs.LG

    A Survey on Figure Classification Techniques in Scientific Documents

    Authors: Anurag Dhote, Mohammed Javed, David S Doermann

    Abstract: Figures visually represent an essential piece of information and provide an effective means to communicate scientific facts. Recently there have been many efforts toward extracting data directly from figures, specifically from tables, diagrams, and plots, using different Artificial Intelligence and Machine Learning techniques. This is because removing information from figures could lead to deeper…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: Some contents of this paper appear in the accepted paper "A Survey and Approach to Chart Classification" at 15th IAPR GREC 2023 at 17th ICDAR 2023, August 21-26, San Jose, USA. arXiv admin note: text overlap with arXiv:2307.04147

  15. arXiv:2307.04147

    cs.CV cs.AI cs.LG

    A Survey and Approach to Chart Classification

    Authors: Anurag Dhote, Mohammed Javed, David S Doermann

    Abstract: Charts represent an essential source of visual information in documents and facilitate a deep understanding and interpretation of information typically conveyed numerically. In the scientific literature, there are many charts, each with its stylistic differences. Recently the document understanding community has begun to address the problem of automatic chart understanding, which begins with chart…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: Accepted in 15th IAPR Workshop on Graphics Recognition (GREC) 2023 in conjunction with 17th International Conference on Document Analysis and Recognition (ICDAR) 2023, August 21-26, 2023 San Jose, USA

  16. arXiv:2307.00198

    cs.CV

    Filter Pruning for Efficient CNNs via Knowledge-driven Differential Filter Sampler

    Authors: Shaohui Lin, Wenxuan Huang, Jiao Xie, Baochang Zhang, Yunhang Shen, Zhou Yu, Jungong Han, David Doermann

    Abstract: Filter pruning simultaneously accelerates the computation and reduces the memory overhead of CNNs, which can be effectively applied to edge devices and cloud services. In this paper, we propose a novel Knowledge-driven Differential Filter Sampler (KDFS) with Masked Filter Modeling (MFM) framework for filter pruning, which globally prunes the redundant filters based on the prior knowledge of a pre-…

    Submitted 30 June, 2023; originally announced July 2023.
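
    KDFS relies on prior knowledge from a pre-trained model, which the truncated abstract does not spell out. For contrast, a common magnitude heuristic for filter pruning ranks filters by the L1 norm of their weights and keeps the largest ones. The sketch below assumes each filter is flattened to a list of floats; `prune_filters_l1` is a hypothetical name and this is not the KDFS method.

```python
from typing import List


def prune_filters_l1(filters: List[List[float]], keep_ratio: float) -> List[int]:
    """Return indices of filters to keep, ranking each filter by the L1 norm of
    its weights (a common magnitude heuristic, not KDFS)."""
    if not filters:
        raise ValueError("no filters given")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    norms = [sum(abs(w) for w in f) for f in filters]
    n_keep = max(1, round(len(filters) * keep_ratio))
    # Largest-norm filters first; Python's sort is stable, so ties keep index order.
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])  # report kept channels in their original order


print(prune_filters_l1([[0.1, -0.1], [2.0, 1.0], [-0.5, 0.5], [0.0, 3.0]], 0.5))
# → [1, 3]
```

    Pruning the remaining filters (here indices 0 and 2) removes whole output channels, which is why filter pruning reduces both computation and memory without needing sparse kernels.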

  17. Context-Aware Chart Element Detection

    Authors: Pengyu Yan, Saleem Ahmed, David Doermann

    Abstract: As a prerequisite of chart data extraction, the accurate detection of chart basic elements is essential and mandatory. In contrast to object detection in the general image domain, chart element detection relies heavily on context information as charts are highly structured data visualization formats. To address this, we propose a novel method CACHED, which stands for Context-Aware Chart Element De…

    Submitted 8 September, 2023; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: Published in ICDAR 2023. Code and model are available at https://github.com/pengyu965/ChartDete

  18. arXiv:2305.01837

    cs.CV cs.AI

    LineFormer: Rethinking Line Chart Data Extraction as Instance Segmentation

    Authors: Jay Lal, Aditya Mitkari, Mahesh Bhosale, David Doermann

    Abstract: Data extraction from line-chart images is an essential component of the automated document understanding process, as line charts are a ubiquitous data visualization format. However, the amount of visual and structural variations in multi-line graphs makes them particularly challenging for automated parsing. Existing works, however, are not robust to all these variations, either taking an all-chart…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: Accepted to ICDAR 2023

  19. arXiv:2212.05223

    cs.CV cs.AI

    Progressive Multi-view Human Mesh Recovery with Self-Supervision

    Authors: Xuan Gong, Liangchen Song, Meng Zheng, Benjamin Planche, Terrence Chen, Junsong Yuan, David Doermann, Ziyan Wu

    Abstract: To date, little attention has been given to multi-view 3D human mesh estimation, despite real-life applicability (e.g., motion capture, sport analysis) and robustness to single-view ambiguities. Existing solutions typically suffer from poor generalization performance to new settings, largely due to the limited diversity of image-mesh pairs in multi-view training data. To address this shortcoming,…

    Submitted 10 December, 2022; originally announced December 2022.

    Comments: Accepted by AAAI2023

  20. arXiv:2210.08464

    cs.LG cs.AI cs.CR

    Federated Learning with Privacy-Preserving Ensemble Attention Distillation

    Authors: Xuan Gong, Liangchen Song, Rishi Vedula, Abhishek Sharma, Meng Zheng, Benjamin Planche, Arun Innanje, Terrence Chen, Junsong Yuan, David Doermann, Ziyan Wu

    Abstract: Federated Learning (FL) is a machine learning paradigm where many local nodes collaboratively train a central model while keeping the training data decentralized. This is particularly relevant for clinical applications since patient data are usually not allowed to be transferred out of medical facilities, leading to the need for FL. Existing FL methods typically share model parameters or employ co…

    Submitted 16 October, 2022; originally announced October 2022.

  21. arXiv:2209.10691

    cs.CV cs.LG

    PREF: Predictability Regularized Neural Motion Fields

    Authors: Liangchen Song, Xuan Gong, Benjamin Planche, Meng Zheng, David Doermann, Junsong Yuan, Terrence Chen, Ziyan Wu

    Abstract: Knowing the 3D motions in a dynamic scene is essential to many vision applications. Recent progress is mainly focused on estimating the activity of some specific elements like humans. In this paper, we leverage a neural motion field for estimating the motion of all points in a multiview setting. Modeling the motion from a dynamic scene with multiview data is challenging due to the ambiguities in p…

    Submitted 5 April, 2023; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: Accepted at ECCV 2022 (oral). Paper + supplementary material

  22. arXiv:2209.04599

    cs.CR cs.CV cs.LG

    Preserving Privacy in Federated Learning with Ensemble Cross-Domain Knowledge Distillation

    Authors: Xuan Gong, Abhishek Sharma, Srikrishna Karanam, Ziyan Wu, Terrence Chen, David Doermann, Arun Innanje

    Abstract: Federated Learning (FL) is a machine learning paradigm where local nodes collaboratively train a central model while the training data remains decentralized. Existing FL methods typically share model parameters or employ co-distillation to address the issue of unbalanced data distribution. However, they suffer from communication bottlenecks. More importantly, they risk privacy leakage. In this wor…

    Submitted 10 September, 2022; originally announced September 2022.

    Comments: Accepted by AAAI2022

  23. arXiv:2209.04596

    cs.CV cs.AI

    Self-supervised Human Mesh Recovery with Cross-Representation Alignment

    Authors: Xuan Gong, Meng Zheng, Benjamin Planche, Srikrishna Karanam, Terrence Chen, David Doermann, Ziyan Wu

    Abstract: Fully supervised human mesh recovery methods are data-hungry and have poor generalizability due to the limited availability and diversity of 3D-annotated benchmark datasets. Recent progress in self-supervised human mesh recovery has been made using synthetic-data-driven training paradigms where the model is trained from synthetic paired 2D representation (e.g., 2D keypoints and segmentation masks)…

    Submitted 10 September, 2022; originally announced September 2022.

    Comments: Accepted ECCV2022

  24. arXiv:2203.09082

    cs.LG stat.ML

    Confidence Dimension for Deep Learning based on Hoeffding Inequality and Relative Evaluation

    Authors: Runqi Wang, Linlin Yang, Baochang Zhang, Wentao Zhu, David Doermann, Guodong Guo

    Abstract: Research on the generalization ability of deep neural networks (DNNs) has recently attracted a great deal of attention. However, due to their complex architectures and large numbers of parameters, measuring the generalization ability of specific DNN models remains an open challenge. In this paper, we propose to use multiple factors to measure and rank the relative generalization of DNNs based on a…

    Submitted 17 March, 2022; originally announced March 2022.
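
    The title refers to Hoeffding's inequality, which for n independent variables each bounded in [a, b] gives P(|sample mean - true mean| >= t) <= 2 exp(-2 n t^2 / (b - a)^2). The sketch below evaluates this textbook bound and inverts it to get a sample size; how the paper turns it into a confidence dimension is not shown in the truncated abstract, and both helper names are hypothetical.

```python
import math


def hoeffding_bound(n: int, t: float, a: float = 0.0, b: float = 1.0) -> float:
    """Two-sided Hoeffding bound on P(|sample mean - true mean| >= t)
    for n independent variables each bounded in [a, b]."""
    if n <= 0 or t <= 0 or b <= a:
        raise ValueError("require n > 0, t > 0 and b > a")
    return min(1.0, 2.0 * math.exp(-2.0 * n * t * t / (b - a) ** 2))


def samples_needed(t: float, delta: float, a: float = 0.0, b: float = 1.0) -> int:
    """Smallest n for which hoeffding_bound(n, t, a, b) is at most delta."""
    if t <= 0 or not 0.0 < delta < 1.0 or b <= a:
        raise ValueError("require t > 0, 0 < delta < 1 and b > a")
    return math.ceil((b - a) ** 2 * math.log(2.0 / delta) / (2.0 * t * t))


# Per-example 0/1 errors: how many test samples pin the accuracy estimate
# to within +/-0.05 with 95% confidence?
print(samples_needed(0.05, 0.05))  # → 738
```

    The bound needs no assumption about the error distribution beyond boundedness, which is what makes it attractive for comparing models whose internals are too complex to analyze directly.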

  25. arXiv:2112.13989

    cs.CV

    Associative Adversarial Learning Based on Selective Attack

    Authors: Runqi Wang, Xiaoyue Duan, Baochang Zhang, Song Xue, Wentao Zhu, David Doermann, Guodong Guo

    Abstract: A human's attention can intuitively adapt to corrupted areas of an image by recalling a similar uncorrupted image they have previously seen. This observation motivates us to improve the attention of adversarial images by considering their clean counterparts. To accomplish this, we introduce Associative Adversarial Learning (AAL) into adversarial learning to guide a selective attack. We formulate t…

    Submitted 4 January, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

  26. arXiv:2109.12055

    cs.HC

    Using Physiological Information to Classify Task Difficulty in Human-Swarm Interaction

    Authors: Joseph P. Distefano, Hemanth Manjunatha, Souma Chowdhury, Karthik Dantu, David Doermann, Ehsan T. Esfahani

    Abstract: Human-swarm interaction has recently gained attention due to its plethora of new applications in disaster relief, surveillance, rescue, and exploration. However, if the task difficulty increases, the performance of the human operator decreases, thereby decreasing the overall efficacy of the human-swarm team. Thus, it is critical to identify the task difficulty and adaptively allocate the task to t…

    Submitted 24 September, 2021; originally announced September 2021.

  27. arXiv:2109.05663

    cs.MA cs.RO

    Learning Robot Swarm Tactics over Complex Adversarial Environments

    Authors: Amir Behjat, Hemanth Manjunatha, Prajit KrisshnaKumar, Apurv Jani, Leighton Collins, Payam Ghassemi, Joseph Distefano, David Doermann, Karthik Dantu, Ehsan Esfahani, Souma Chowdhury

    Abstract: To accomplish complex swarm robotic missions in the real world, one needs to plan and execute a combination of single robot behaviors, group primitives such as task allocation, path planning, and formation control, and mission-specific objectives such as target search and group coverage. Most such missions are designed manually by teams of robotics experts. Recent work in automated approaches to l…

    Submitted 12 September, 2021; originally announced September 2021.

    Comments: Accepted to IEEE International Symposium on Multi-Robot and Multi-Agent Systems 2021

  28. arXiv:2107.10756   

    cs.CV cs.LG

    Semantic Text-to-Face GAN -ST^2FG

    Authors: Manan Oza, Sukalpa Chanda, David Doermann

    Abstract: Faces generated using generative adversarial networks (GANs) have reached unprecedented realism. These faces, also known as "Deep Fakes", appear as realistic photographs with very little pixel-level distortions. While some work has enabled the training of models that lead to the generation of specific properties of the subject, generating a facial image based on a natural language description has…

    Submitted 13 December, 2023; v1 submitted 22 July, 2021; originally announced July 2021.

    Comments: Experiments need to be redone

  29. arXiv:2106.10829

    cs.CV

    Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track

    Authors: Yuanhao Zhai, Le Wang, David Doermann, Junsong Yuan

    Abstract: This technical report presents our solution to the HACS Temporal Action Localization Challenge 2021, Weakly-Supervised Learning Track. The goal of weakly-supervised temporal action localization is to temporally locate and classify action of interest in untrimmed videos given only video-level labels. We adopt the two-stream consensus network (TSCN) as the main framework in this challenge. The TSCN…

    Submitted 17 April, 2022; v1 submitted 20 June, 2021; originally announced June 2021.

    Comments: Second place solution to the HACS Weakly-Supervised Temporal Action Localization Challenge 2021. arXiv admin note: text overlap with arXiv:2010.11594

  30. arXiv:2106.10617

    cs.LG

    Cogradient Descent for Dependable Learning

    Authors: Runqi Wang, Baochang Zhang, Li'an Zhuo, Qixiang Ye, David Doermann

    Abstract: Conventional gradient descent methods compute the gradients for multiple variables through the partial derivative. Treating the coupled variables independently while ignoring the interaction, however, leads to an insufficient optimization for bilinear models. In this paper, we propose a dependable learning based on Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem, p…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2006.09142

  31. arXiv:2106.03146

    cs.CV

    Oriented Object Detection with Transformer

    Authors: Teli Ma, Mingyuan Mao, Honghui Zheng, Peng Gao, Xiaodi Wang, Shumin Han, Errui Ding, Baochang Zhang, David Doermann

    Abstract: Object detection with Transformers (DETR) has achieved a competitive performance over traditional detectors, such as Faster R-CNN. However, the potential of DETR remains largely unexplored for the more challenging task of arbitrary-oriented object detection problem. We provide the first attempt and implement Oriented Object DEtection with TRansformer ($\bf O^2DETR$) based on an end-to-end network.…

    Submitted 6 June, 2021; originally announced June 2021.

  32. arXiv:2105.03139

    cs.CV

    Probabilistic Ranking-Aware Ensembles for Enhanced Object Detections

    Authors: Mingyuan Mao, Baochang Zhang, David Doermann, Jie Guo, Shumin Han, Yuan Feng, Xiaodi Wang, Errui Ding

    Abstract: Model ensembles are becoming one of the most effective approaches for improving object detection performance already optimized for a single detector. Conventional methods directly fuse bounding boxes but typically fail to consider proposal qualities when combining detectors. This leads to a new problem of confidence discrepancy for the detector ensembles. The confidence has little effect on single…

    Submitted 7 May, 2021; originally announced May 2021.

  33. arXiv:2103.14709

    cs.RO cs.MA

    Scalable Coverage Path Planning of Multi-Robot Teams for Monitoring Non-Convex Areas

    Authors: Leighton Collins, Payam Ghassemi, Ehsan T. Esfahani, David Doermann, Karthik Dantu, Souma Chowdhury

    Abstract: This paper presents a novel multi-robot coverage path planning (CPP) algorithm - aka SCoPP - that provides a time-efficient solution, with workload balanced plans for each robot in a multi-robot system, based on their initial states. This algorithm accounts for discontinuities (e.g., no-fly zones) in a specified area of interest, and provides an optimized ordered list of way-points per robot using…

    Submitted 26 March, 2021; originally announced March 2021.

    Comments: Accepted for publication in the proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA)

  34. arXiv:2102.02078

    cs.MA cs.LG

    Multi-UAV Mobile Edge Computing and Path Planning Platform based on Reinforcement Learning

    Authors: Huan Chang, Yicheng Chen, Baochang Zhang, David Doermann

    Abstract: Unmanned Aerial vehicles (UAVs) are widely used as network processors in mobile networks, but more recently, UAVs have been used in Mobile Edge Computing as mobile servers. However, there are significant challenges to use UAVs in complex environments with obstacles and cooperation between UAVs. We introduce a new multi-UAV Mobile Edge Computing platform, which aims to provide better Quality-of-Ser…

    Submitted 19 May, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: The source code can be found at https://github.com/bczhangbczhang

  35. arXiv:2012.04109

    cs.CV

    Deformable Gabor Feature Networks for Biomedical Image Classification

    Authors: Xuan Gong, Xin Xia, Wentao Zhu, Baochang Zhang, David Doermann, Lian Zhuo

    Abstract: In recent years, deep learning has dominated progress in the field of medical image analysis. We find however, that the ability of current deep learning approaches to represent the complex geometric structures of many medical images is insufficient. One limitation is that deep learning models require a tremendous amount of data, and it is very difficult to obtain a sufficient amount with the neces…

    Submitted 7 December, 2020; originally announced December 2020.

    Comments: 9 pages, 6 figures

  36. A Review of Recent Advances of Binary Neural Networks for Edge Computing

    Authors: Wenyu Zhao, Teli Ma, Xuan Gong, Baochang Zhang, David Doermann

    Abstract: Edge computing is promising to become one of the next hottest topics in artificial intelligence because it benefits various evolving domains such as real-time unmanned aerial systems, industrial applications, and the demand for privacy protection. This paper reviews recent advances on binary neural network (BNN) and 1-bit CNN technologies that are well suitable for front-end, edge-based computing.…

    Submitted 23 November, 2020; originally announced November 2020.
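
    One widely used 1-bit scheme (XNOR-Net style, assumed here purely for illustration; the review covers many variants) keeps only the sign of each weight plus one real-valued scale per filter, alpha = mean(|w|), so that a dot product with ±1 inputs reduces to counting sign agreements. Both helpers below are hypothetical.

```python
from typing import List, Tuple


def binarize(weights: List[float]) -> Tuple[float, List[int]]:
    """Binarize one filter: keep only the sign of each weight (+1/-1) plus a
    single real-valued scale alpha = mean(|w|), as in XNOR-Net-style 1-bit CNNs."""
    if not weights:
        raise ValueError("empty filter")
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]  # sign(0) mapped to +1 by convention
    return alpha, signs


def binary_dot(alpha: float, signs: List[int], x: List[int]) -> float:
    """Dot product with binarized inputs x in {+1, -1}: count sign agreements
    (an XNOR followed by a popcount on hardware) and rescale by alpha."""
    if len(signs) != len(x):
        raise ValueError("filter and input must have the same length")
    matches = sum(1 for s, xi in zip(signs, x) if s == xi)
    # For +/-1 vectors: sum(s * x) = matches - mismatches = 2 * matches - n.
    return alpha * (2 * matches - len(signs))


alpha, signs = binarize([0.5, -1.5, 1.0, -1.0])
print(alpha, signs)                              # 1.0 [1, -1, 1, -1]
print(binary_dot(alpha, signs, [1, 1, 1, -1]))   # 2.0
```

    Storing one bit per weight instead of 32 and replacing multiplications with bitwise operations is what makes these networks attractive for the edge devices the review targets.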

  37. arXiv:2009.04247

    cs.CV

    Binarized Neural Architecture Search for Efficient Object Recognition

    Authors: Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, Rongrong Ji, David Doermann, Guodong Guo

    Abstract: Traditional neural architecture search (NAS) has a significant impact in computer vision by automatically designing network architectures for various tasks. In this paper, binarized neural architecture search (BNAS), with a search space of binarized convolutions, is introduced to produce extremely compressed models to reduce huge computational cost on embedded devices for edge computing. The BNAS…

    Submitted 8 September, 2020; originally announced September 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1911.10862

  38. arXiv:2008.00698

    cs.CV cs.LG

    Anti-Bandit Neural Architecture Search for Model Defense

    Authors: Hanlin Chen, Baochang Zhang, Song Xue, Xuan Gong, Hong Liu, Rongrong Ji, David Doermann

    Abstract: Deep convolutional neural networks (DCNNs) have dominated as the best performers in machine learning, but can be challenged by adversarial attacks. In this paper, we defend against adversarial attacks using neural architecture search (NAS) which is based on a comprehensive search of denoising blocks, weight-free operations, Gabor filters and convolutions. The resulting anti-bandit NAS (ABanditNAS)…

    Submitted 5 August, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

  39. arXiv:2006.12708

    cs.CV

    iffDetector: Inference-aware Feature Filtering for Object Detection

    Authors: Mingyuan Mao, Yuxin Tian, Baochang Zhang, Qixiang Ye, Wanquan Liu, Guodong Guo, David Doermann

    Abstract: Modern CNN-based object detectors focus on feature configuration during training but often ignore feature optimization during inference. In this paper, we propose a new feature optimization approach to enhance features and suppress background noise in both the training and inference stages. We introduce a generic Inference-aware Feature Filtering (IFF) module that can easily be combined with moder…

    Submitted 22 June, 2020; originally announced June 2020.

    Comments: 14 pages, 6 figures

  40. arXiv:2006.09142

    cs.CV

    Cogradient Descent for Bilinear Optimization

    Authors: Li'an Zhuo, Baochang Zhang, Linlin Yang, Hanlin Chen, Qixiang Ye, David Doermann, Guodong Guo, Rongrong Ji

    Abstract: Conventional learning methods simplify the bilinear model by regarding two intrinsically coupled factors independently, which degrades the optimization procedure. One reason lies in the insufficient training due to the asynchronous gradient descent, which results in vanishing gradients for the coupled variables. In this paper, we introduce a Cogradient Descent algorithm (CoGD) to address the bilin…

    Submitted 16 June, 2020; originally announced June 2020.

    Comments: 9 pages, 6 figures

  41. arXiv:2005.00057

    cs.CV

    CP-NAS: Child-Parent Neural Architecture Search for Binary Neural Networks

    Authors: Li'an Zhuo, Baochang Zhang, Hanlin Chen, Linlin Yang, Chen Chen, Yanjun Zhu, David Doermann

    Abstract: Neural architecture search (NAS) proves to be among the best approaches for many tasks by generating an application-adaptive neural architecture, which is still challenged by high computational cost and memory consumption. At the same time, 1-bit convolutional neural networks (CNNs) with binarized weights and activations show their potential for resource-limited embedded devices. One natural appro…

    Submitted 17 May, 2020; v1 submitted 30 April, 2020; originally announced May 2020.

    Comments: 7 pages, 6 figures

  42. arXiv:2003.00217

    cs.CV

    NAS-Count: Counting-by-Density with Neural Architecture Search

    Authors: Yutao Hu, Xiaolong Jiang, Xuhui Liu, Baochang Zhang, Jungong Han, Xianbin Cao, David Doermann

    Abstract: Most of the recent advances in crowd counting have evolved from hand-designed density estimation networks, where multi-scale features are leveraged to address the scale variation problem, but at the expense of demanding design efforts. In this work, we automate the design of counting models with Neural Architecture Search (NAS) and introduce an end-to-end searched encoder-decoder architecture, Aut…

    Submitted 12 August, 2020; v1 submitted 29 February, 2020; originally announced March 2020.

    Comments: Accepted to European Conference on Computer Vision (ECCV) 2020

  43. arXiv:1911.10862  [pdf, other]

    cs.CV

    Binarized Neural Architecture Search

    Authors: Hanlin Chen, Li'an Zhuo, Baochang Zhang, Xiawu Zheng, Jianzhuang Liu, David Doermann, Rongrong Ji

    Abstract: Neural architecture search (NAS) can have a significant impact in computer vision by automatically designing optimal neural network architectures for various tasks. A variant, binarized neural architecture search (BNAS), with a search space of binarized convolutions, can produce extremely compressed models. Unfortunately, this area remains largely unexplored. BNAS is more challenging than NAS due…

    Submitted 11 February, 2020; v1 submitted 25 November, 2019; originally announced November 2019.

  44. arXiv:1910.10853  [pdf, other]

    cs.CV

    Circulant Binary Convolutional Networks: Enhancing the Performance of 1-bit DCNNs with Circulant Back Propagation

    Authors: Chunlei Liu, Wenrui Ding, Xin Xia, Baochang Zhang, Jiaxin Gu, Jianzhuang Liu, Rongrong Ji, David Doermann

    Abstract: The rapidly decreasing computation and memory cost has recently driven the success of many applications in the field of deep learning. Practical applications of deep learning in resource-limited hardware, such as embedded devices and smart phones, however, remain challenging. For binary convolutional networks, the reason lies in the degraded representation caused by binarizing full-precision filte…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: Published in CVPR 2019

    Journal ref: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019: 2691-2699

  45. arXiv:1903.09291  [pdf, other]

    cs.CV

    Towards Optimal Structured CNN Pruning via Generative Adversarial Learning

    Authors: Shaohui Lin, Rongrong Ji, Chenqian Yan, Baochang Zhang, Liujuan Cao, Qixiang Ye, Feiyue Huang, David Doermann

    Abstract: Structured pruning of filters or neurons has received increased focus for compressing convolutional neural networks. Most existing methods rely on multi-stage optimizations in a layer-wise manner for iteratively pruning and retraining, which may not be optimal and may be computationally intensive. Moreover, these methods are designed for pruning a specific structure, such as filter or block structures w…

    Submitted 21 March, 2019; originally announced March 2019.

    Comments: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  46. arXiv:1903.00853  [pdf, other]

    cs.CV

    Crowd Counting and Density Estimation by Trellis Encoder-Decoder Network

    Authors: Xiaolong Jiang, Zehao Xiao, Baochang Zhang, Xiantong Zhen, Xianbin Cao, David Doermann, Ling Shao

    Abstract: Crowd counting has recently attracted increasing interest in computer vision but remains a challenging problem. In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps. The major contributions are four-fold. First, we develop a new trellis architecture that incorporates multiple decoding paths to hier…

    Submitted 19 April, 2019; v1 submitted 3 March, 2019; originally announced March 2019.

    Comments: Accepted to CVPR 2019

  47. arXiv:1812.04368  [pdf, other]

    cs.CV

    Exploiting Kernel Sparsity and Entropy for Interpretable CNN Compression

    Authors: Yuchao Li, Shaohui Lin, Baochang Zhang, Jianzhuang Liu, David Doermann, Yongjian Wu, Feiyue Huang, Rongrong Ji

    Abstract: Compressing convolutional neural networks (CNNs) has received ever-increasing research focus. However, most existing CNN compression methods do not interpret their inherent structures to distinguish the implicit redundancy. In this paper, we investigate the problem of CNN compression from a novel interpretable perspective. The relationship between the input feature maps and 2D kernels is revealed…

    Submitted 1 April, 2019; v1 submitted 11 December, 2018; originally announced December 2018.

    Comments: 10 pages

  48. arXiv:1811.12755  [pdf, other]

    cs.CV

    Projection Convolutional Neural Networks for 1-bit CNNs via Discrete Back Propagation

    Authors: Jiaxin Gu, Ce Li, Baochang Zhang, Jungong Han, Xianbin Cao, Jianzhuang Liu, David Doermann

    Abstract: The advancement of deep convolutional neural networks (DCNNs) has driven significant improvement in the accuracy of recognition systems for many computer vision tasks. However, their practical applications are often restricted in resource-constrained environments. In this paper, we introduce projection convolutional neural networks (PCNNs) with a discrete back propagation via projection (DBPP) to…

    Submitted 11 December, 2018; v1 submitted 30 November, 2018; originally announced November 2018.

  49. arXiv:1703.07431  [pdf, other]

    cs.CV

    IOD-CNN: Integrating Object Detection Networks for Event Recognition

    Authors: Sungmin Eum, Hyungtae Lee, Heesung Kwon, David Doermann

    Abstract: Many previous methods have shown the importance of considering semantically relevant objects for performing event recognition, yet none of the methods have exploited the power of deep convolutional neural networks to directly integrate relevant object information into a unified network. We present a novel unified deep CNN architecture which integrates architecturally different, yet semantically-r…

    Submitted 21 March, 2017; originally announced March 2017.

    Comments: Submitted to the IEEE International Conference on Image Processing 2017

  50. arXiv:1502.00030  [pdf, other]

    cs.CV

    SHOE: Supervised Hashing with Output Embeddings

    Authors: Sravanthi Bondugula, Varun Manjunatha, Larry S. Davis, David Doermann

    Abstract: We present a supervised binary encoding scheme for image retrieval that learns projections by taking into account similarity between classes obtained from output embeddings. Our motivation is that binary hash codes learned in this way improve both the visual quality of retrieval results and existing supervised hashing schemes. We employ a sequential greedy optimization that learns relationship awa…

    Submitted 30 January, 2015; originally announced February 2015.