Showing 1–50 of 52 results for author: Chan, A B

Searching in archive cs.
  1. arXiv:2405.19943  [pdf, other]

    cs.CV

    Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution Weighting

    Authors: Qi Zhang, Yunfei Gong, Daijie Chen, Antoni B. Chan, Hui Huang

    Abstract: Recent deep learning-based multi-view people detection (MVD) methods have shown promising results on existing datasets. However, current methods are mainly trained and evaluated on small, single scenes with a limited number of multi-view frames and fixed camera views. As a result, these methods may not be practical for detecting people in larger, more complex scenes with severe occlusions and came…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: AAAI 2024

  2. arXiv:2405.08886  [pdf, other]

    cs.LG stat.ML

    The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks

    Authors: Ziquan Liu, Yufei Cui, Yan Yan, Yi Xu, Xiangyang Ji, Xue Liu, Antoni B. Chan

    Abstract: In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness th…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: ICML2024

  3. arXiv:2404.11895  [pdf, other]

    cs.CV

    FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models

    Authors: Wei Wu, Qingnan Fan, Shuai Qin, Hong Gu, Ruoyu Zhao, Antoni B. Chan

    Abstract: Precise image editing with text-to-image models has attracted increasing interest due to their remarkable generative capabilities and user-friendly nature. However, such attempts face the pivotal challenge of misalignment between the intended precise editing target regions and the broader area impacted by the guidance in practice. Despite excellent methods leveraging attention mechanisms that have…

    Submitted 18 April, 2024; originally announced April 2024.

  4. arXiv:2404.09504  [pdf, other]

    cs.CV

    Learning Tracking Representations from Single Point Annotations

    Authors: Qiangqiang Wu, Antoni B. Chan

    Abstract: Existing deep trackers are typically trained with large-scale video frames with annotated bounding boxes. However, these bounding boxes are expensive and time-consuming to annotate, in particular for large-scale datasets. In this paper, we propose to learn tracking representations from single point annotations (i.e., 4.5x faster to annotate than the traditional bounding box) in a weakly supervised…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR2024-L3DIVU

  5. arXiv:2403.10236  [pdf, other]

    cs.CV

    A Fixed-Point Approach to Unified Prompt-Based Counting

    Authors: Wei Lin, Antoni B. Chan

    Abstract: Existing class-agnostic counting models typically rely on a single type of prompt, e.g., box annotations. This paper aims to establish a comprehensive prompt-based counting framework capable of generating density maps for concerned objects indicated by various prompt types, such as box, point, and text. To achieve this goal, we begin by converting prompts from different modalities into prompt mask…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI 2024

  6. arXiv:2402.17514  [pdf, other]

    cs.CV

    Robust Unsupervised Crowd Counting and Localization with Adaptive Resolution SAM

    Authors: Jia Wan, Qiangqiang Wu, Wei Lin, Antoni B. Chan

    Abstract: The existing crowd counting models require extensive training data, which is time-consuming to annotate. To tackle this issue, we propose a simple yet effective crowd counting method by utilizing the Segment-Everything-Everywhere Model (SEEM), an adaptation of the Segment Anything Model (SAM), to generate pseudo-labels for training crowd counting models. However, our initial investigation rev…

    Submitted 27 February, 2024; originally announced February 2024.

  7. arXiv:2308.09091  [pdf, other]

    cs.CV

    Edit Temporal-Consistent Videos with Image Diffusion Model

    Authors: Yuanzhi Wang, Yong Li, Xiaoya Zhang, Xin Liu, Anbo Dai, Antoni B. Chan, Zhen Cui

    Abstract: Large-scale text-to-image (T2I) diffusion models have been extended for text-guided video editing, yielding impressive zero-shot video editing performance. Nonetheless, the generated videos usually show spatial irregularities and temporal inconsistencies as the temporal characteristics of videos have not been faithfully modeled. In this paper, we propose an elegant yet effective Temporal-Consisten…

    Submitted 29 December, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: 10 pages, 7 figures

  8. arXiv:2305.03601  [pdf, other]

    cs.CV cs.AI

    Human Attention-Guided Explainable Artificial Intelligence for Computer Vision Models

    Authors: Guoyang Liu, Jindi Zhang, Antoni B. Chan, Janet H. Hsiao

    Abstract: We examined whether embedding human attention knowledge into saliency-based explainable AI (XAI) methods for computer vision models could enhance their plausibility and faithfulness. We first developed new gradient-based XAI methods for object detection models to generate object-specific explanations by extending the current methods for image classification models. Interestingly, while these gradi…

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: 14 pages, 18 figures

    MSC Class: 68T45 ACM Class: I.2.0; I.4.0

  9. arXiv:2304.06354  [pdf, other]

    cs.CV

    ODAM: Gradient-based instance-specific visual explanations for object detection

    Authors: Chenyang Zhao, Antoni B. Chan

    Abstract: We propose the gradient-weighted Object Detector Activation Maps (ODAM), a visualized explanation technique for interpreting the predictions of object detectors. Utilizing the gradients of detector targets flowing into the intermediate feature maps, ODAM produces heat maps that show the influence of regions on the detector's decision for each predicted attribute. Compared to previous works classif…

    Submitted 13 April, 2023; originally announced April 2023.

    Comments: 2023 International Conference on Learning Representations

  10. arXiv:2304.00571  [pdf, other]

    cs.CV

    DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks

    Authors: Qiangqiang Wu, Tianyu Yang, Ziquan Liu, Baoyuan Wu, Ying Shan, Antoni B. Chan

    Abstract: In this paper, we study masked autoencoder (MAE) pretraining on videos for matching-based downstream tasks, including visual object tracking (VOT) and video object segmentation (VOS). A simple extension of MAE is to randomly mask out frame patches in videos and reconstruct the frame pixels. However, we find that this simple baseline heavily relies on spatial cues while ignoring temporal relations…

    Submitted 6 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Comments: CVPR 2023; V2: fixed typos in Table-2

  11. arXiv:2303.11135  [pdf, other]

    cs.LG cs.CV

    TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization

    Authors: Ziquan Liu, Yi Xu, Xiangyang Ji, Antoni B. Chan

    Abstract: Recent years have seen the ever-increasing importance of pre-trained models and their downstream training in deep learning research and applications. At the same time, the defense for adversarial examples has been mainly investigated in the context of training from random initialization on simple classification tasks. To better exploit the potential of pre-trained models in adversarial robustness,…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: CVPR2023

  12. arXiv:2210.05118  [pdf, other]

    cs.LG cs.CV stat.ML

    Boosting Adversarial Robustness From The Perspective of Effective Margin Regularization

    Authors: Ziquan Liu, Antoni B. Chan

    Abstract: The adversarial vulnerability of deep neural networks (DNNs) has been actively investigated in the past several years. This paper investigates the scale-variant property of cross-entropy loss, which is the most commonly used loss function in classification tasks, and its impact on the effective margin and adversarial robustness of deep neural networks. Since the loss function is not invariant to l…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: BMVC 2022

  13. arXiv:2207.01190  [pdf, other]

    cs.LG

    Pareto Optimization for Active Learning under Out-of-Distribution Data Scenarios

    Authors: Xueying Zhan, Zeyu Dai, Qingzhong Wang, Qing Li, Haoyi Xiong, Dejing Dou, Antoni B. Chan

    Abstract: Pool-based Active Learning (AL) has achieved great success in minimizing labeling cost by sequentially selecting informative unlabeled samples from a large unlabeled data pool and querying their labels from oracle/annotators. However, existing AL sampling strategies might not work well in out-of-distribution (OOD) data scenarios, where the unlabeled data pool contains some data samples that do not…

    Submitted 4 July, 2022; originally announced July 2022.

  14. arXiv:2205.12753  [pdf, other]

    cs.CV cs.LG

    An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation

    Authors: Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Xiangyang Ji, Antoni B. Chan

    Abstract: The performance of machine learning models under distribution shift has been the focus of the community in recent years. Most current methods have been proposed to improve the robustness to distribution shift from the algorithmic perspective, i.e., designing better training algorithms to help the generalization in shifted test distributions. This paper studies the distribution shift problem fro…

    Submitted 25 May, 2022; originally announced May 2022.

  15. arXiv:2205.01551  [pdf, other]

    cs.CV

    Cross-View Cross-Scene Multi-View Crowd Counting

    Authors: Qi Zhang, Wei Lin, Antoni B. Chan

    Abstract: Multi-view crowd counting has been previously proposed to utilize multi-cameras to extend the field-of-view of a single camera, capturing more people in the scene, and improve counting performance for occluded people or those in low resolution. However, the current multi-view paradigm trains and tests on the same single scene and camera-views, which limits its practical application. In this paper,…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: CVPR 2021

  16. On Distinctive Image Captioning via Comparing and Reweighting

    Authors: Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan

    Abstract: Recent image captioning models are achieving impressive results based on popular metrics, i.e., BLEU, CIDEr, and SPICE. However, focusing on the most popular metrics that only consider the overlap between the generated captions and human annotation could result in using common words and phrases, which lacks distinctiveness, i.e., many similar images have the same caption. In this paper, we aim to…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: 20 pages. arXiv admin note: substantial text overlap with arXiv:2007.06877

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI 2022)

  17. arXiv:2203.13450  [pdf, other]

    cs.LG

    A Comparative Survey of Deep Active Learning

    Authors: Xueying Zhan, Qingzhong Wang, Kuan-hao Huang, Haoyi Xiong, Dejing Dou, Antoni B. Chan

    Abstract: While deep learning (DL) is data-hungry and usually relies on extensive labeled data to deliver good performance, Active Learning (AL) reduces labeling costs by selecting a small proportion of samples from unlabeled data for labeling and training. Therefore, Deep Active Learning (DAL) has risen as a feasible solution for maximizing model performance under a limited labeling cost/budget in recent y…

    Submitted 19 July, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: 24 pages

  18. arXiv:2203.04232  [pdf, other]

    cs.CV

    A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds

    Authors: Yan Xia, Qiangqiang Wu, Wei Li, Antoni B. Chan, Uwe Stilla

    Abstract: Recent works on 3D single object tracking treat the task as a target-specific 3D detection task, where an off-the-shelf 3D detector is commonly employed for the tracking. However, it is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete. In this paper, we address this issue by explicitly leveraging temporal…

    Submitted 11 February, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: Accepted by IEEE Transactions on Intelligent Transportation Systems 2023

  19. arXiv:2110.04931  [pdf, other]

    cs.CV

    BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning

    Authors: Zhirui Dai, Yuepeng Jiang, Yi Li, Bo Liu, Antoni B. Chan, Nuno Vasconcelos

    Abstract: Social distancing, an essential public health measure to limit the spread of contagious diseases, has gained significant attention since the outbreak of the COVID-19 pandemic. In this work, the problem of visual social distancing compliance assessment in busy public areas, with wide field-of-view cameras, is considered. A dataset of crowd scenes with people annotations under a bird's eye view (BEV…

    Submitted 12 October, 2021; v1 submitted 10 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at International Conference on Computer Vision, 2021

  20. arXiv:2108.09151  [pdf, other]

    cs.CV cs.CL cs.LG

    Group-based Distinctive Image Captioning with Memory Attention

    Authors: Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan

    Abstract: Describing images using natural language is widely known as image captioning, which has made consistent progress due to the development of computer vision and natural language generation techniques. Though conventional captioning models achieve high accuracy based on popular metrics, i.e., BLEU, CIDEr, and SPICE, the ability of captions to distinguish the target image from other similar images is…

    Submitted 7 April, 2022; v1 submitted 20 August, 2021; originally announced August 2021.

    Comments: Accepted at ACM MM 2021 (oral)

  21. arXiv:2107.01622  [pdf, other]

    cs.LG

    Multiple-criteria Based Active Learning with Fixed-size Determinantal Point Processes

    Authors: Xueying Zhan, Qing Li, Antoni B. Chan

    Abstract: Active learning aims to achieve greater accuracy with less training data by selecting the most useful data samples from which it learns. Single-criterion based methods (i.e., informativeness and representativeness based methods) are simple and efficient; however, they lack adaptability to different real-world scenarios. In this paper, we introduce a multiple-criteria based active learning algorith…

    Submitted 4 July, 2021; originally announced July 2021.

  22. arXiv:2102.03497  [pdf, other]

    cs.LG stat.ML

    Weight Rescaling: Effective and Robust Regularization for Deep Neural Networks with Batch Normalization

    Authors: Ziquan Liu, Yufei Cui, Jia Wan, Yu Mao, Antoni B. Chan

    Abstract: Weight decay is often used to ensure good generalization in the training practice of deep neural networks with batch normalization (BN-DNNs), where some convolution layers are invariant to weight rescaling due to the normalization. In this paper, we demonstrate that the practical usage of weight decay still has some unsolved problems in spite of existing theoretical work on explaining the effect o…

    Submitted 17 June, 2022; v1 submitted 5 February, 2021; originally announced February 2021.

    Comments: Preprint

  23. arXiv:2101.11353  [pdf, other]

    cs.LG cs.CV

    Variational Nested Dropout

    Authors: Yufei Cui, Yu Mao, Ziquan Liu, Qiao Li, Antoni B. Chan, Xue Liu, Tei-Wei Kuo, Chun Jason Xue

    Abstract: Nested dropout is a variant of dropout operation that is able to order network parameters or features based on the pre-defined importance during training. It has been explored for: I. Constructing nested nets: the nested nets are neural networks whose architectures can be adjusted instantly during testing time, e.g., based on computational constraints. The nested dropout implicitly ranks the netwo…

    Submitted 17 June, 2022; v1 submitted 27 January, 2021; originally announced January 2021.

    Comments: 20 pages, 17 figures

  24. arXiv:2012.00946  [pdf, other]

    cs.CV

    Wide-Area Crowd Counting: Multi-View Fusion Networks for Counting in Large Scenes

    Authors: Qi Zhang, Antoni B. Chan

    Abstract: Crowd counting in single-view images has achieved outstanding performance on existing counting datasets. However, single-view counting is not applicable to large and wide scenes (e.g., public parks, long subway platforms, or event spaces) because a single camera cannot capture the whole scene in adequate detail for counting, e.g., when the scene is too large to fit into the field-of-view of the ca…

    Submitted 2 May, 2022; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: Accepted to IJCV

  25. arXiv:2010.08161  [pdf, other]

    cs.LG

    ALdataset: a benchmark for pool-based active learning

    Authors: Xueying Zhan, Antoni Bert Chan

    Abstract: Active learning (AL) is a subfield of machine learning (ML) in which a learning algorithm could achieve good accuracy with fewer training samples by interactively querying a user/oracle to label new data points. Pool-based AL is well-motivated in many ML tasks, where unlabeled data is abundant, but their labels are hard to obtain. Although many pool-based AL methods have been developed, the lack of…

    Submitted 16 October, 2020; originally announced October 2020.

  26. arXiv:2008.02965  [pdf, other]

    cs.LG stat.ML

    Improve Generalization and Robustness of Neural Networks via Weight Scale Shifting Invariant Regularizations

    Authors: Ziquan Liu, Yufei Cui, Antoni B. Chan

    Abstract: Using weight decay to penalize the L2 norms of weights in neural networks has been a standard training practice to regularize the complexity of networks. In this paper, we show that a family of regularizers, including weight decay, is ineffective at penalizing the intrinsic norms of weights for networks with positively homogeneous activation functions, such as linear, ReLU and max-pooling function…

    Submitted 8 June, 2022; v1 submitted 6 August, 2020; originally announced August 2020.

    Comments: 14 pages, 5 figures, Accepted by ICML 2021 Workshop on Adversarial Machine Learning

  27. Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets

    Authors: Weihong Ren, Xinchao Wang, Jiandong Tian, Yandong Tang, Antoni B. Chan

    Abstract: State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm, where object trajectories are obtained by associating per-frame outputs of object detectors. In crowded scenes, however, detectors often fail to obtain accurate detections due to heavy occlusions and high crowd density. In this paper, we propose a new MOT paradigm, tracking-by-counting, tailored for cro…

    Submitted 18 July, 2020; originally announced July 2020.

    Comments: 14 pages

  28. arXiv:2007.06877  [pdf, other]

    cs.CV cs.CL cs.LG

    Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets

    Authors: Jiuniu Wang, Wenjia Xu, Qingzhong Wang, Antoni B. Chan

    Abstract: A wide range of image captioning models has been developed, achieving significant improvement based on popular metrics, such as BLEU, CIDEr, and SPICE. However, although the generated captions can accurately describe the image, they are generic for similar images and lack distinctiveness, i.e., cannot properly describe the uniqueness of each image. In this paper, we aim to improve the distinctiven…

    Submitted 14 July, 2020; originally announced July 2020.

    Report number: Accepted at ECCV 2020 (oral)

  29. Fine-Grained Crowd Counting

    Authors: Jia Wan, Nikil Senthil Kumar, Antoni B. Chan

    Abstract: Current crowd counting algorithms are only concerned about the number of people in an image, which lacks low-level fine-grained information of the crowd. For many practical applications, the total number of people in an image is not as useful as the number of people in each sub-category. E.g., knowing the number of people waiting in line or browsing can help retail stores; knowing the number of peo…

    Submitted 12 July, 2020; originally announced July 2020.

  30. arXiv:2007.03891  [pdf, other]

    cs.CV

    Single-Frame based Deep View Synchronization for Unsynchronized Multi-Camera Surveillance

    Authors: Qi Zhang, Antoni B. Chan

    Abstract: Multi-camera surveillance has been an active research topic for understanding and modeling scenes. Compared to a single camera, multi-cameras provide larger field-of-view and more object cues, and the related applications are multi-view counting, multi-view tracking, 3D pose estimation or 3D reconstruction, etc. It is usually assumed that the cameras are all temporally synchronized when designing…

    Submitted 2 May, 2022; v1 submitted 8 July, 2020; originally announced July 2020.

    Comments: Accepted to IEEE TNNLS

  31. arXiv:2006.05127  [pdf, other]

    cs.CV

    Over-crowdedness Alert! Forecasting the Future Crowd Distribution

    Authors: Yuzhen Niu, Weifeng Shi, Wenxi Liu, Shengfeng He, Jia Pan, Antoni B. Chan

    Abstract: In recent years, vision-based crowd analysis has been studied extensively due to its practical applications in the real world. In this paper, we formulate a novel crowd analysis problem, in which we aim to predict the crowd distribution in the near future given sequential frames of a crowd video without any identity annotations. Studying this research problem will benefit applications concerned with f…

    Submitted 9 June, 2020; originally announced June 2020.

  32. arXiv:2003.08162  [pdf, other]

    cs.CV

    3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels

    Authors: Qi Zhang, Antoni B. Chan

    Abstract: Crowd counting has been studied for decades and a lot of works have achieved good performance, especially the DNNs-based density map estimation methods. Most existing crowd counting works focus on single-view counting, while few works have studied multi-view counting for large and wide scenes, where multiple cameras are used. Recently, an end-to-end multi-view crowd counting method called multi-vi…

    Submitted 18 March, 2020; originally announced March 2020.

    Comments: 8 pages, 5 figures, AAAI Conference on Artificial Intelligence, AAAI, New York, Feb 2020

  33. arXiv:1908.04919  [pdf, other]

    cs.CV cs.CL

    Towards Diverse and Accurate Image Captions via Reinforcing Determinantal Point Process

    Authors: Qingzhong Wang, Antoni B. Chan

    Abstract: Although significant progress has been made in the field of automatic image captioning, it is still a challenging task. Previous works normally pay much attention to improving the quality of the generated captions but ignore the diversity of captions. In this paper, we combine determinantal point process (DPP) and reinforcement learning (RL) and propose a novel reinforcing DPP (R-DPP) approach to…

    Submitted 13 August, 2019; originally announced August 2019.

    Comments: 14 pages. Code is coming soon; please see my personal page visal.cs.cityu.edu.hk/people/qingzhong-wang/

  34. arXiv:1907.12006  [pdf, other]

    cs.CV

    ROAM: Recurrently Optimizing Tracking Model

    Authors: Tianyu Yang, Pengfei Xu, Runbo Hu, Hua Chai, Antoni B. Chan

    Abstract: In this paper, we design a tracking model consisting of response generation and bounding box regression, where the first component produces a heat map to indicate the presence of the object at different positions and the second part regresses the relative bounding box shifts to anchors mounted on sliding-window locations. Thanks to the resizable convolutional filters used in both components to ada…

    Submitted 24 March, 2020; v1 submitted 27 July, 2019; originally announced July 2019.

    Comments: CVPR2020 camera ready

  35. arXiv:1907.07613  [pdf, other]

    cs.CV

    Visual Tracking via Dynamic Memory Networks

    Authors: Tianyu Yang, Antoni B. Chan

    Abstract: Template-matching methods for visual tracking have gained popularity recently due to their good performance and fast speed. However, they lack effective ways to adapt to changes in the target object's appearance, making their tracking accuracy still far from state-of-the-art. In this paper, we propose a dynamic memory network to adapt the template to the target's appearance variations during track…

    Submitted 29 November, 2019; v1 submitted 12 July, 2019; originally announced July 2019.

    Comments: accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2019. arXiv admin note: substantial text overlap with arXiv:1803.07268

  36. Accelerating Monte Carlo Bayesian Inference via Approximating Predictive Uncertainty over Simplex

    Authors: Yufei Cui, Wuguannan Yao, Qiao Li, Antoni B. Chan, Chun Jason Xue

    Abstract: Estimating the predictive uncertainty of a Bayesian learning model is critical in various decision-making problems, e.g., reinforcement learning, detecting adversarial attack, self-driving car. As the model posterior is almost always intractable, most efforts were made on finding an accurate approximation of the true posterior. Even though a decent estimation of the model posterior is obtained, anoth…

    Submitted 25 September, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: 8 pages, 3 figures

  37. arXiv:1903.12020  [pdf, other]

    cs.CV

    Describing like humans: on diversity in image captioning

    Authors: Qingzhong Wang, Antoni B. Chan

    Abstract: Recently, the state-of-the-art models for image captioning have overtaken human performance based on the most popular metrics, such as BLEU, METEOR, ROUGE, and CIDEr. Does this mean we have solved the task of image captioning? The above metrics only measure the similarity of the generated caption to the human annotations, which reflects its accuracy. However, an image contains many concepts and mu…

    Submitted 14 May, 2019; v1 submitted 28 March, 2019; originally announced March 2019.

    Comments: Accepted by CVPR2019. In this version, we correct the label of y axis in figure 8

  38. arXiv:1901.03968  [pdf, other]

    cs.GR cs.LG stat.ML

    A Fully Bayesian Infinite Generative Model for Dynamic Texture Segmentation

    Authors: Sahar Yousefi, M. T. Manzuri Shalmani, Antoni B. Chan

    Abstract: Generative dynamic texture models (GDTMs) are widely used for dynamic texture (DT) segmentation in the video sequences. GDTMs represent DTs as a set of linear dynamical systems (LDSs). A major limitation of these models concerns the automatic selection of a proper number of DTs. Dirichlet process mixture (DPM) models which have appeared recently as the cornerstone of the non-parametric Bayesian st…

    Submitted 13 January, 2019; originally announced January 2019.

    Comments: 38 pages; 15 figures

  39. arXiv:1810.12535  [pdf, other]

    cs.CV cs.AI

    Gated Hierarchical Attention for Image Captioning

    Authors: Qingzhong Wang, Antoni B. Chan

    Abstract: Attention modules connecting encoder and decoders have been widely applied in the field of object recognition, image captioning, visual question answering and neural machine translation, and significantly improve the performance. In this paper, we propose a bottom-up gated hierarchical attention (GHA) mechanism for image captioning. Our proposed model employs a CNN as the decoder which is able to…

    Submitted 31 October, 2018; v1 submitted 30 October, 2018; originally announced October 2018.

    Comments: Accepted by ACCV

  40. arXiv:1810.07435  [pdf, other]

    stat.ML cs.LG

    EMHMM Simulation Study

    Authors: Antoni B. Chan, Janet H. Hsiao

    Abstract: Eye Movement analysis with Hidden Markov Models (EMHMM) is a method for modeling eye fixation sequences using hidden Markov models (HMMs). In this report, we run a simulation study to investigate the estimation error for learning HMMs with variational Bayesian inference, with respect to the number of sequences and the sequence lengths. We also relate the estimation error measured by KL divergence…

    Submitted 24 June, 2019; v1 submitted 17 October, 2018; originally announced October 2018.

  41. arXiv:1805.09019  [pdf, other]

    cs.CV

    CNN+CNN: Convolutional Decoders for Image Captioning

    Authors: Qingzhong Wang, Antoni B. Chan

    Abstract: Image captioning is a challenging task that combines the fields of computer vision and natural language processing. A variety of approaches have been proposed to achieve the goal of automatically describing an image, and recurrent neural network (RNN) or long-short term memory (LSTM) based models dominate this field. However, RNNs or LSTMs cannot be calculated in parallel and ignore the underlying…

    Submitted 23 May, 2018; originally announced May 2018.

  42. arXiv:1803.07268  [pdf, other]

    cs.CV

    Learning Dynamic Memory Networks for Object Tracking

    Authors: Tianyu Yang, Antoni B. Chan

    Abstract: Template-matching methods for visual tracking have gained popularity recently due to their comparable performance and fast speed. However, they lack effective ways to adapt to changes in the target object's appearance, making their tracking accuracy still far from state-of-the-art. In this paper, we propose a dynamic memory network to adapt the template to the target's appearance variations during…

    Submitted 2 September, 2018; v1 submitted 20 March, 2018; originally announced March 2018.

    Comments: ECCV2018 camera ready. Code is available at https://github.com/skyoung/MemTrack

  43. arXiv:1708.03874  [pdf, other]

    cs.CV

    Recurrent Filter Learning for Visual Tracking

    Authors: Tianyu Yang, Antoni B. Chan

    Abstract: Recently using convolutional neural networks (CNNs) has gained popularity in visual tracking, due to its robust feature representation of images. Recent methods perform online tracking by fine-tuning a pre-trained CNN model to the specific target object using stochastic gradient descent (SGD) back-propagation, which is usually time-consuming. In this paper, we propose a recurrent filter generation…

    Submitted 13 August, 2017; originally announced August 2017.

    Comments: ICCV2017 Workshop on VOT

  44. arXiv:1705.10118  [pdf, other]

    cs.CV

    Beyond Counting: Comparisons of Density Maps for Crowd Analysis Tasks - Counting, Detection, and Tracking

    Authors: Di Kang, Zheng Ma, Antoni B. Chan

    Abstract: For crowded scenes, the accuracy of object-based computer vision methods declines when the images are low-resolution and objects have severe occlusions. Taking counting methods for example, almost all the recent state-of-the-art counting methods bypass explicit detection and adopt regression-based methods to directly count the objects of interest. Among regression-based methods, density map estima…

    Submitted 13 June, 2018; v1 submitted 29 May, 2017; originally announced May 2017.

    Comments: accepted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)

  45. arXiv:1703.06003  [pdf, other]

    cs.CV cs.GR

    Color Orchestra: Ordering Color Palettes for Interpolation and Prediction

    Authors: Huy Q. Phan, Hongbo Fu, Antoni B. Chan

    Abstract: A color theme or color palette can deeply influence the quality and the feeling of a photograph or a graphical design. Although color palettes may come from different sources such as online crowd-sourcing, photographs and graphical designs, in this paper, we consider color palettes extracted from fine art collections, which we believe to be an abundant source of stylistic and unique color themes. We…

    Submitted 17 March, 2017; originally announced March 2017.

    Comments: IEEE Transactions on Visualization and Computer Graphics

  46. arXiv:1611.06748  [pdf, other]

    cs.CV

    Crowd Counting by Adapting Convolutional Neural Networks with Side Information

    Authors: Di Kang, Debarun Dhar, Antoni B. Chan

    Abstract: Computer vision tasks often have side information available that is helpful to solve the task. For example, for crowd counting, the camera perspective (e.g., camera angle and height) gives a clue about the appearance and scale of people in the scene. While side information has been shown to be useful for counting systems using traditional hand-crafted features, it has not been fully utilized in co…

    Submitted 21 November, 2016; originally announced November 2016.

    Comments: 8 pages

  47. arXiv:1508.06708  [pdf, ps, other]

    cs.CV

    Maximum-Margin Structured Learning with Deep Networks for 3D Human Pose Estimation

    Authors: Sijin Li, Weichen Zhang, Antoni B. Chan

    Abstract: This paper focuses on structured-output learning using deep neural networks for 3D human pose estimation from monocular images. Our network takes an image and 3D pose as inputs and outputs a score value, which is high when the image-pose pair matches and low otherwise. The network structure consists of a convolutional neural network for image feature extraction, followed by two sub-networks for tr…

    Submitted 26 August, 2015; originally announced August 2015.

  48. arXiv:1406.3474  [pdf, other]

    cs.CV cs.LG cs.NE

    Heterogeneous Multi-task Learning for Human Pose Estimation with Deep Convolutional Neural Network

    Authors: Sijin Li, Zhi-Qiang Liu, Antoni B. Chan

    Abstract: We propose a heterogeneous multi-task learning framework for human pose estimation from monocular images with a deep convolutional neural network. In particular, we simultaneously learn a pose-joint regressor and a sliding-window body-part detector in a deep network architecture. We show that including the body-part detection task helps to regularize the network, directing it to converge to a good s…

    Submitted 13 June, 2014; originally announced June 2014.

  49. Leveraging Long-Term Predictions and Online-Learning in Agent-based Multiple Person Tracking

    Authors: Wenxi Liu, Antoni B. Chan, Rynson W. H. Lau, Dinesh Manocha

    Abstract: We present a multiple-person tracking algorithm based on combining particle filters and RVO, an agent-based crowd model that infers collision-free velocities so as to predict pedestrians' motion. In addition to position and velocity, our tracking algorithm can estimate the internal goals (desired destination or desired velocity) of the tracked pedestrian in an online manner, thus removing the nee…

    Submitted 7 March, 2014; v1 submitted 9 February, 2014; originally announced February 2014.

  50. arXiv:1311.6371  [pdf, other]

    stat.ML cs.CV cs.LG

    On Approximate Inference for Generalized Gaussian Process Models

    Authors: Lifeng Shang, Antoni B. Chan

    Abstract: A generalized Gaussian process model (GGPM) is a unifying framework that encompasses many existing Gaussian process (GP) models, such as GP regression, classification, and counting. In the GGPM framework, the observation likelihood of the GP model is itself parameterized using the exponential family distribution (EFD). In this paper, we consider efficient algorithms for approximate inference on GG…

    Submitted 27 November, 2013; v1 submitted 25 November, 2013; originally announced November 2013.