Search | arXiv e-print repository

Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization

Authors: Hanxi Li, Jingqi Wu, Lin Yuanbo Wu, Hao Chen, Deyin Liu, Chunhua Shen

Abstract: In the realm of practical Anomaly Detection (AD) tasks, manual labeling of anomalous pixels proves to be a costly endeavor. Consequently, many AD methods are crafted as one-class classifiers, tailored for training sets completely devoid of anomalies, ensuring a more cost-effective approach. While some pioneering work has demonstrated heightened AD accuracy by incorporating real anomaly samples in training, this enhancement comes at the price of labor-intensive labeling processes. This paper strikes the balance between AD accuracy and labeling expenses by introducing ADClick, a novel Interactive Image Segmentation (IIS) algorithm. ADClick efficiently generates "ground-truth" anomaly masks for real defective images, leveraging innovative residual features and meticulously crafted language prompts. Notably, ADClick showcases a significantly elevated generalization capacity compared to existing state-of-the-art IIS approaches. Functioning as an anomaly labeling tool, ADClick generates high-quality anomaly labels (AP $= 94.1\%$ on MVTec AD) based on only $3$ to $5$ manual click annotations per training image. Furthermore, we extend the capabilities of ADClick into ADClick-Seg, an enhanced model designed for anomaly detection and localization. By fine-tuning the ADClick-Seg model using the weak labels inferred by ADClick, we establish the state-of-the-art performances in supervised AD tasks (AP $= 86.4\%$ on MVTec AD and AP $= 78.4\%$, PRO $= 98.6\%$ on KSDD2).

Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: 18 pages, 5 figures

arXiv:2405.17054 [pdf, other]

Improving Data-aware and Parameter-aware Robustness for Continual Learning

Authors: Hanxi Xiao, Fan Lyu

Abstract: The goal of a Continual Learning (CL) task is to learn multiple new tasks sequentially while balancing the plasticity needed for new knowledge against the stability of old knowledge. This paper argues that insufficiency in this balance arises from the ineffective handling of outliers, leading to abnormal gradients and unexpected model updates. To address this issue, we enhance the data-aware and parameter-aware robustness of CL, proposing a Robust Continual Learning (RCL) method. From the data perspective, we develop a contrastive loss based on the concepts of uniformity and alignment, forming a feature distribution that is more applicable to outliers. From the parameter perspective, we present a forward strategy for worst-case perturbation and apply robust gradient projection to the parameters. The experimental results on three benchmarks show that the proposed method effectively maintains robustness and achieves new state-of-the-art (SOTA) results. The code is available at: https://github.com/HanxiXiao/RCL

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2404.19609 [pdf, other]

Seeing Through the Clouds: Cloud Gap Imputation with Prithvi Foundation Model

Authors: Denys Godwin, Hanxi Li, Michael Cecil, Hamed Alemohammad

Abstract: Filling cloudy pixels in multispectral satellite imagery is essential for accurate data analysis and downstream applications, especially for tasks which require time series data. To address this issue, we compare the performance of a foundational Vision Transformer (ViT) model with a baseline Conditional Generative Adversarial Network (CGAN) model for missing value imputation in time series of multispectral satellite imagery. We randomly mask time series of satellite images using real-world cloud masks and train each model to reconstruct the missing pixels. The ViT model is fine-tuned from a pretrained model, while the CGAN is trained from scratch. Using quantitative evaluation metrics such as structural similarity index and mean absolute error as well as qualitative visual analysis, we assess imputation accuracy and contextual preservation.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.16143 [pdf, ps, other]

doi 10.1145/3674652

A Two-Phase Infinite/Finite Low-Level Memory Model

Authors: Calvin Beck, Irene Yoon, Hanxi Chen, Yannick Zakowski, Steve Zdancewic

Abstract: This paper provides a novel approach to reconciling complex low-level memory model features, such as pointer-integer casts, with desired refinements that are needed to justify the correctness of program transformations. The idea is to use a "two-phased" memory model, one with an unbounded memory and corresponding unbounded integer type, and one with a finite memory; the connection between the two levels is made explicit by our notion of refinement that handles out-of-memory behaviors. This approach allows for more optimizations to be performed and establishes a clear boundary between the idealized semantics of a program and the implementation of that program on finite hardware. To demonstrate the utility of this idea in practice, we instantiate the two-phase memory model in the context of Zakowski et al.'s VIR semantics, yielding infinite and finite memory models of LLVM IR, including low-level features like undef and bitcast. Both the infinite and finite models, which act as specifications, can provably be refined to executable reference interpreters. The semantics justify optimizations, such as dead-alloca-elimination, that were previously impossible or difficult to prove correct.

Submitted 24 April, 2024; originally announced April 2024.

ACM Class: D.3.1

Journal ref: 2024

arXiv:2403.11432 [pdf, other]

Demystifying the Physics of Deep Reinforcement Learning-Based Autonomous Vehicle Decision-Making

Authors: Hanxi Wan, Pei Li, Arpan Kusari

Abstract: With the advent of universal function approximators in the domain of reinforcement learning, the number of practical applications leveraging deep reinforcement learning (DRL) has exploded. Decision-making in autonomous vehicles (AVs) has emerged as a chief application among them, taking the sensor data or the higher-order kinematic variables as the input and providing a discrete choice or continuous control output. There has been a continuous effort to understand the black-box nature of the DRL models, but so far, there hasn't been any discussion (to the best of the authors' knowledge) about how the models learn the physical process. This presents an overwhelming limitation that restricts the real-world deployment of DRL in AVs. Therefore, in this research work, we try to decode the knowledge learnt by the attention-based DRL framework about the physical process. We use a continuous proximal policy optimization-based DRL algorithm as the baseline model and add a multi-head attention framework in an open-source AV simulation environment. We provide some analytical techniques for discussing the interpretability of the trained models in terms of explainability and causality for spatial and temporal correlations. We show that the weights in the first head encode the positions of the neighboring vehicles while the second head focuses on the leader vehicle exclusively. Also, the ego vehicle's action is causally dependent on the vehicles in the target lane spatially and temporally. Through these findings, we reliably show that these techniques can help practitioners decipher the results of the DRL algorithms.

Submitted 13 June, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

Comments: Submitted for peer-review

arXiv:2402.19330 [pdf, other]

A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation

Authors: Hanxi Li, Zhengxun Zhang, Hao Chen, Lin Wu, Bo Li, Deyin Liu, Mingwen Wang

Abstract: Effectively addressing the challenge of industrial Anomaly Detection (AD) necessitates an ample supply of defective samples, a constraint often hindered by their scarcity in industrial contexts. This paper introduces a novel algorithm designed to augment defective samples, thereby enhancing AD performance. The proposed method tailors the blended latent diffusion model for defect sample generation, employing a diffusion model to generate defective samples in the latent space. A feature editing process, controlled by a "trimap" mask and text prompts, refines the generated samples. The image generation inference process is structured into three stages: a free diffusion stage, an editing diffusion stage, and an online decoder adaptation stage. This sophisticated inference strategy yields high-quality synthetic defective samples with diverse pattern variations, leading to significantly improved AD accuracies based on the augmented training set. Specifically, on the widely recognized MVTec AD dataset, the proposed method elevates the state-of-the-art (SOTA) performance of AD with augmented data by 1.5%, 1.9%, and 3.1% for AD metrics AP, IAP, and IAP90, respectively. The implementation code of this work can be found at the GitHub repository https://github.com/GrandpaXun242/AdaBLDM.git

Submitted 26 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 13 pages, 7 figures

arXiv:2402.15820 [pdf, other]

DART: Depth-Enhanced Accurate and Real-Time Background Matting

Authors: Hanxi Li, Guofeng Li, Bo Li, Lin Wu, Yan Cheng

Abstract: Matting with a static background, often referred to as "Background Matting" (BGM), has garnered significant attention within the computer vision community due to its pivotal role in various practical applications like webcasting and photo editing. Nevertheless, achieving highly accurate background matting remains a formidable challenge, primarily owing to the limitations inherent in conventional RGB images. These limitations manifest in the form of susceptibility to varying lighting conditions and unforeseen shadows. In this paper, we leverage the rich depth information provided by the RGB-Depth (RGB-D) cameras to enhance background matting performance in real-time, dubbed DART. Firstly, we adapt the original RGB-based BGM algorithm to incorporate depth information. The resulting model's output undergoes refinement through Bayesian inference, incorporating a background depth prior. The posterior prediction is then translated into a "trimap," which is subsequently fed into a state-of-the-art matting algorithm to generate more precise alpha mattes. To ensure real-time matting capabilities, a critical requirement for many real-world applications, we distill the backbone of our model from a larger and more versatile BGM network. Our experiments demonstrate the superior performance of the proposed method. Moreover, thanks to the distillation operation, our method achieves a remarkable processing speed of 33 frames per second (fps) on a mid-range edge-computing device. This high efficiency underscores DART's immense potential for deployment in mobile applications.

Submitted 24 February, 2024; originally announced February 2024.

arXiv:2312.15395 [pdf, other]

Prompt Valuation Based on Shapley Values

Authors: Hanxi Liu, Xiaokai Mao, Haocheng Xia, Jian Lou, Jinfei Liu

Abstract: Large language models (LLMs) excel on new tasks without additional training, simply by providing natural language prompts that demonstrate how the task should be performed. Prompt ensemble methods comprehensively harness the knowledge of LLMs while mitigating individual biases and errors and further enhancing performance. However, more prompts do not necessarily lead to better results, and not all prompts are beneficial. A small number of high-quality prompts often outperform many low-quality prompts. Currently, there is a lack of a suitable method for evaluating the impact of prompts on the results. In this paper, we utilize the Shapley value to fairly quantify the contributions of prompts, helping to identify beneficial or detrimental prompts, and potentially guiding prompt valuation in data markets. Through extensive experiments employing various ensemble methods and utility functions on diverse tasks, we validate the effectiveness of using the Shapley value method for prompts as it effectively distinguishes and quantifies the contributions of each prompt.

Submitted 23 December, 2023; originally announced December 2023.

arXiv:2311.11383 [pdf, other]

A Survey of Emerging Applications of Diffusion Probabilistic Models in MRI

Authors: Yuheng Fan, Hanxi Liao, Shiqi Huang, Yimin Luo, Huazhu Fu, Haikun Qi

Abstract: Diffusion probabilistic models (DPMs), which employ explicit likelihood characterization and a gradual sampling process to synthesize data, have gained increasing research interest. Despite their huge computational burdens due to the large number of steps involved during sampling, DPMs are widely appreciated in various medical imaging tasks for the high quality and diversity of their generation. Magnetic resonance imaging (MRI) is an important medical imaging modality with excellent soft tissue contrast and superb spatial resolution, which presents unique opportunities for DPMs. Although there is a recent surge of studies exploring DPMs in MRI, a survey paper of DPMs specifically designed for MRI applications is still lacking. This review article aims to help researchers in the MRI community to grasp the advances of DPMs in different applications. We first introduce the theory of two dominant kinds of DPMs, categorized according to whether the diffusion time step is discrete or continuous, and then provide a comprehensive review of emerging DPMs in MRI, including reconstruction, image generation, image translation, segmentation, anomaly detection, and further research topics. Finally, we discuss the general limitations as well as limitations specific to the MRI tasks of DPMs and point out potential areas that are worth further exploration.

Submitted 7 May, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

arXiv:2310.18660 [pdf, other]

Foundation Models for Generalist Geospatial Artificial Intelligence

Authors: Johannes Jakubik, Sujit Roy, C. E. Phillips, Paolo Fraccaro, Denys Godwin, Bianca Zadrozny, Daniela Szwarcman, Carlos Gomes, Gabby Nyirjesy, Blair Edwards, Daiki Kimura, Naomi Simumba, Linsong Chu, S. Karthik Mukkavilli, Devyani Lambhate, Kamal Das, Ranjini Bangalore, Dario Oliveira, Michal Muszynski, Kumar Ankur, Muthukumaran Ramasubramanian, Iksha Gurung, Sam Khallaghi, Hanxi, Li, et al. (8 additional authors not shown)

Abstract: Significant progress in the development of highly adaptable and reusable Artificial Intelligence (AI) models is expected to have a significant impact on Earth science and remote sensing. Foundation models are pre-trained on large unlabeled datasets through self-supervision, and then fine-tuned for various downstream tasks with small labeled datasets. This paper introduces a first-of-a-kind framework for the efficient pre-training and fine-tuning of foundational models on extensive geospatial data. We have utilized this framework to create Prithvi, a transformer-based geospatial foundational model pre-trained on more than 1TB of multispectral satellite imagery from the Harmonized Landsat-Sentinel 2 (HLS) dataset. Our study demonstrates the efficacy of our framework in successfully fine-tuning Prithvi to a range of Earth observation tasks that have not been tackled by previous work on foundation models involving multi-temporal cloud gap imputation, flood mapping, wildfire scar segmentation, and multi-temporal crop segmentation. Our experiments show that the pre-trained model accelerates the fine-tuning process compared to leveraging randomly initialized weights. In addition, pre-trained Prithvi compares well against the state-of-the-art, e.g., outperforming a conditional GAN model in multi-temporal cloud imputation by up to 5pp (or 5.7%) in the structural similarity index. Finally, due to the limited availability of labeled data in the field of Earth observation, we gradually reduce the quantity of available labeled data for refining the model to evaluate data efficiency and demonstrate that data can be decreased significantly without affecting the model's accuracy. The pre-trained 100 million parameter model and corresponding fine-tuning workflows have been released publicly as open source contributions to the global Earth sciences community through Hugging Face.

Submitted 8 November, 2023; v1 submitted 28 October, 2023; originally announced October 2023.

arXiv:2308.06748 [pdf, other]

Target before Shooting: Accurate Anomaly Detection and Localization under One Millisecond via Cascade Patch Retrieval

Authors: Hanxi Li, Jianfei Hu, Bo Li, Hao Chen, Yongbin Zheng, Chunhua Shen

Abstract: In this work, by re-examining the "matching" nature of Anomaly Detection (AD), we propose a new AD framework that simultaneously enjoys new records of AD accuracy and dramatically high running speed. In this framework, the anomaly detection problem is solved via a cascade patch retrieval procedure that retrieves the nearest neighbors for each test image patch in a coarse-to-fine fashion. Given a test sample, the top-K most similar training images are first selected based on a robust histogram matching process. Secondly, the nearest neighbor of each test patch is retrieved over the similar geometrical locations on those "global nearest neighbors", by using a carefully trained local metric. Finally, the anomaly score of each test image patch is calculated based on the distance to its "local nearest neighbor" and the "non-background" probability. The proposed method is termed "Cascade Patch Retrieval" (CPR) in this work. Different from the conventional patch-matching-based AD algorithms, CPR selects proper "targets" (reference images and locations) before "shooting" (patch-matching). On the well-acknowledged MVTec AD, BTAD and MVTec-3D AD datasets, the proposed algorithm consistently outperforms all the comparing SOTA methods by remarkable margins, measured by various AD metrics. Furthermore, CPR is extremely efficient. It runs at the speed of 113 FPS with the standard setting while its simplified version only requires less than 1 ms to process an image at the cost of a trivial accuracy drop. The code of CPR is available at https://github.com/flyinghu123/CPR.

Submitted 13 August, 2023; originally announced August 2023.

Comments: 13 pages, 8 figures

arXiv:2306.03492 [pdf, other]

Industrial Anomaly Detection and Localization Using Weakly-Supervised Residual Transformers

Authors: Hanxi Li, Jingqi Wu, Lin Yuanbo Wu, Hao Chen, Deyin Liu, Mingwen Wang, Peng Wang

Abstract: Recent advancements in industrial Anomaly Detection (AD) have shown that incorporating a few anomalous samples during training can significantly boost accuracy. However, this performance improvement comes at a high cost: extensive annotation efforts, which are often impractical in real-world applications. In this work, we propose a novel framework called "Weakly-supervised RESidual Transformer" (WeakREST), which aims to achieve high AD accuracy while minimizing the need for extensive annotations. First, we reformulate the pixel-wise anomaly localization task into a block-wise classification problem. By shifting the focus to the block-wise level, we can drastically reduce the amount of required annotations without compromising on the accuracy of anomaly detection. Secondly, we design a residual-based transformer model, termed "Positional Fast Anomaly Residuals" (PosFAR), to classify the image blocks in real time. We further propose to label the anomalous regions using only bounding boxes or image tags as weaker labels, leading to a semi-supervised learning setting. On the benchmark dataset MVTec-AD, our proposed WeakREST framework achieves a remarkable Average Precision (AP) of 83.0%, significantly outperforming the previous best result of 75.8% in the unsupervised setting. In the supervised AD setting, WeakREST further improves performance, attaining an AP of 87.6% compared to the previous best of 78.6%. Notably, even when utilizing weaker labels based on bounding boxes, WeakREST surpasses recent leading methods that rely on pixel-wise supervision, achieving an AP of 87.1% against the prior best of 78.6% on MVTec-AD. This precision advantage is also consistently observed on other well-known AD datasets, such as BTAD and KSDD2.

Submitted 11 July, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: 14 pages, 7 figures

arXiv:2305.04750 [pdf, other]

Sense, Imagine, Act: Multimodal Perception Improves Model-Based Reinforcement Learning for Head-to-Head Autonomous Racing

Authors: Elena Shrestha, Chetan Reddy, Hanxi Wan, Yulun Zhuang, Ram Vasudevan

Abstract: Model-based reinforcement learning (MBRL) techniques have recently yielded promising results for real-world autonomous racing using high-dimensional observations. MBRL agents, such as Dreamer, solve long-horizon tasks by building a world model and planning actions by latent imagination. This approach involves explicitly learning a model of the system dynamics and using it to learn the optimal policy for continuous control over multiple timesteps. As a result, MBRL agents may converge to sub-optimal policies if the world model is inaccurate. To improve state estimation for autonomous racing, this paper proposes a self-supervised sensor fusion technique that combines egocentric LiDAR and RGB camera observations collected from the F1TENTH Gym. The zero-shot performance of MBRL agents is empirically evaluated on unseen tracks and against a dynamic obstacle. This paper illustrates that multimodal perception improves robustness of the world model without requiring additional training data. The resulting multimodal Dreamer agent safely avoided collisions and won the most races compared to other tested baselines in zero-shot head-to-head autonomous racing.

Submitted 8 May, 2023; originally announced May 2023.

arXiv:2305.03267 [pdf, other]

Forecasting Inter-Destination Tourism Flow via a Hybrid Deep Learning Model

Authors: Hanxi Fang, Song Gao, Feng Zhang

Abstract: Tourists often go to multiple tourism destinations in one trip. The volume of tourism flow between tourism destinations, also referred to as ITF (Inter-Destination Tourism Flow) in this paper, is commonly used for tourism management on tasks like the classification of destinations' roles and visitation pattern mining. However, the ITF is hard to get due to the limitation of data collection techniques and privacy issues. It is difficult to understand how the volume of ITF is influenced by features of the multi-attraction system. To address these challenges, we utilized multi-source datasets and proposed a graph-based hybrid deep learning model to predict the ITF. The model makes use of both the explicit features of individual tourism attractions and the implicit features of the interactions between multiple attractions. Experiments on ITF data extracted from crowdsourced tourists' travel notes about the city of Beijing verified the usefulness of the proposed model. Besides, we analyze how different features of tourism attractions influence the volume of ITF with explainable AI techniques. Results show that popularity, quality and distance are the main three influential factors. Other features like coordinates will also exert an influence in different ways. The predicted ITF data can be further used for various downstream tasks in tourism management. The research also deepens the understanding of tourists' visiting choice in a tourism system consisting of multiple attractions.

Submitted 4 May, 2023; originally announced May 2023.

arXiv:2303.05012 [pdf, ps, other]

Spatio-Temporal Trajectory Similarity Measures: A Comprehensive Survey and Quantitative Study

Authors: Danlei Hu, Lu Chen, Hanxi Fang, Ziquan Fang, Tianyi Li, Yunjun Gao

Abstract: Spatio-temporal trajectory analytics is at the core of smart mobility solutions, which offers unprecedented information for diversified applications such as urban planning, infrastructure development, and vehicular networks. Trajectory similarity measure, which aims to evaluate the distance between two trajectories, is a fundamental functionality of trajectory analytics. In this paper, we propose a comprehensive survey that investigates all the most common and representative spatio-temporal trajectory measures. First, we provide an overview of spatio-temporal trajectory measures in terms of three hierarchical perspectives: Non-learning vs. Learning, Free Space vs. Road Network, and Standalone vs. Distributed. Next, we present an evaluation benchmark by designing five real-world transformation scenarios. Based on this benchmark, extensive experiments are conducted to study the effectiveness, robustness, efficiency, and scalability of each measure, which offers guidelines for trajectory measure selection among multiple techniques and applications such as trajectory data mining, deep learning, and distributed processing.

Submitted 17 March, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

Comments: 20 pages, 21 figures

arXiv:2302.13508 [pdf, other]

Detecting Jumps on a Tree: a Hierarchical Pitman-Yor Model for Evolution of Phenotypic Distributions

Authors: Hanxi Sun, Heejung Shim, Vinayak Rao

Abstract: This work focuses on clustering populations with a hierarchical dependency structure that can be described by a tree. A particular example that is the focus of our work is the phylogenetic tree, with nodes often representing biological species. Clustering of the populations in this problem is equivalent to identifying branches in the tree where the populations at the parent and child node have significantly different distributions. We construct a nonparametric Bayesian model based on hierarchical Pitman-Yor and Poisson processes to exploit this hierarchical structure, with a key contribution being the ability to share statistical information between subpopulations. We develop an efficient particle MCMC algorithm to address computational challenges involved with posterior inference. We illustrate the efficacy of our proposed approach on both synthetic and real-world problems.

Submitted 26 February, 2023; originally announced February 2023.

Comments: 41 pages, 9 figures

arXiv:2302.12842 [pdf, other]

Anisotropic mass segregation: two-component mean-field model

Authors: Hanxi Wang, Bence Kocsis

Abstract: Galactic nuclei, the densest stellar environments in the Universe, exhibit a complex geometrical structure. The stars orbiting the central supermassive black hole follow a mass segregated distribution both in the radial distance from the center and in the inclination angle of the orbital planes. The latter distribution may represent the equilibrium state of vector resonant relaxation (VRR). In this paper, we build simple models to understand the equilibrium distribution found previously in numerical simulations. Using the method of maximising the total entropy and the quadrupole mean-field approximation, we determine the equilibrium distribution of axisymmetric two-component gravitating systems with two distinct masses, semimajor axes, and eccentricities. We also examine the limiting case when one of the components dominates over the total energy and angular momentum, approximately acting as a heat bath, which may represent the surrounding astrophysical environment such as the tidal perturbation from the galaxy, a massive perturber, a gas torus, or a nearby stellar system. Remarkably, the bodies above a critical mass in the subdominant component condense into a disk in a ubiquitous way. We identify the system parameters where the transition is smooth and where it is discontinuous. The latter cases exhibit a phase transition between an ordered disk-like state and a disordered nearly spherical distribution both in the canonical and in the microcanonical ensembles for these long-range interacting systems.

Submitted 23 October, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

Comments: 25 pages, 24 figures, accepted by PRD

arXiv:2212.05502 [pdf, ps, other]

Estimator: An Effective and Scalable Framework for Transportation Mode Classification over Trajectories

Authors: Danlei Hu, Ziquan Fang, Hanxi Fang, Tianyi Li, Chunhui Shen, Lu Chen, Yunjun Gao

Abstract: Transportation mode classification, the process of predicting the class labels of moving objects' transportation modes, has been widely applied to a variety of real-world applications, such as traffic management, urban computing, and behavior study. However, existing studies of transportation mode classification typically extract the explicit features of trajectory data but fail to capture the implicit features that affect the classification performance. In addition, most of the existing studies also prefer to apply RNN-based models to embed trajectories, which is only suitable for classifying small-scale data. To tackle the above challenges, we propose an effective and scalable framework for transportation mode classification over GPS trajectories, abbreviated Estimator. Estimator is established on a developed CNN-TCN architecture, which is capable of leveraging the spatial and temporal hidden features of trajectories to achieve high effectiveness and efficiency. Estimator partitions the entire traffic space into disjoint spatial regions according to traffic conditions, which enhances the scalability significantly and thus enables parallel transportation classification.
Extensive experiments using eight public real-life datasets offer evidence that Estimator i) achieves superior model effectiveness (i.e., 99% Accuracy and 0.98 F1-score), which outperforms the state of the art substantially; ii) exhibits prominent model efficiency, and obtains 7-40x speedups over state-of-the-art learning-based methods; and iii) shows high model scalability and robustness that enables large-scale classification analytics.

Submitted 11 December, 2022; originally announced December 2022.

Comments: 12 pages, 8 figures

arXiv:2211.12030 [pdf, other]

Knowledge Prompting for Few-shot Action Recognition

Authors: Yuheng Shi, Xinxiao Wu, Hanxi Lin

Abstract: Few-shot action recognition in videos is challenging for its lack of supervision and difficulty in generalizing to unseen actions. To address this task, we propose a simple yet effective method, called knowledge prompting, which leverages commonsense knowledge of actions from external resources to prompt a powerful pre-trained vision-language model for few-shot classification. We first collect large-scale language descriptions of actions, defined as text proposals, to build an action knowledge base. The collection of text proposals is done by filling in handcrafted sentence templates with external action-related corpus or by extracting action-related phrases from captions of Web instruction videos. Then we feed these text proposals into the pre-trained vision-language model along with video frames to generate matching scores of the proposals to each frame, and the scores can be treated as action semantics with strong generalization. Finally, we design a lightweight temporal modeling network to capture the temporal evolution of action semantics for classification. Extensive experiments on six benchmark datasets demonstrate that our method generally achieves the state-of-the-art performance while reducing the training overhead to 0.001 of existing methods.

Submitted 22 November, 2022; originally announced November 2022.

arXiv:2211.09492 [pdf, ps, other]

doi 10.1093/mnras/stac3037

Photometric redshift estimation of galaxies in the DESI Legacy Imaging Surveys

Authors: Changhua Li, Yanxia Zhang, Chenzhou Cui, Dongwei Fan, Yongheng Zhao, Xue-Bing Wu, Jing-Yi Zhang, Yihan Tao, Jun Han, Yunfei Xu, Shanshan Li, Linying Mi, Boliang He, Zihan Kang, Youfen Wang, Hanxi Yang, Sisi Yang

Abstract: The accurate estimation of photometric redshifts plays a crucial role in accomplishing science objectives of the large survey projects. The template-fitting and machine learning are the two main types of methods applied currently. Based on the training set obtained by cross-correlating the DESI Legacy Imaging Surveys DR9 galaxy catalogue and SDSS DR16 galaxy catalogue, the two kinds of methods are used and optimized, such as EAZY for template-fitting approach and CATBOOST for machine learning. Then the created models are tested by the cross-matched samples of the DESI Legacy Imaging Surveys DR9 galaxy catalogue with LAMOST DR7, GAMA DR3 and WiggleZ galaxy catalogues. Moreover, three machine learning methods (CATBOOST, Multi-Layer Perceptron and Random Forest) are compared, and CATBOOST shows its superiority for our case. By feature selection and optimization of model parameters, CATBOOST can obtain higher accuracy with optical and infrared photometric information; the best performance ($MSE=0.0032$, $σ_{NMAD}=0.0156$ and $O=0.88$ per cent) with $g \le 24.0$, $r \le 23.4$ and $z \le 22.5$ is achieved. But EAZY can provide more accurate photometric redshift estimation for high redshift galaxies, especially beyond the redshift range of the training sample. Finally, we finish the redshift estimation of all DESI DR9 galaxies with CATBOOST and EAZY, which will contribute to the further study of galaxies and their properties.

Submitted 17 November, 2022; originally announced November 2022.

Comments: Accepted for publication in MNRAS. 14 pages, 9 figures, 11 tables

arXiv:2210.04140 [pdf, other]

Bayesian Repulsive Mixture Modeling with Matérn Point Processes

Authors: Hanxi Sun, Boqian Zhang, Vinayak Rao

Abstract: Mixture models are a standard tool in statistical analysis, widely used for density modeling and model-based clustering. Current approaches typically model the parameters of the mixture components as independent variables. This can result in overlapping or poorly separated clusters when either the number of clusters or the form of the mixture components is misspecified. Such model misspecification can undermine the interpretability and simplicity of these mixture models. To address this problem, we propose a Bayesian mixture model with repulsion between mixture components. The repulsion is induced by a generalized Matérn type-III repulsive point process model, obtained through a dependent sequential thinning scheme on a primary Poisson point process. We derive a novel and efficient Gibbs sampler for posterior inference, and demonstrate the utility of the proposed method on a number of synthetic and real-world problems.

Submitted 8 October, 2022; originally announced October 2022.

arXiv:2107.11813 [pdf, other]

Adaptive Recursive Circle Framework for Fine-grained Action Recognition

Authors: Hanxi Lin, Xinxiao Wu, Jiebo Luo

Abstract: How to model fine-grained spatial-temporal dynamics in videos has been a challenging problem for action recognition. It requires learning deep and rich features with superior distinctiveness for the subtle and abstract motions. Most existing methods generate features of a layer in a pure feedforward manner, where the information moves in one direction from inputs to outputs. And they rely on stacking more layers to obtain more powerful features, bringing extra non-negligible overheads. In this paper, we propose an Adaptive Recursive Circle (ARC) framework, a fine-grained decorator for pure feedforward layers. It inherits the operators and parameters of the original layer but is slightly different in the use of those operators and parameters. Specifically, the input of the layer is treated as an evolving state, and its update is alternated with the feature generation. At each recursive step, the input state is enriched by the previously generated features and the feature generation is made with the newly updated input state. We hope the ARC framework can facilitate fine-grained action recognition by introducing deeply refined features and multi-scale receptive fields at a low cost. Significant improvements over feedforward baselines are observed on several benchmarks. For example, an ARC-equipped TSM-ResNet18 outperforms TSM-ResNet50 with 48% fewer FLOPs and 52% model parameters on Something-Something V1 and Diving48.

Submitted 25 July, 2021; originally announced July 2021.

arXiv:2106.13199 [pdf, other]

A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs

Authors: Hanxi Sun, Jason Plawinski, Sajanth Subramaniam, Amir Jamaludin, Timor Kadir, Aimee Readie, Gregory Ligozio, David Ohlssen, Mark Baillie, Thibaud Coroller

Abstract: Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This is usually solved by anonymization, which is a slow and expensive process. An alternative to anonymization is sharing a synthetic dataset that bears a behaviour similar to the real data but preserves privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on COSENTYX (secukinumab) Ankylosing Spondylitis clinical study. We apply an Auxiliary Classifier GAN to generate synthetic MRIs of vertebral units. The images are conditioned on the VU location (cervical, thoracic and lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis on its properties along three key metrics: image fidelity, sample diversity and dataset privacy.

Submitted 19 August, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

Comments: v2

arXiv:2106.05563 [pdf, ps, other]

doi 10.1093/mnras/stab1650

Identification of BASS DR3 Sources as Stars, Galaxies and Quasars by XGBoost

Authors: Changhua Li, Yanxia Zhang, Chenzhou Cui, Dongwei Fan, Yongheng Zhao, Xue-Bing Wu, Boliang He, Yunfei Xu, Shanshan Li, Jun Han, Yihan Tao, Linying Mi, Hanxi Yang, Sisi Yang

Abstract: The Beijing-Arizona Sky Survey (BASS) Data Release 3 (DR3) catalogue was released in 2019, which contains the data from all BASS and the Mosaic z-band Legacy Survey (MzLS) observations between 2015 January and 2019 March, about 200 million sources. We cross-match BASS DR3 with spectral databases from the Sloan Digital Sky Survey (SDSS) and the Large Sky Area Multi-object Fiber Spectroscopic Telescope (LAMOST) to obtain the spectroscopic classes of known samples. Then, the samples are cross-matched with the ALLWISE database. Based on optical and infrared information of the samples, we use the XGBoost algorithm to construct different classifiers, including binary classification and multiclass classification. The accuracy of these classifiers with the best input pattern is larger than 90.0 per cent. Finally, all selected sources in the BASS DR3 catalogue are classified by these classifiers. The classification label and probabilities for individual sources are assigned by different classifiers. When the predicted results by binary classification are the same as multiclass classification with optical and infrared information, the numbers of star, galaxy and quasar candidates are respectively 12 375 838 (P_S>0.95), 18 606 073 (P_G>0.95) and 798 928 (P_Q>0.95). For those sources without infrared information, the predicted results can serve as a reference. Those candidates may be taken as an input catalogue of LAMOST, DESI or other projects for follow-up observation. The classified result will be of great help and reference for future research of the BASS DR3 sources.

Submitted 10 June, 2021; originally announced June 2021.

Comments: 15 pages, 10 tables, 6 figures, accepted for publication in Monthly Notices of the Royal Astronomical Society Main Journal

arXiv:2009.03497 [pdf, ps, other]

doi 10.1088/1538-3873/aba69f

GWOPS: A VO-technology Driven Tool to Search for the Electromagnetic Counterpart of Gravitational Wave Event

Authors: Yunfei Xu, Dong Xu, Chenzhou Cui, Dongwei Fan, Zipei Zhu, Bangyao Yu, Changhua Li, Jun Han, Linying Mi, Shanshan Li, Boliang He, Yihan Tao, Hanxi Yang, Sisi Yang

Abstract: The search and follow-up observation of electromagnetic (EM) counterparts of gravitational waves (GW) is a current hot topic of GW cosmology. Due to the limitation of the accuracy of the GW observation facility at this stage, we can only get a rough sky-localization region for the GW event, and the typical area of the region is between 200 and 1500 square degrees. Since GW events occur in or near galaxies, limiting the observation target to galaxies can significantly speed up searching for EM counterparts. Therefore, how to efficiently select host galaxy candidates in such a large GW localization region, how to arrange the observation sequence, and how to efficiently identify the GW source from observational data are the problems that need to be solved. The International Virtual Observatory Alliance has developed a series of technical standards for data retrieval, interoperability and visualization. Based on the application of VO technologies, we construct the GW follow-up Observation Planning System (GWOPS). It consists of three parts: a pipeline to select host candidates of GW and sort their priorities for follow-up observation, an identification module to find the transient from follow-up observation data, and a visualization module to display GW-related data. GWOPS can rapidly respond to GW events. With GWOPS, the operations such as follow-up observation planning, data storage, data visualization, and transient identification can be efficiently coordinated, which will improve the search success rate for GW EM counterparts.

Submitted 9 September, 2020; v1 submitted 7 September, 2020; originally announced September 2020.

Comments: 12 pages, 8 figures, published by Publications of the Astronomical Society of the Pacific

Journal ref: Publications of the Astronomical Society of the Pacific, 132(1016), 104501 (2020)

arXiv:2003.02029 [pdf, ps, other]

IVOA HiPS Implementation in the Framework of WorldWide Telescope

Authors: Yunfei Xu, Chenzhou Cui, Dongwei Fan, Shanshan Li, Changhua Li, Jun Han, Linying Mi, Boliang He, Hanxi Yang, Yihan Tao, Sisi Yang, Lan He

Abstract: The WorldWide Telescope (WWT) is a scientific visualization platform which can browse deep space images, star catalogs, and planetary remote sensing data from different observation facilities in a three-dimensional virtual scene. First launched and then open-sourced by Microsoft Research, the WWT is now managed by the American Astronomical Society (AAS). Hierarchical Progressive Survey (HiPS) is an astronomical data release scheme proposed by Centre de Données astronomiques de Strasbourg (CDS) and has been accepted as a recommendation by the International Virtual Observatory Alliance (IVOA). The HiPS solution has been adopted widely by many astronomical institutions for data release. Since WWT selected Hierarchical Triangular Mesh (HTM) as the standard for data visualization in the early stage of development, data released by HiPS cannot be visualized in WWT, which significantly limits the application of WWT. This paper introduces the implementation method for HiPS dataset visualization in WWT, and introduces HiPS data projection, mesh rendering, and data index implementation in WWT. Taking Chang'E-2 lunar probe data as an example, this paper introduces how to convert planetary remote sensing data into a HiPS dataset and integrate it into WWT. This paper also compares the efficiency and memory consumption of WWT loading its native data and HiPS data, and illustrates the application of HiPS in scientific data visualization and science education in WWT.

Submitted 4 March, 2020; originally announced March 2020.

Comments: 22 pages, 15 figures

arXiv:1810.03545 [pdf, other]

Stein Neural Sampler

Authors: Tianyang Hu, Zixiang Chen, Hanxi Sun, Jincheng Bai, Mao Ye, Guang Cheng

Abstract: We propose two novel samplers to generate high-quality samples from a given (un-normalized) probability density. Motivated by the success of generative adversarial networks, we construct our samplers using deep neural networks that transform a reference distribution to the target distribution. Training schemes are developed to minimize two variations of the Stein discrepancy, which is designed to work with un-normalized densities. Once trained, our samplers are able to generate samples instantaneously. We show that the proposed methods are theoretically sound and experience fewer convergence issues compared with traditional sampling approaches according to our empirical studies.

Submitted 8 February, 2021; v1 submitted 8 October, 2018; originally announced October 2018.

arXiv:1701.00561 [pdf, other]

Robust and Real-time Deep Tracking Via Multi-Scale Domain Adaptation

Authors: Xinyu Wang, Hanxi Li, Yi Li, Fumin Shen, Fatih Porikli

Abstract: Visual tracking is a fundamental problem in computer vision. Recently, some deep-learning-based tracking algorithms have been achieving record-breaking performances. However, due to the high complexity of deep learning, most deep trackers suffer from low tracking speed, and thus are impractical in many real-world applications. Some new deep trackers with smaller network structures achieve high efficiency, but at the cost of a significant decrease in precision. In this paper, we propose to transfer the feature for image classification to the visual tracking domain via convolutional channel reductions. The channel reduction could be simply viewed as an additional convolutional layer with the specific task. It not only extracts useful information for object tracking but also significantly increases the tracking speed. To better accommodate the useful feature of the target in different scales, the adaptation filters are designed with different sizes. The yielded visual tracker is real-time and also achieves state-of-the-art accuracies in the experiment involving two well-adopted benchmarks with more than 100 test videos.

Submitted 2 January, 2017; originally announced January 2017.

Comments: 6 pages

arXiv:1511.05659 [pdf, ps, other]

Learning Discriminative Representations for Semantic Cross Media Retrieval

Authors: Aiwen Jiang, Hanxi Li, Yi Li, Mingwen Wang

Abstract: Heterogeneous gap among different modalities emerges as one of the critical issues in modern AI problems. Unlike traditional uni-modal cases, where raw features are extracted and directly measured, the heterogeneous nature of cross-modal tasks requires the intrinsic semantic representation to be compared in a unified framework. This paper studies the learning of different representations that can be retrieved across different modality contents. A novel approach for mining cross-modal representations is proposed by incorporating explicit linear semantic projecting in Hilbert space. The insight is that the discriminative structures of different modality data can be linearly represented in appropriate high dimension Hilbert spaces, where linear operations can be used to approximate nonlinear decisions in the original spaces. As a result, an efficient linear semantic down mapping is jointly learned for multimodal data, leading to a common space where they can be compared. The mechanism of "feature up-lifting and down-projecting" works seamlessly as a whole, which accomplishes cross-modal retrieval tasks very well. The proposed method, named shared discriminative semantic representation learning (SDSRL), is tested on two public multimodal datasets for both within- and inter-modal retrieval. The experiments demonstrate that it outperforms several state-of-the-art methods in most scenarios.

Submitted 18 November, 2015; originally announced November 2015.

arXiv:1503.00072 [pdf, other]

doi 10.1109/TIP.2015.2510583

DeepTrack: Learning Discriminative Feature Representations Online for Robust Visual Tracking

Authors: Hanxi Li, Yi Li, Fatih Porikli

Abstract: Deep neural networks, despite their great success on feature learning in various computer vision tasks, are usually considered as impractical for online visual tracking because they require very long training time and a large number of training samples. In this work, we present an efficient and very robust tracking algorithm using a single Convolutional Neural Network (CNN) for learning effective feature representations of the target object, in a purely online manner. Our contributions are multifold: First, we introduce a novel truncated structural loss function that maintains as many training samples as possible and reduces the risk of tracking error accumulation. Second, we enhance the ordinary Stochastic Gradient Descent approach in CNN training with a robust sample selection mechanism. The sampling mechanism randomly generates positive and negative samples from different temporal distributions, which are generated by taking the temporal relations and label noise into account. Finally, a lazy yet effective updating scheme is designed for CNN training. Equipped with this novel updating algorithm, the CNN model is robust to some long-existing difficulties in visual tracking such as occlusion or incorrect detections, without loss of the effective adaption for significant appearance changes. In the experiment, our CNN tracker outperforms all compared state-of-the-art methods on two recently proposed benchmarks which in total involve over 60 video sequences.
The remarkable performance improvement over the existing trackers illustrates the superiority of the feature representations which are learned

Submitted 28 February, 2015; originally announced March 2015.

Comments: 12 pages

arXiv:1110.0264 [pdf, other]

Face Recognition using Optimal Representation Ensemble

Authors: Hanxi Li, Chunhua Shen, Yongsheng Gao

Abstract: Recently, the face recognizers based on linear representations have been shown to deliver state-of-the-art performance. In real-world applications, however, face images usually suffer from expressions, disguises and random occlusions. The problematic facial parts undermine the validity of the linear-subspace assumption and thus the recognition performance deteriorates significantly. In this work, we address the problem in a learning-inference-mixed fashion. By observing that the linear-subspace assumption is more reliable on certain face patches rather than on the holistic face, some Bayesian Patch Representations (BPRs) are randomly generated and interpreted according to the Bayes' theory. We then train an ensemble model over the patch-representations by minimizing the empirical risk w.r.t. the "leave-one-out margins". The obtained model is termed Optimal Representation Ensemble (ORE), since it guarantees the optimality from the perspective of Empirical Risk Minimization. To handle the unknown patterns in test faces, a robust version of BPR is proposed by taking the non-face category into consideration. Equipped with the Robust-BPRs, the inference ability of ORE is increased dramatically and several record-breaking accuracies (99.9% on Yale-B and 99.5% on AR) and desirable efficiencies (below 20 ms per face in Matlab) are achieved. It also overwhelms other modular heuristics on the faces with random occlusions, extreme expressions and disguises. Furthermore, to accommodate immense BPR sets, a boosting-like algorithm is also derived.
The boosted model, a.k.a. Boosted-ORE, obtains similar performance to its prototype. Besides the empirical superiorities, two desirable features of the proposed methods, namely, the training-determined model-selection and the data-weight-free boosting procedure, are also theoretically verified.

Submitted 3 October, 2011; originally announced October 2011.

Comments: 36-page draft for IEEE Transactions on Image Processing (TIP)

arXiv:1012.2603 [pdf, other]

Real-time Visual Tracking Using Sparse Representation

Authors: Hanxi Li, Chunhua Shen, Qinfeng Shi

Abstract: The $\ell_1$ tracker obtains robustness by seeking a sparse representation of the tracking object via $\ell_1$ norm minimization \cite{Xue_ICCV_09_Track}. However, the high computational complexity involved in the $\ell_1$ tracker restricts its further application in real-time processing scenarios. Hence we propose a Real Time Compressed Sensing Tracking (RTCST) method that exploits the signal recovery power of Compressed Sensing (CS). Dimensionality reduction and a customized Orthogonal Matching Pursuit (OMP) algorithm are adopted to accelerate the CS tracking. As a result, our algorithm achieves a real-time speed that is up to $6,000$ times faster than that of the $\ell_1$ tracker. Meanwhile, RTCST still produces competitive (sometimes even superior) tracking accuracy compared to the existing $\ell_1$ tracker. Furthermore, for a stationary camera, a further refined tracker is designed by integrating a CS-based background model (CSBM). This CSBM-equipped tracker, coined RTCST-B, outperforms most state-of-the-art trackers with respect to both accuracy and robustness. Finally, our experimental results on various video sequences, which are verified by a new metric, Tracking Success Probability (TSP), show the excellence of the proposed algorithms.

Submitted 12 December, 2010; originally announced December 2010.

Comments: 14 pages
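
The abstract above names Orthogonal Matching Pursuit (OMP) as the solver behind the speed-up. As a rough illustration only, here is a minimal sketch of the textbook greedy OMP loop, not the customized variant the authors describe; the function name `omp`, its parameters, and the stopping tolerance are assumptions made for this example.

```python
import numpy as np


def omp(dictionary, signal, max_atoms, tol=1e-10):
    """Generic Orthogonal Matching Pursuit (illustrative, not the RTCST variant).

    dictionary: (m, n) array whose columns are atoms (ideally unit-norm).
    signal:     (m,) observation to approximate.
    max_atoms:  upper bound on the number of non-zero coefficients.
    Returns an (n,) coefficient vector with at most max_atoms non-zeros.
    """
    signal = np.asarray(signal, dtype=float)
    residual = signal.copy()
    support = []            # indices of the atoms selected so far
    coef = np.zeros(0)      # least-squares coefficients on the support

    for _ in range(max_atoms):
        # Greedy step: pick the atom most correlated with the residual.
        best = int(np.argmax(np.abs(dictionary.T @ residual)))
        if best in support:
            break  # no new atom helps; stop early
        support.append(best)

        # Orthogonal step: refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(dictionary[:, support], signal, rcond=None)
        residual = signal - dictionary[:, support] @ coef
        if np.linalg.norm(residual) < tol:
            break

    solution = np.zeros(dictionary.shape[1])
    if support:
        solution[support] = coef
    return solution


if __name__ == "__main__":
    # Hypothetical toy problem: a random unit-norm dictionary and a 3-sparse signal.
    rng = np.random.default_rng(0)
    atoms = rng.standard_normal((30, 60))
    atoms /= np.linalg.norm(atoms, axis=0)
    truth = np.zeros(60)
    truth[[5, 17, 42]] = [1.5, -2.0, 0.7]
    estimate = omp(atoms, atoms @ truth, max_atoms=3)
    print("selected atoms:", np.nonzero(estimate)[0])
```

Each iteration costs one correlation pass over the dictionary plus a small least-squares refit, which is why greedy pursuit is typically much cheaper than solving a full $\ell_1$ minimization.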

arXiv:1008.5188

Totally Corrective Boosting for Regularized Risk Minimization

Authors: Chunhua Shen, Hanxi Li, Nick Barnes

Abstract: Consideration of the primal and dual problems together leads to important new insights into the characteristics of boosting algorithms. In this work, we propose a general framework that can be used to design new boosting algorithms. A wide variety of machine learning problems essentially minimize a regularized risk functional. We show that the proposed boosting framework, termed CGBoost, can accommodate various loss functions and different regularizers in a totally-corrective optimization fashion. We show that, by solving the primal rather than the dual, a large body of totally-corrective boosting algorithms can actually be efficiently solved and no sophisticated convex optimization solvers are needed. We also demonstrate that some boosting algorithms like AdaBoost can be interpreted in our framework, even though their optimization is not totally corrective. We empirically show that various boosting algorithms based on the proposed framework perform similarly on the UC Irvine machine learning datasets [1] that we have used in the experiments.

Submitted 11 December, 2011; v1 submitted 30 August, 2010; originally announced August 2010.

Comments: This paper has been withdrawn by the author

arXiv:1005.4103 [pdf, other]

LACBoost and FisherBoost: Optimally Building Cascade Classifiers

Authors: Chunhua Shen, Peng Wang, Hanxi Li

Abstract: Object detection is one of the key tasks in computer vision. The cascade framework of Viola and Jones has become the de facto standard. A classifier in each node of the cascade is required to achieve extremely high detection rates, instead of low overall classification error. Although there are a few reported methods addressing this requirement in the context of object detection, there is no principled feature selection method that explicitly takes into account this asymmetric node learning objective. We provide such a boosting algorithm in this work. It is inspired by the linear asymmetric classifier (LAC) of Wu et al. in that our boosting algorithm optimizes a similar cost function. The new totally-corrective boosting algorithm is implemented by the column generation technique in convex optimization. Experimental results on face detection suggest that our proposed boosting algorithms can improve the state-of-the-art methods in detection performance.

Submitted 22 May, 2010; originally announced May 2010.

Comments: 15 pages

arXiv:0904.2037 [pdf, other]

Boosting through Optimization of Margin Distributions

Authors: Chunhua Shen, Hanxi Li

Abstract: Boosting has attracted much research attention in the past decade. The success of boosting algorithms may be interpreted in terms of the margin theory. Recently it has been shown that generalization error of classifiers can be obtained by explicitly taking the margin distribution of the training data into account. Most of the current boosting algorithms in practice optimize a convex loss function and do not make use of the margin distribution. In this work we design a new boosting algorithm, termed margin-distribution boosting (MDBoost), which directly maximizes the average margin and minimizes the margin variance simultaneously. This way the margin distribution is optimized. A totally-corrective optimization algorithm based on column generation is proposed to implement MDBoost. Experiments on UCI datasets show that MDBoost outperforms AdaBoost and LPBoost in most cases.

Submitted 6 January, 2010; v1 submitted 13 April, 2009; originally announced April 2009.

Comments: 9 pages. To publish/Published in IEEE Transactions on Neural Networks, 21(7), July 2010
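
The abstract above describes maximizing the average margin while minimizing the margin variance. One way such an objective could be written is sketched below; the trade-off weight $\lambda$, the variance normalization $1/(M-1)$, and the weight budget $T$ are assumptions made for illustration and may differ from the formulation in the paper itself.

```latex
\begin{align*}
  % Margin of training example i under non-negative weights w on weak classifiers h_j
  \rho_i &= y_i \sum_{j} w_j \, h_j(x_i), \qquad i = 1, \dots, M \\
  % Sample mean and (unbiased) sample variance of the margins
  \bar{\rho} &= \frac{1}{M} \sum_{i=1}^{M} \rho_i, \qquad
  \sigma^2 = \frac{1}{M-1} \sum_{i=1}^{M} \left( \rho_i - \bar{\rho} \right)^2 \\
  % Reward a large average margin, penalize a widely spread margin distribution
  \max_{w} \;\; & \bar{\rho} - \frac{\lambda}{2} \, \sigma^2
  \quad \text{s.t.} \quad w \ge 0, \;\; \mathbf{1}^{\top} w = T
\end{align*}
```

Because each $\rho_i$ is linear in $w$, the mean is linear and the variance is a convex quadratic in $w$, so an objective of this shape is a concave maximization over a simplex-like constraint set.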

arXiv:0901.3590 [pdf, ps, other]

doi 10.1109/TPAMI.2010.47

On the Dual Formulation of Boosting Algorithms

Authors: Chunhua Shen, Hanxi Li

Abstract: We study boosting algorithms from a new perspective. We show that the Lagrange dual problems of AdaBoost, LogitBoost and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems. By looking at the dual problems of these boosting algorithms, we show that the success of boosting algorithms can be understood in terms of maintaining a better margin distribution by maximizing margins and at the same time controlling the margin variance. We also theoretically prove that, approximately, AdaBoost maximizes the average margin, instead of the minimum margin. The duality formulation also enables us to develop column generation based optimization algorithms, which are totally corrective. We show that they exhibit almost identical classification results to those of standard stage-wise additive boosting algorithms but with much faster convergence rates. Therefore fewer weak classifiers are needed to build the ensemble using our proposed optimization technique.

Submitted 27 May, 2023; v1 submitted 22 January, 2009; originally announced January 2009.

Comments: Fixed typos. 16 pages. Published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Dec. 2010

Showing 1–36 of 36 results for author: Hanxi