Search | arXiv e-print repository

PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance

Authors: Haohan Weng, Yikai Wang, Tong Zhang, C. L. Philip Chen, Jun Zhu

Abstract: Generating compact and sharply detailed 3D meshes poses a significant challenge for current 3D generative models. Different from extracting dense meshes from neural representation, some recent works try to model the native mesh distribution (i.e., a set of triangles), which generates more compact results as humans crafted. However, due to the complexity and variety of mesh topology, these methods… ▽ More Generating compact and sharply detailed 3D meshes poses a significant challenge for current 3D generative models. Different from extracting dense meshes from neural representation, some recent works try to model the native mesh distribution (i.e., a set of triangles), which generates more compact results as humans crafted. However, due to the complexity and variety of mesh topology, these methods are typically limited to small datasets with specific categories and are hard to extend. In this paper, we introduce a generic and scalable mesh generation framework PivotMesh, which makes an initial attempt to extend the native mesh generation to large-scale datasets. We employ a transformer-based auto-encoder to encode meshes into discrete tokens and decode them from face level to vertex level hierarchically. Subsequently, to model the complex typology, we first learn to generate pivot vertices as coarse mesh representation and then generate the complete mesh tokens with the same auto-regressive Transformer. This reduces the difficulty compared with directly modeling the mesh distribution and further improves the model controllability. PivotMesh demonstrates its versatility by effectively learning from both small datasets like Shapenet, and large-scale datasets like Objaverse and Objaverse-xl. Extensive experiments indicate that PivotMesh can generate compact and sharp 3D meshes across various categories, highlighting its great potential for native mesh modeling. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: Project website: https://whaohan.github.io/pivotmesh

arXiv:2404.17936 [pdf, other]

FDCE-Net: Underwater Image Enhancement with Embedding Frequency and Dual Color Encoder

Authors: Zheng Cheng, Guodong Fan, **gchun Zhou, Min Gan, C. L. Philip Chen

Abstract: Underwater images often suffer from various issues such as low brightness, color shift, blurred details, and noise due to light absorption and scattering caused by water and suspended particles. Previous underwater image enhancement (UIE) methods have primarily focused on spatial domain enhancement, neglecting the frequency domain information inherent in the images. However, the degradation factor… ▽ More Underwater images often suffer from various issues such as low brightness, color shift, blurred details, and noise due to light absorption and scattering caused by water and suspended particles. Previous underwater image enhancement (UIE) methods have primarily focused on spatial domain enhancement, neglecting the frequency domain information inherent in the images. However, the degradation factors of underwater images are closely intertwined in the spatial domain. Although certain methods focus on enhancing images in the frequency domain, they overlook the inherent relationship between the image degradation factors and the information present in the frequency domain. As a result, these methods frequently enhance certain attributes of the improved image while inadequately addressing or even exacerbating other attributes. Moreover, many existing methods heavily rely on prior knowledge to address color shift problems in underwater images, limiting their flexibility and robustness. In order to overcome these limitations, we propose the Embedding Frequency and Dual Color Encoder Network (FDCE-Net) in our paper. The FDCE-Net consists of two main structures: (1) Frequency Spatial Network (FS-Net) aims to achieve initial enhancement by utilizing our designed Frequency Spatial Residual Block (FSRB) to decouple image degradation factors in the frequency domain and enhance different attributes separately. (2) To tackle the color shift issue, we introduce the Dual-Color Encoder (DCE). The DCE establishes correlations between color and semantic representations through cross-attention and leverages multi-scale image features to guide the optimization of adaptive color query. The final enhanced images are generated by combining the outputs of FS-Net and DCE through a fusion network. These images exhibit rich details, clear textures, low noise and natural colors. △ Less

Submitted 27 April, 2024; originally announced April 2024.

Comments: 16pages,13 figures

arXiv:2404.17400 [pdf, other]

Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement

Authors: Zishu Yao, Guodong Fan, **fu Fan, Min Gan, C. L. Philip Chen

Abstract: Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range corre… ▽ More Low-light remote sensing images generally feature high resolution and high spatial complexity, with continuously distributed surface features in space. This continuity in scenes leads to extensive long-range correlations in spatial domains within remote sensing images. Convolutional Neural Networks, which rely on local correlations for long-distance modeling, struggle to establish long-range correlations in such images. On the other hand, transformer-based methods that focus on global information face high computational complexities when processing high-resolution remote sensing images. From another perspective, Fourier transform can compute global information without introducing a large number of parameters, enabling the network to more efficiently capture the overall image structure and establish long-range correlations. Therefore, we propose a Dual-Domain Feature Fusion Network (DFFN) for low-light remote sensing image enhancement. Specifically, this challenging task of low-light enhancement is divided into two more manageable sub-tasks: the first phase learns amplitude information to restore image brightness, and the second phase learns phase information to refine details. To facilitate information exchange between the two phases, we designed an information fusion affine block that combines data from different phases and scales. Additionally, we have constructed two dark light remote sensing datasets to address the current lack of datasets in dark light remote sensing image enhancement. Extensive evaluations show that our method outperforms existing state-of-the-art methods. The code is available at https://github.com/iijjlk/DFFN. △ Less

Submitted 26 April, 2024; originally announced April 2024.

Comments: 14 page

arXiv:2404.12398 [pdf, other]

Incremental Self-training for Semi-supervised Learning

Authors: Jifeng Guo, Zhulin Liu, Tong Zhang, C. L. Philip Chen

Abstract: Semi-supervised learning provides a solution to reduce the dependency of machine learning on labeled data. As one of the efficient semi-supervised techniques, self-training (ST) has received increasing attention. Several advancements have emerged to address challenges associated with noisy pseudo-labels. Previous works on self-training acknowledge the importance of unlabeled data but have not delv… ▽ More Semi-supervised learning provides a solution to reduce the dependency of machine learning on labeled data. As one of the efficient semi-supervised techniques, self-training (ST) has received increasing attention. Several advancements have emerged to address challenges associated with noisy pseudo-labels. Previous works on self-training acknowledge the importance of unlabeled data but have not delved into their efficient utilization, nor have they paid attention to the problem of high time consumption caused by iterative learning. This paper proposes Incremental Self-training (IST) for semi-supervised learning to fill these gaps. Unlike ST, which processes all data indiscriminately, IST processes data in batches and priority assigns pseudo-labels to unlabeled samples with high certainty. Then, it processes the data around the decision boundary after the model is stabilized, enhancing classifier performance. Our IST is simple yet effective and fits existing self-training-based semi-supervised learning methods. We verify the proposed IST on five datasets and two types of backbone, effectively improving the recognition accuracy and learning speed. Significantly, it outperforms state-of-the-art competitors on three challenging image classification tasks. △ Less

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2403.09093 [pdf, other]

Desigen: A Pipeline for Controllable Design Template Generation

Authors: Haohan Weng, Danqing Huang, Yu Qiao, Zheng Hu, Chin-Yew Lin, Tong Zhang, C. L. Philip Chen

Abstract: Templates serve as a good starting point to implement a design (e.g., banner, slide) but it takes great effort from designers to manually create. In this paper, we present Desigen, an automatic template creation pipeline which generates background images as well as harmonious layout elements over the background. Different from natural images, a background image should preserve enough non-salient s… ▽ More Templates serve as a good starting point to implement a design (e.g., banner, slide) but it takes great effort from designers to manually create. In this paper, we present Desigen, an automatic template creation pipeline which generates background images as well as harmonious layout elements over the background. Different from natural images, a background image should preserve enough non-salient space for the overlaying layout elements. To equip existing advanced diffusion-based models with stronger spatial control, we propose two simple but effective techniques to constrain the saliency distribution and reduce the attention weight in desired regions during the background generation process. Then conditioned on the background, we synthesize the layout with a Transformer-based autoregressive generator. To achieve a more harmonious composition, we propose an iterative inference strategy to adjust the synthesized background and layout in multiple rounds. We constructed a design dataset with more than 40k advertisement banners to verify our approach. Extensive experiments demonstrate that the proposed pipeline generates high-quality templates comparable to human designers. More than a single-page design, we further show an application of presentation generation that outputs a set of theme-consistent slides. The data and code are available at https://whaohan.github.io/desigen. △ Less

Submitted 14 March, 2024; originally announced March 2024.

Comments: This paper has been accepted for publication in CVPR2024

arXiv:2402.13865 [pdf, other]

Variable Projection Algorithms: Theoretical Insights and A Novel Approach for Problems with Large Residual

Authors: Guangyong Chen, Peng Xue, Min Gan, **g Chen, Wenzhong Guo, C. L. Philip. Chen

Abstract: This paper delves into an in-depth exploration of the Variable Projection (VP) algorithm, a powerful tool for solving separable nonlinear optimization problems across multiple domains, including system identification, image processing, and machine learning. We first establish a theoretical framework to examine the effect of the approximate treatment of the coupling relationship among parameters on… ▽ More This paper delves into an in-depth exploration of the Variable Projection (VP) algorithm, a powerful tool for solving separable nonlinear optimization problems across multiple domains, including system identification, image processing, and machine learning. We first establish a theoretical framework to examine the effect of the approximate treatment of the coupling relationship among parameters on the local convergence of the VP algorithm and theoretically prove that the Kaufman's VP algorithm can achieve a similar convergence rate as the Golub \& Pereyra's form. These studies fill the gap in the existing convergence theory analysis, and provide a solid foundation for understanding the mechanism of VP algorithm and broadening its application horizons. Furthermore, drawing inspiration from these theoretical revelations, we design a refined VP algorithm for handling separable nonlinear optimization problems characterized by large residual, called VPLR, which boosts the convergence performance by addressing the interdependence of parameters within the separable model and by continually correcting the approximated Hessian matrix to counteract the influence of large residual during the iterative process. The effectiveness of this refined algorithm is corroborated through numerical experimentation. △ Less

Submitted 21 February, 2024; originally announced February 2024.

Comments: 17 pages, 6 figures

arXiv:2402.05410 [pdf, other]

SpirDet: Towards Efficient, Accurate and Lightweight Infrared Small Target Detector

Authors: Qianchen Mao, Qiang Li, Bingshu Wang, Yongjun Zhang, Tao Dai, C. L. Philip Chen

Abstract: In recent years, the detection of infrared small targets using deep learning methods has garnered substantial attention due to notable advancements. To improve the detection capability of small targets, these methods commonly maintain a pathway that preserves high-resolution features of sparse and tiny targets. However, it can result in redundant and expensive computations. To tackle this challeng… ▽ More In recent years, the detection of infrared small targets using deep learning methods has garnered substantial attention due to notable advancements. To improve the detection capability of small targets, these methods commonly maintain a pathway that preserves high-resolution features of sparse and tiny targets. However, it can result in redundant and expensive computations. To tackle this challenge, we propose SpirDet, a novel approach for efficient detection of infrared small targets. Specifically, to cope with the computational redundancy issue, we employ a new dual-branch sparse decoder to restore the feature map. Firstly, the fast branch directly predicts a sparse map indicating potential small target locations (occupying only 0.5\% area of the map). Secondly, the slow branch conducts fine-grained adjustments at the positions indicated by the sparse map. Additionally, we design an lightweight DO-RepEncoder based on reparameterization with the Downsampling Orthogonality, which can effectively reduce memory consumption and inference latency. Extensive experiments show that the proposed SpirDet significantly outperforms state-of-the-art models while achieving faster inference speed and fewer parameters. For example, on the IRSTD-1K dataset, SpirDet improves $MIoU$ by 4.7 and has a $7\times$ $FPS$ acceleration compared to the previous state-of-the-art model. The code will be open to the public. △ Less

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2312.11583 [pdf, other]

AI-Based Energy Transportation Safety: Pipeline Radial Threat Estimation Using Intelligent Sensing System

Authors: Chengyuan Zhu, Yiyuan Yang, Kaixiang Yang, Haifeng Zhang, Qinmin Yang, C. L. Philip Chen

Abstract: The application of artificial intelligence technology has greatly enhanced and fortified the safety of energy pipelines, particularly in safeguarding against external threats. The predominant methods involve the integration of intelligent sensors to detect external vibration, enabling the identification of event types and locations, thereby replacing manual detection methods. However, practical im… ▽ More The application of artificial intelligence technology has greatly enhanced and fortified the safety of energy pipelines, particularly in safeguarding against external threats. The predominant methods involve the integration of intelligent sensors to detect external vibration, enabling the identification of event types and locations, thereby replacing manual detection methods. However, practical implementation has exposed a limitation in current methods - their constrained ability to accurately discern the spatial dimensions of external signals, which complicates the authentication of threat events. Our research endeavors to overcome the above issues by harnessing deep learning techniques to achieve a more fine-grained recognition and localization process. This refinement is crucial in effectively identifying genuine threats to pipelines, thus enhancing the safety of energy transportation. This paper proposes a radial threat estimation method for energy pipelines based on distributed optical fiber sensing technology. Specifically, we introduce a continuous multi-view and multi-domain feature fusion methodology to extract comprehensive signal features and construct a threat estimation and recognition network. The utilization of collected acoustic signal data is optimized, and the underlying principle is elucidated. Moreover, we incorporate the concept of transfer learning through a pre-trained model, enhancing both recognition accuracy and training efficiency. Empirical evidence gathered from real-world scenarios underscores the efficacy of our method, notably in its substantial reduction of false alarms and remarkable gains in recognition accuracy. More generally, our method exhibits versatility and can be extrapolated to a broader spectrum of recognition tasks and scenarios. △ Less

Submitted 25 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: The 38th Annual AAAI Conference on Artificial Intelligence (AAAI 2024)

arXiv:2310.08092 [pdf, other]

Consistent123: Improve Consistency for One Image to 3D Object Synthesis

Authors: Haohan Weng, Tianyu Yang, Jianan Wang, Yu Li, Tong Zhang, C. L. Philip Chen, Lei Zhang

Abstract: Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability. However, such models based on image-to-image translation have no guarantee of view consistency, limiting the performance for downstream tasks like 3D reconstruction and image-to-3D generation. To empower consistency, we propose Consistent123 to synthesize novel views simultaneously by inc… ▽ More Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability. However, such models based on image-to-image translation have no guarantee of view consistency, limiting the performance for downstream tasks like 3D reconstruction and image-to-3D generation. To empower consistency, we propose Consistent123 to synthesize novel views simultaneously by incorporating additional cross-view attention layers and the shared self-attention mechanism. The proposed attention mechanism improves the interaction across all synthesized views, as well as the alignment between the condition view and novel views. In the sampling stage, such architecture supports simultaneously generating an arbitrary number of views while training at a fixed length. We also introduce a progressive classifier-free guidance strategy to achieve the trade-off between texture and geometry for synthesized object views. Qualitative and quantitative experiments show that Consistent123 outperforms baselines in view consistency by a large margin. Furthermore, we demonstrate a significant improvement of Consistent123 on varying downstream tasks, showing its great potential in the 3D generation field. The project page is available at consistent-123.github.io. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: For more qualitative results, please see https://consistent-123.github.io/

arXiv:2309.03509 [pdf, other]

BroadCAM: Outcome-agnostic Class Activation Map** for Small-scale Weakly Supervised Applications

Authors: Jiatai Lin, Guoqiang Han, Xuemiao Xu, Changhong Liang, Tien-Tsin Wong, C. L. Philip Chen, Zaiyi Liu, Chu Han

Abstract: Class activation map**~(CAM), a visualization technique for interpreting deep learning models, is now commonly used for weakly supervised semantic segmentation~(WSSS) and object localization~(WSOL). It is the weighted aggregation of the feature maps by activating the high class-relevance ones. Current CAM methods achieve it relying on the training outcomes, such as predicted scores~(forward info… ▽ More Class activation map**~(CAM), a visualization technique for interpreting deep learning models, is now commonly used for weakly supervised semantic segmentation~(WSSS) and object localization~(WSOL). It is the weighted aggregation of the feature maps by activating the high class-relevance ones. Current CAM methods achieve it relying on the training outcomes, such as predicted scores~(forward information), gradients~(backward information), etc. However, when with small-scale data, unstable training may lead to less effective model outcomes and generate unreliable weights, finally resulting in incorrect activation and noisy CAM seeds. In this paper, we propose an outcome-agnostic CAM approach, called BroadCAM, for small-scale weakly supervised applications. Since broad learning system (BLS) is independent to the model learning, BroadCAM can avoid the weights being affected by the unreliable model outcomes when with small-scale data. By evaluating BroadCAM on VOC2012 (natural images) and BCSS-WSSS (medical images) for WSSS and OpenImages30k for WSOL, BroadCAM demonstrates superior performance than existing CAM methods with small-scale data (less than 5\%) in different CNN architectures. It also achieves SOTA performance with large-scale training data. Extensive qualitative comparisons are conducted to demonstrate how BroadCAM activates the high class-relevance feature maps and generates reliable CAMs when with small-scale training data. △ Less

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2308.10162 [pdf, other]

Rethinking Client Drift in Federated Learning: A Logit Perspective

Authors: Yunlu Yan, Chun-Mei Feng, Mang Ye, Wangmeng Zuo, ** Li, Rick Siow Mong Goh, Lei Zhu, C. L. Philip Chen

Abstract: Federated Learning (FL) enables multiple clients to collaboratively learn in a distributed way, allowing for privacy protection. However, the real-world non-IID data will lead to client drift which degrades the performance of FL. Interestingly, we find that the difference in logits between the local and global models increases as the model is continuously updated, thus seriously deteriorating FL p… ▽ More Federated Learning (FL) enables multiple clients to collaboratively learn in a distributed way, allowing for privacy protection. However, the real-world non-IID data will lead to client drift which degrades the performance of FL. Interestingly, we find that the difference in logits between the local and global models increases as the model is continuously updated, thus seriously deteriorating FL performance. This is mainly due to catastrophic forgetting caused by data heterogeneity between clients. To alleviate this problem, we propose a new algorithm, named FedCSD, a Class prototype Similarity Distillation in a federated framework to align the local and global models. FedCSD does not simply transfer global knowledge to local clients, as an undertrained global model cannot provide reliable knowledge, i.e., class similarity information, and its wrong soft labels will mislead the optimization of local models. Concretely, FedCSD introduces a class prototype similarity distillation to align the local logits with the refined global logits that are weighted by the similarity between local logits and the global prototype. To enhance the quality of global logits, FedCSD adopts an adaptive mask to filter out the terrible soft labels of the global models, thereby preventing them to mislead local optimization. Extensive experiments demonstrate the superiority of our method over the state-of-the-art federated learning approaches in various heterogeneous settings. The source code will be released. △ Less

Submitted 20 August, 2023; originally announced August 2023.

Comments: 11 pages, 7 figures

arXiv:2305.15768 [pdf, other]

High-Similarity-Pass Attention for Single Image Super-Resolution

Authors: Jian-Nan Su, Min Gan, Guang-Yong Chen, Wenzhong Guo, C. L. Philip Chen

Abstract: Recent developments in the field of non-local attention (NLA) have led to a renewed interest in self-similarity-based single image super-resolution (SISR). Researchers usually used the NLA to explore non-local self-similarity (NSS) in SISR and achieve satisfactory reconstruction results. However, a surprising phenomenon that the reconstruction performance of the standard NLA is similar to the NLA… ▽ More Recent developments in the field of non-local attention (NLA) have led to a renewed interest in self-similarity-based single image super-resolution (SISR). Researchers usually used the NLA to explore non-local self-similarity (NSS) in SISR and achieve satisfactory reconstruction results. However, a surprising phenomenon that the reconstruction performance of the standard NLA is similar to the NLA with randomly selected regions stimulated our interest to revisit NLA. In this paper, we first analyzed the attention map of the standard NLA from different perspectives and discovered that the resulting probability distribution always has full support for every local feature, which implies a statistical waste of assigning values to irrelevant non-local features, especially for SISR which needs to model long-range dependence with a large number of redundant non-local features. Based on these findings, we introduced a concise yet effective soft thresholding operation to obtain high-similarity-pass attention (HSPA), which is beneficial for generating a more compact and interpretable distribution. Furthermore, we derived some key properties of the soft thresholding operation that enable training our HSPA in an end-to-end manner. The HSPA can be integrated into existing deep SISR models as an efficient general building block. In addition, to demonstrate the effectiveness of the HSPA, we constructed a deep high-similarity-pass attention network (HSPAN) by integrating a few HSPAs in a simple backbone. Extensive experimental results demonstrate that HSPAN outperforms state-of-the-art approaches on both quantitative and qualitative evaluations. △ Less

Submitted 25 May, 2023; originally announced May 2023.

Comments: 13 pages, 12 figures. arXiv admin note: text overlap with arXiv:2212.01057

arXiv:2305.07180 [pdf, other]

doi 10.1109/TMM.2024.3369870

Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition

Authors: Haiqi Liu, C. L. Philip Chen, Xinrong Gong, Tong Zhang

Abstract: Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision. Existing literature addresses this challenge by employing local-based representation approaches, which may not sufficiently facilitate meaningful object-specific semantic understanding, leading to a reliance on apparent background correlations. Moreover, they primarily rely on hi… ▽ More Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision. Existing literature addresses this challenge by employing local-based representation approaches, which may not sufficiently facilitate meaningful object-specific semantic understanding, leading to a reliance on apparent background correlations. Moreover, they primarily rely on high-dimensional local descriptors to construct complex embedding space, potentially limiting the generalization. To address the above challenges, this article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition. RSaD introduces additional saliency-aware supervision via saliency detection to guide the model toward focusing on the intrinsic discriminative regions. Specifically, RSaD utilizes the saliency detection model to emphasize the critical regions of each sub-category, providing additional object-specific information for fine-grained prediction. RSaD transfers such information with two symmetric branches in a mutual learning paradigm. Furthermore, RSaD exploits inter-regional relationships to enhance the informativeness of the representation and subsequently summarize the highlighted details into contextual embeddings to facilitate the effective transfer, enabling quick generalization to novel sub-categories. The proposed approach is empirically evaluated on three widely used benchmarks, demonstrating its superior performance. △ Less

Submitted 27 February, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: Accepted by IEEE Transactions on Multimedia

arXiv:2304.00957 [pdf, other]

Properties and Potential Applications of Random Functional-Linked Types of Neural Networks

Authors: Guang-Yong Chen, Yong-Hang Yu, Min Gan, C. L. Philip Chen, Wenzhong Guo

Abstract: Random functional-linked types of neural networks (RFLNNs), e.g., the extreme learning machine (ELM) and broad learning system (BLS), which avoid suffering from a time-consuming training process, offer an alternative way of learning in deep structure. The RFLNNs have achieved excellent performance in various classification and regression tasks, however, the properties and explanations of these net… ▽ More Random functional-linked types of neural networks (RFLNNs), e.g., the extreme learning machine (ELM) and broad learning system (BLS), which avoid suffering from a time-consuming training process, offer an alternative way of learning in deep structure. The RFLNNs have achieved excellent performance in various classification and regression tasks, however, the properties and explanations of these networks are ignored in previous research. This paper gives some insights into the properties of RFLNNs from the viewpoints of frequency domain, and discovers the presence of frequency principle in these networks, that is, they preferentially capture low-frequencies quickly and then fit the high frequency components during the training process. These findings are valuable for understanding the RFLNNs and expanding their applications. Guided by the frequency principle, we propose a method to generate a BLS network with better performance, and design an efficient algorithm for solving Poison's equation in view of the different frequency principle presenting in the Jacobi iterative method and BLS network. △ Less

Submitted 3 April, 2023; originally announced April 2023.

Comments: 9 pages,14 figures

arXiv:2304.00219 [pdf, other]

ConvBLS: An Effective and Efficient Incremental Convolutional Broad Learning System for Image Classification

Authors: Chunyu Lei, C. L. Philip Chen, Jifeng Guo, Tong Zhang

Abstract: Deep learning generally suffers from enormous computational resources and time-consuming training processes. Broad Learning System (BLS) and its convolutional variants have been proposed to mitigate these issues and have achieved superb performance in image classification. However, the existing convolutional-based broad learning system (C-BLS) either lacks an efficient training method and incremen… ▽ More Deep learning generally suffers from enormous computational resources and time-consuming training processes. Broad Learning System (BLS) and its convolutional variants have been proposed to mitigate these issues and have achieved superb performance in image classification. However, the existing convolutional-based broad learning system (C-BLS) either lacks an efficient training method and incremental learning capability or suffers from poor performance. To this end, we propose a convolutional broad learning system (ConvBLS) based on the spherical K-means (SKM) algorithm and two-stage multi-scale (TSMS) feature fusion, which consists of the convolutional feature (CF) layer, convolutional enhancement (CE) layer, TSMS feature fusion layer, and output layer. First, unlike the current C-BLS, the simple yet efficient SKM algorithm is utilized to learn the weights of CF layers. Compared with random filters, the SKM algorithm makes the CF layer learn more comprehensive spatial features. Second, similar to the vanilla BLS, CE layers are established to expand the feature space. Third, the TSMS feature fusion layer is proposed to extract more effective multi-scale features through the integration of CF layers and CE layers. Thanks to the above design and the pseudo-inverse calculation of the output layer weights, our proposed ConvBLS method is unprecedentedly efficient and effective. Finally, the corresponding incremental learning algorithms are presented for rapid remodeling if the model deems to expand. Experiments and comparisons demonstrate the superiority of our method. △ Less

Submitted 1 April, 2023; originally announced April 2023.

arXiv:2302.12662 [pdf, other]

FedDBL: Communication and Data Efficient Federated Deep-Broad Learning for Histopathological Tissue Classification

Authors: Tianpeng Deng, Yanqi Huang, Guoqiang Han, Zhenwei Shi, Jiatai Lin, Qi Dou, Zaiyi Liu, Xiao-**g Guo, C. L. Philip Chen, Chu Han

Abstract: Histopathological tissue classification is a fundamental task in computational pathology. Deep learning-based models have achieved superior performance but centralized training with data centralization suffers from the privacy leakage problem. Federated learning (FL) can safeguard privacy by kee** training samples locally, but existing FL-based frameworks require a large number of well-annotated… ▽ More Histopathological tissue classification is a fundamental task in computational pathology. Deep learning-based models have achieved superior performance but centralized training with data centralization suffers from the privacy leakage problem. Federated learning (FL) can safeguard privacy by kee** training samples locally, but existing FL-based frameworks require a large number of well-annotated training samples and numerous rounds of communication which hinder their practicability in the real-world clinical scenario. In this paper, we propose a universal and lightweight federated learning framework, named Federated Deep-Broad Learning (FedDBL), to achieve superior classification performance with limited training samples and only one-round communication. By simply associating a pre-trained deep learning feature extractor, a fast and lightweight broad learning inference system and a classical federated aggregation approach, FedDBL can dramatically reduce data dependency and improve communication efficiency. Five-fold cross-validation demonstrates that FedDBL greatly outperforms the competitors with only one-round communication and limited training samples, while it even achieves comparable performance with the ones under multiple-round communications. Furthermore, due to the lightweight design and one-round communication, FedDBL reduces the communication burden from 4.6GB to only 276.5KB per client using the ResNet-50 backbone at 50-round training. Since no data or deep model sharing across different clients, the privacy issue is well-solved and the model security is guaranteed with no model inversion attack risk. Code is available at https://github.com/tianpeng-deng/FedDBL. △ Less

Submitted 17 December, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

arXiv:2212.01057 [pdf, other]

Global Learnable Attention for Single Image Super-Resolution

Authors: Jian-Nan Su, Min Gan, Guang-Yong Chen, Jia-Li Yin, C. L. Philip Chen

Abstract: Self-similarity is valuable to the exploration of non-local textures in single image super-resolution (SISR). Researchers usually assume that the importance of non-local textures is positively related to their similarity scores. In this paper, we surprisingly found that when repairing severely damaged query textures, some non-local textures with low-similarity which are closer to the target can pr… ▽ More Self-similarity is valuable to the exploration of non-local textures in single image super-resolution (SISR). Researchers usually assume that the importance of non-local textures is positively related to their similarity scores. In this paper, we surprisingly found that when repairing severely damaged query textures, some non-local textures with low-similarity which are closer to the target can provide more accurate and richer details than the high-similarity ones. In these cases, low-similarity does not mean inferior but is usually caused by different scales or orientations. Utilizing this finding, we proposed a Global Learnable Attention (GLA) to adaptively modify similarity scores of non-local textures during training instead of only using a fixed similarity scoring function such as the dot product. The proposed GLA can explore non-local textures with low-similarity but more accurate details to repair severely damaged textures. Furthermore, we propose to adopt Super-Bit Locality-Sensitive Hashing (SB-LSH) as a preprocessing method for our GLA. With the SB-LSH, the computational complexity of our GLA is reduced from quadratic to asymptotic linear with respect to the image size. In addition, the proposed GLA can be integrated into existing deep SISR models as an efficient general building block. Based on the GLA, we constructed a Deep Learnable Similarity Network (DLSN), which achieves state-of-the-art performance for SISR tasks of different degradation types (e.g. blur and noise). Our code and a pre-trained DLSN have been uploaded to GitHub† for validation. △ Less

Submitted 2 December, 2022; originally announced December 2022.

Comments: 16 pages, 16 figures

arXiv:2206.11133 [pdf, other]

Multi-party Secure Broad Learning System for Privacy Preserving

Authors: Xiao-Kai Cao, Chang-Dong Wang, Jian-Huang Lai, Qiong Huang, C. L. Philip Chen

Abstract: Multi-party learning is an indispensable technique for improving the learning performance via integrating data from multiple parties. Unfortunately, directly integrating multi-party data would not meet the privacy preserving requirements. Therefore, Privacy-Preserving Machine Learning (PPML) becomes a key research task in multi-party learning. In this paper, we present a new PPML method based on s… ▽ More Multi-party learning is an indispensable technique for improving the learning performance via integrating data from multiple parties. Unfortunately, directly integrating multi-party data would not meet the privacy preserving requirements. Therefore, Privacy-Preserving Machine Learning (PPML) becomes a key research task in multi-party learning. In this paper, we present a new PPML method based on secure multi-party interactive protocol, namely Multi-party Secure Broad Learning System (MSBLS), and derive security analysis of the method. The existing PPML methods generally cannot simultaneously meet multiple requirements such as security, accuracy, efficiency and application scope, but MSBLS achieves satisfactory results in these aspects. It uses interactive protocol and random map** to generate the mapped features of data, and then uses efficient broad learning to train neural network classifier. This is the first privacy computing method that combines secure multi-party computing and neural network. Theoretically, this method can ensure that the accuracy of the model will not be reduced due to encryption, and the calculation speed is very fast. We verify this conclusion on three classical datasets. △ Less

Submitted 22 June, 2022; originally announced June 2022.

arXiv:2204.11602 [pdf, other]

Broad Recommender System: An Efficient Nonlinear Collaborative Filtering Approach

Authors: Ling Huang, Can-Rong Guan, Zhen-Wei Huang, Yuefang Gao, Yingjie Kuang, Chang-Dong Wang, C. L. Philip Chen

Abstract: Recently, Deep Neural Networks (DNNs) have been widely introduced into Collaborative Filtering (CF) to produce more accurate recommendation results due to their capability of capturing the complex nonlinear relationships between items and users.However, the DNNs-based models usually suffer from high computational complexity, i.e., consuming very long training time and storing huge amount of traina… ▽ More Recently, Deep Neural Networks (DNNs) have been widely introduced into Collaborative Filtering (CF) to produce more accurate recommendation results due to their capability of capturing the complex nonlinear relationships between items and users.However, the DNNs-based models usually suffer from high computational complexity, i.e., consuming very long training time and storing huge amount of trainable parameters. To address these problems, we propose a new broad recommender system called Broad Collaborative Filtering (BroadCF), which is an efficient nonlinear collaborative filtering approach. Instead of DNNs, Broad Learning System (BLS) is used as a map** function to learn the complex nonlinear relationships between users and items, which can avoid the above issues while achieving very satisfactory recommendation performance. However, it is not feasible to directly feed the original rating data into BLS. To this end, we propose a user-item rating collaborative vector preprocessing procedure to generate low-dimensional user-item input data, which is able to harness quality judgments of the most similar users/items. Extensive experiments conducted on seven benchmark datasets have confirmed the effectiveness of the proposed BroadCF algorithm △ Less

Submitted 24 February, 2024; v1 submitted 19 April, 2022; originally announced April 2022.

arXiv:2201.05782 [pdf, other]

A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition

Authors: Jibao Qiu, C. L. Philip Chen, Tong Zhang

Abstract: Symbolic Music Emotion Recognition(SMER) is to predict music emotion from symbolic data, such as MIDI and MusicXML. Previous work mainly focused on learning better representation via (mask) language model pre-training but ignored the intrinsic structure of the music, which is extremely important to the emotional expression of music. In this paper, we present a simple multi-task framework for SMER,… ▽ More Symbolic Music Emotion Recognition(SMER) is to predict music emotion from symbolic data, such as MIDI and MusicXML. Previous work mainly focused on learning better representation via (mask) language model pre-training but ignored the intrinsic structure of the music, which is extremely important to the emotional expression of music. In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks derived from the intrinsic structure of the music. The results show that our multi-task framework can be adapted to different models. Moreover, the labels of auxiliary tasks are easy to be obtained, which means our multi-task methods do not require manually annotated labels other than emotion. Conducting on two publicly available datasets (EMOPIA and VGMIDI), the experiments show that our methods perform better in SMER task. Specifically, accuracy has been increased by 4.17 absolute point to 67.58 in EMOPIA dataset, and 1.97 absolute point to 55.85 in VGMIDI dataset. Ablation studies also show the effectiveness of multi-task methods designed in this paper. △ Less

Submitted 15 January, 2022; originally announced January 2022.

arXiv:2201.05781 [pdf, other]

OneDConv: Generalized Convolution For Transform-Invariant Representation

Authors: Tong Zhang, Haohan Weng, Ke Yi, C. L. Philip Chen

Abstract: Convolutional Neural Networks (CNNs) have exhibited their great power in a variety of vision tasks. However, the lack of transform-invariant property limits their further applications in complicated real-world scenarios. In this work, we proposed a novel generalized one dimension convolutional operator (OneDConv), which dynamically transforms the convolution kernels based on the input features in… ▽ More Convolutional Neural Networks (CNNs) have exhibited their great power in a variety of vision tasks. However, the lack of transform-invariant property limits their further applications in complicated real-world scenarios. In this work, we proposed a novel generalized one dimension convolutional operator (OneDConv), which dynamically transforms the convolution kernels based on the input features in a computationally and parametrically efficient manner. The proposed operator can extract the transform-invariant features naturally. It improves the robustness and generalization of convolution without sacrificing the performance on common images. The proposed OneDConv operator can substitute the vanilla convolution, thus it can be incorporated into current popular convolutional architectures and trained end-to-end readily. On several popular benchmarks, OneDConv outperforms the original convolution operation and other proposed models both in canonical and distorted images. △ Less

Submitted 15 January, 2022; originally announced January 2022.

arXiv:2201.04777 [pdf, other]

doi 10.1109/TAI.2021.3139058

A Survey on Masked Facial Detection Methods and Datasets for Fighting Against COVID-19

Authors: Bingshu Wang, Jiangbin Zheng, C. L. Philip Chen

Abstract: Coronavirus disease 2019 (COVID-19) continues to pose a great challenge to the world since its outbreak. To fight against the disease, a series of artificial intelligence (AI) techniques are developed and applied to real-world scenarios such as safety monitoring, disease diagnosis, infection risk assessment, lesion segmentation of COVID-19 CT scans,etc. The coronavirus epidemics have forced people… ▽ More Coronavirus disease 2019 (COVID-19) continues to pose a great challenge to the world since its outbreak. To fight against the disease, a series of artificial intelligence (AI) techniques are developed and applied to real-world scenarios such as safety monitoring, disease diagnosis, infection risk assessment, lesion segmentation of COVID-19 CT scans,etc. The coronavirus epidemics have forced people wear masks to counteract the transmission of virus, which also brings difficulties to monitor large groups of people wearing masks. In this paper, we primarily focus on the AI techniques of masked facial detection and related datasets. We survey the recent advances, beginning with the descriptions of masked facial detection datasets. Thirteen available datasets are described and discussed in details. Then, the methods are roughly categorized into two classes: conventional methods and neural network-based methods. Conventional methods are usually trained by boosting algorithms with hand-crafted features, which accounts for a small proportion. Neural network-based methods are further classified as three parts according to the number of processing stages. Representative algorithms are described in detail, coupled with some typical techniques that are described briefly. Finally, we summarize the recent benchmarking results, give the discussions on the limitations of datasets and methods, and expand future research directions. To our knowledge, this is the first survey about masked facial detection methods and datasets. Hopefully our survey could provide some help to fight against epidemics. △ Less

Submitted 12 January, 2022; originally announced January 2022.

Comments: 21 pages, 9 figures, 5 tables. IEEE Transactions on Artificial Intelligence, 2021, early access

arXiv:2112.07934 [pdf, other]

Graph Representation Learning via Contrasting Cluster Assignments

Authors: Chunyang Zhang, Hongyu Yao, C. L. Philip Chen, Yuena Lin

Abstract: With the rise of contrastive learning, unsupervised graph representation learning has been booming recently, even surpassing the supervised counterparts in some machine learning tasks. Most of existing contrastive models for graph representation learning either focus on maximizing mutual information between local and global embeddings, or primarily depend on contrasting embeddings at node level. H… ▽ More With the rise of contrastive learning, unsupervised graph representation learning has been booming recently, even surpassing the supervised counterparts in some machine learning tasks. Most of existing contrastive models for graph representation learning either focus on maximizing mutual information between local and global embeddings, or primarily depend on contrasting embeddings at node level. However, they are still not exquisite enough to comprehensively explore the local and global views of network topology. Although the former considers local-global relationship, its coarse global information leads to grudging cooperation between local and global views. The latter pays attention to node-level feature alignment, so that the role of global view appears inconspicuous. To avoid falling into these two extreme cases, we propose a novel unsupervised graph representation model by contrasting cluster assignments, called as GRCCA. It is motivated to make good use of local and global information synthetically through combining clustering algorithms and contrastive learning. This not only facilitates the contrastive effect, but also provides the more high-quality graph information. Meanwhile, GRCCA further excavates cluster-level information, which make it get insight to the elusive association between nodes beyond graph topology. Specifically, we first generate two augmented graphs with distinct graph augmentation strategies, then employ clustering algorithms to obtain their cluster assignments and prototypes respectively. The proposed GRCCA further compels the identical nodes from different augmented graphs to recognize their cluster assignments mutually by minimizing a cross entropy loss. To demonstrate its effectiveness, we compare with the state-of-the-art models in three different downstream tasks. The experimental results show that GRCCA has strong competitiveness in most tasks. △ Less

Submitted 15 December, 2021; originally announced December 2021.

arXiv:2111.07722 [pdf, other]

Stacked BNAS: Rethinking Broad Convolutional Neural Network for Neural Architecture Search

Authors: Zixiang Ding, Yaran Chen, Nannan Li, Dongbin Zhao, C. L. Philip Chen

Abstract: Different from other deep scalable architecture-based NAS approaches, Broad Neural Architecture Search (BNAS) proposes a broad scalable architecture which consists of convolution and enhancement blocks, dubbed Broad Convolutional Neural Network (BCNN), as the search space for amazing efficiency improvement. BCNN reuses the topologies of cells in the convolution block so that BNAS can employ few ce… ▽ More Different from other deep scalable architecture-based NAS approaches, Broad Neural Architecture Search (BNAS) proposes a broad scalable architecture which consists of convolution and enhancement blocks, dubbed Broad Convolutional Neural Network (BCNN), as the search space for amazing efficiency improvement. BCNN reuses the topologies of cells in the convolution block so that BNAS can employ few cells for efficient search. Moreover, multi-scale feature fusion and knowledge embedding are proposed to improve the performance of BCNN with shallow topology. However, BNAS suffers some drawbacks: 1) insufficient representation diversity for feature fusion and enhancement and 2) time consumption of knowledge embedding design by human experts. This paper proposes Stacked BNAS, whose search space is a developed broad scalable architecture named Stacked BCNN, with better performance than BNAS. On the one hand, Stacked BCNN treats mini BCNN as a basic block to preserve comprehensive representation and deliver powerful feature extraction ability. For multi-scale feature enhancement, each mini BCNN feeds the outputs of deep and broad cells to the enhancement cell. For multi-scale feature fusion, each mini BCNN feeds the outputs of deep, broad and enhancement cells to the output node. On the other hand, Knowledge Embedding Search (KES) is proposed to learn appropriate knowledge embeddings in a differentiable way. Moreover, the basic unit of KES is an over-parameterized knowledge embedding module that consists of all possible candidate knowledge embeddings. Experimental results show that 1) Stacked BNAS obtains better performance than BNAS-v2 on both CIFAR-10 and ImageNet, 2) the proposed KES algorithm contributes to reducing the parameters of the learned architecture with satisfactory performance, and 3) Stacked BNAS delivers a state-of-the-art efficiency of 0.02 GPU days. △ Less

Submitted 1 August, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Comments: 12 pages, 10 figures, 5 tables

arXiv:2103.00200 [pdf, other]

Siamese Labels Auxiliary Learning

Authors: Wenrui Gan, Zhulin Liu, C. L. Philip Chen, Tong Zhang

Abstract: In deep learning, auxiliary training has been widely used to assist the training of models. During the training phase, using auxiliary modules to assist training can improve the performance of the model. During the testing phase, auxiliary modules can be removed, so the test parameters are not increased. In this paper, we propose a novel auxiliary training method, Siamese Labels Auxiliary Learning… ▽ More In deep learning, auxiliary training has been widely used to assist the training of models. During the training phase, using auxiliary modules to assist training can improve the performance of the model. During the testing phase, auxiliary modules can be removed, so the test parameters are not increased. In this paper, we propose a novel auxiliary training method, Siamese Labels Auxiliary Learning (SiLa). Unlike Deep Mutual Learning (DML), SiLa emphasizes auxiliary learning and can be easily combined with DML. In general, the main work of this paper include: (1) propose SiLa Learning, which improves the performance of common models without increasing test parameters; (2) compares SiLa with DML and proves that SiLa can improve the generalization of the model; (3) SiLa is applied to Dynamic Neural Networks, and proved that SiLa can be used for various types of network structures. △ Less

Submitted 26 May, 2022; v1 submitted 27 February, 2021; originally announced March 2021.

arXiv:2012.04280 [pdf, other]

Towards Uncovering the Intrinsic Data Structures for Unsupervised Domain Adaptation using Structurally Regularized Deep Clustering

Authors: Hui Tang, Xiatian Zhu, Ke Chen, Kui Jia, C. L. Philip Chen

Abstract: Unsupervised domain adaptation (UDA) is to learn classification models that make predictions for unlabeled data on a target domain, given labeled data on a source domain whose distribution diverges from the target one. Mainstream UDA methods strive to learn domain-aligned features such that classifiers trained on the source features can be readily applied to the target ones. Although impressive re… ▽ More Unsupervised domain adaptation (UDA) is to learn classification models that make predictions for unlabeled data on a target domain, given labeled data on a source domain whose distribution diverges from the target one. Mainstream UDA methods strive to learn domain-aligned features such that classifiers trained on the source features can be readily applied to the target ones. Although impressive results have been achieved, these methods have a potential risk of damaging the intrinsic data structures of target discrimination, raising an issue of generalization particularly for UDA tasks in an inductive setting. To address this issue, we are motivated by a UDA assumption of structural similarity across domains, and propose to directly uncover the intrinsic target discrimination via constrained clustering, where we constrain the clustering solutions using structural source regularization that hinges on the very same assumption. Technically, we propose a hybrid model of Structurally Regularized Deep Clustering, which integrates the regularized discriminative clustering of target data with a generative one, and we thus term our method as H-SRDC. Our hybrid model is based on a deep clustering framework that minimizes the Kullback-Leibler divergence between the distribution of network prediction and an auxiliary one, where we impose structural regularization by learning domain-shared classifier and cluster centroids. By enriching the structural similarity assumption, we are able to extend H-SRDC for a pixel-level UDA task of semantic segmentation. We conduct extensive experiments on seven UDA benchmarks of image classification and semantic segmentation. With no explicit feature alignment, our proposed H-SRDC outperforms all the existing methods under both the inductive and transductive settings. We make our implementation codes publicly available at https://github.com/huitangtang/H-SRDC. △ Less

Submitted 7 April, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: Journal extension of our preliminary CVPR conference paper, under review, 16 pages, 8 figures, 9 tables

arXiv:2003.09799 [pdf, other]

Modal Regression based Structured Low-rank Matrix Recovery for Multi-view Learning

Authors: Jiamiao Xu, Fangzhao Wang, Qinmu Peng, Xinge You, Shuo Wang, Xiao-Yuan **g, C. L. Philip Chen

Abstract: Low-rank Multi-view Subspace Learning (LMvSL) has shown great potential in cross-view classification in recent years. Despite their empirical success, existing LMvSL based methods are incapable of well handling view discrepancy and discriminancy simultaneously, which thus leads to the performance degradation when there is a large discrepancy among multi-view data. To circumvent this drawback, moti… ▽ More Low-rank Multi-view Subspace Learning (LMvSL) has shown great potential in cross-view classification in recent years. Despite their empirical success, existing LMvSL based methods are incapable of well handling view discrepancy and discriminancy simultaneously, which thus leads to the performance degradation when there is a large discrepancy among multi-view data. To circumvent this drawback, motivated by the block-diagonal representation learning, we propose Structured Low-rank Matrix Recovery (SLMR), a unique method of effectively removing view discrepancy and improving discriminancy through the recovery of structured low-rank matrix. Furthermore, recent low-rank modeling provides a satisfactory solution to address data contaminated by predefined assumptions of noise distribution, such as Gaussian or Laplacian distribution. However, these models are not practical since complicated noise in practice may violate those assumptions and the distribution is generally unknown in advance. To alleviate such limitation, modal regression is elegantly incorporated into the framework of SLMR (term it MR-SLMR). Different from previous LMvSL based methods, our MR-SLMR can handle any zero-mode noise variable that contains a wide range of noise, such as Gaussian noise, random noise and outliers. The alternating direction method of multipliers (ADMM) framework and half-quadratic theory are used to efficiently optimize MR-SLMR. Experimental results on four public databases demonstrate the superiority of MR-SLMR and its robustness to complicated noise. △ Less

Submitted 21 March, 2020; originally announced March 2020.

Comments: This article has been accepted by IEEE Transactions on Neural Networks and Learning Systems

arXiv:2001.06679 [pdf, other]

doi 10.1109/TNNLS.2021.3067028

BNAS:An Efficient Neural Architecture Search Approach Using Broad Scalable Architecture

Authors: Zixiang Ding, Yaran Chen, Nannan Li, Dongbin Zhao, Zhiquan Sun, C. L. Philip Chen

Abstract: In this paper, we propose Broad Neural Architecture Search (BNAS) where we elaborately design broad scalable architecture dubbed Broad Convolutional Neural Network (BCNN) to solve the above issue. On one hand, the proposed broad scalable architecture has fast training speed due to its shallow topology. Moreover, we also adopt reinforcement learning and parameter sharing used in ENAS as the optimiz… ▽ More In this paper, we propose Broad Neural Architecture Search (BNAS) where we elaborately design broad scalable architecture dubbed Broad Convolutional Neural Network (BCNN) to solve the above issue. On one hand, the proposed broad scalable architecture has fast training speed due to its shallow topology. Moreover, we also adopt reinforcement learning and parameter sharing used in ENAS as the optimization strategy of BNAS. Hence, the proposed approach can achieve higher search efficiency. On the other hand, the broad scalable architecture extracts multi-scale features and enhancement representations, and feeds them into global average pooling layer to yield more reasonable and comprehensive representations. Therefore, the performance of broad scalable architecture can be promised. In particular, we also develop two variants for BNAS who modify the topology of BCNN. In order to verify the effectiveness of BNAS, several experiments are performed and experimental results show that 1) BNAS delivers 0.19 days which is 2.37x less expensive than ENAS who ranks the best in reinforcement learning-based NAS approaches, 2) compared with small-size (0.5 millions parameters) and medium-size (1.1 millions parameters) models, the architecture learned by BNAS obtains state-of-the-art performance (3.58% and 3.24% test error) on CIFAR-10, 3) the learned architecture achieves 25.3% top-1 error on ImageNet just using 3.9 millions parameters. △ Less

Submitted 20 January, 2021; v1 submitted 18 January, 2020; originally announced January 2020.

Comments: 15 pages, 12 figures, 5 tables

arXiv:1912.11171 [pdf, other]

Geometry-Aware Generation of Adversarial Point Clouds

Authors: Yuxin Wen, Jiehong Lin, Ke Chen, C. L. Philip Chen, Kui Jia

Abstract: Machine learning models have been shown to be vulnerable to adversarial examples. While most of the existing methods for adversarial attack and defense work on the 2D image domain, a few recent attempts have been made to extend them to 3D point cloud data. However, adversarial results obtained by these methods typically contain point outliers, which are both noticeable and easy to defend against u… ▽ More Machine learning models have been shown to be vulnerable to adversarial examples. While most of the existing methods for adversarial attack and defense work on the 2D image domain, a few recent attempts have been made to extend them to 3D point cloud data. However, adversarial results obtained by these methods typically contain point outliers, which are both noticeable and easy to defend against using the simple techniques of outlier removal. Motivated by the different mechanisms by which humans perceive 2D images and 3D shapes, in this paper we propose the new design of \emph{geometry-aware objectives}, whose solutions favor (the discrete versions of) the desired surface properties of smoothness and fairness. To generate adversarial point clouds, we use a targeted attack misclassification loss that supports continuous pursuit of increasingly malicious signals. Regularizing the targeted attack loss with our proposed geometry-aware objectives results in our proposed method, Geometry-Aware Adversarial Attack ($GeoA^3$). The results of $GeoA^3$ tend to be more harmful, arguably harder to defend against, and of the key adversarial characterization of being imperceptible to humans. While the main focus of this paper is to learn to generate adversarial point clouds, we also present a simple but effective algorithm termed $Geo_{+}A^3$-IterNormPro, with Iterative Normal Projection (IterNorPro) that solves a new objective function $Geo_{+}A^3$, towards surface-level adversarial attacks via generation of adversarial point clouds. We quantitatively evaluate our methods on both synthetic and physical objects in terms of attack success rate and geometric regularity. For a qualitative evaluation, we conduct subjective studies by collecting human preferences from Amazon Mechanical Turk. Comparative results in comprehensive experiments confirm the advantages of our proposed methods. △ Less

Submitted 13 December, 2020; v1 submitted 23 December, 2019; originally announced December 2019.

arXiv:1910.07755 [pdf, ps, other]

Reducing the Computational Complexity of Pseudoinverse for the Incremental Broad Learning System on Added Inputs

Authors: Hufei Zhu, Zhulin Liu, C. L. Philip Chen, Yanyang Liang

Abstract: In this brief, we improve the Broad Learning System (BLS) [7] by reducing the computational complexity of the incremental learning for added inputs. We utilize the inverse of a sum of matrices in [8] to improve a step in the pseudoinverse of a row-partitioned matrix. Accordingly we propose two fast algorithms for the cases of q > k and q < k, respectively, where q and k denote the number of additi… ▽ More In this brief, we improve the Broad Learning System (BLS) [7] by reducing the computational complexity of the incremental learning for added inputs. We utilize the inverse of a sum of matrices in [8] to improve a step in the pseudoinverse of a row-partitioned matrix. Accordingly we propose two fast algorithms for the cases of q > k and q < k, respectively, where q and k denote the number of additional training samples and the total number of nodes, respectively. Specifically, when q > k, the proposed algorithm computes only a k * k matrix inverse, instead of a q * q matrix inverse in the existing algorithm. Accordingly it can reduce the complexity dramatically. Our simulations, which follow those for Table V in [7], show that the proposed algorithm and the existing algorithm achieve the same testing accuracy, while the speedups in BLS training time of the proposed algorithm over the existing algorithm are 1.24 - 1.30. △ Less

Submitted 18 November, 2022; v1 submitted 17 October, 2019; originally announced October 2019.

arXiv:1909.03204 [pdf, other]

Multi Pseudo Q-learning Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

Authors: Wenjie Shi, Shiji Song, Cheng Wu, C. L. Philip Chen

Abstract: This paper investigates trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Different from existing policy gradient methods which employ single actor-critic but cannot realize satisfactory tracking control accuracy and stable learning, our proposed algorithm can achieve high-level tracking control accuracy of… ▽ More This paper investigates trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Different from existing policy gradient methods which employ single actor-critic but cannot realize satisfactory tracking control accuracy and stable learning, our proposed algorithm can achieve high-level tracking control accuracy of AUVs and stable learning by applying a hybrid actors-critics architecture, where multiple actors and critics are trained to learn a deterministic policy and action-value function, respectively. Specifically, for the critics, the expected absolute Bellman error based updating rule is used to choose the worst critic to be updated in each time step. Subsequently, to calculate the loss function with more accurate target value for the chosen critic, Pseudo Q-learning, which uses sub-greedy policy to replace the greedy policy in Q-learning, is developed for continuous action spaces, and Multi Pseudo Q-learning (MPQ) is proposed to reduce the overestimation of action-value function and to stabilize the learning. As for the actors, deterministic policy gradient is applied to update the weights, and the final learned policy is defined as the average of all actors to avoid large but bad updates. Moreover, the stability analysis of the learning is given qualitatively. The effectiveness and generality of the proposed MPQ-based Deterministic Policy Gradient (MPQ-DPG) algorithm are verified by the application on AUV with two different reference trajectories. And the results demonstrate high-level tracking control accuracy and stable learning of MPQ-DPG. Besides, the results also validate that increasing the number of the actors and critics will further improve the performance. △ Less

Submitted 7 September, 2019; originally announced September 2019.

Comments: IEEE Transactions on Neural Networks and Learning Systems

arXiv:1905.07503 [pdf, ps, other]

3DViewGraph: Learning Global Features for 3D Shapes from A Graph of Unordered Views with Attention

Authors: Zhizhong Han, Xiyang Wang, Chi-Man Vong, Yu-Shen Liu, Matthias Zwicker, C. L. Philip Chen

Abstract: Learning global features by aggregating information over multiple views has been shown to be effective for 3D shape analysis. For view aggregation in deep learning models, pooling has been applied extensively. However, pooling leads to a loss of the content within views, and the spatial relationship among views, which limits the discriminability of learned features. We propose 3DViewGraph to resol… ▽ More Learning global features by aggregating information over multiple views has been shown to be effective for 3D shape analysis. For view aggregation in deep learning models, pooling has been applied extensively. However, pooling leads to a loss of the content within views, and the spatial relationship among views, which limits the discriminability of learned features. We propose 3DViewGraph to resolve this issue, which learns 3D global features by more effectively aggregating unordered views with attention. Specifically, unordered views taken around a shape are regarded as view nodes on a view graph. 3DViewGraph first learns a novel latent semantic map** to project low-level view features into meaningful latent semantic embeddings in a lower dimensional space, which is spanned by latent semantic patterns. Then, the content and spatial information of each pair of view nodes are encoded by a novel spatial pattern correlation, where the correlation is computed among latent semantic patterns. Finally, all spatial pattern correlations are integrated with attention weights learned by a novel attention mechanism. This further increases the discriminability of learned features by highlighting the unordered view nodes with distinctive characteristics and depressing the ones with appearance ambiguity. We show that 3DViewGraph outperforms state-of-the-art methods under three large-scale benchmarks. △ Less

Submitted 17 May, 2019; originally announced May 2019.

Comments: To appear at IJCAI2019

arXiv:1804.07237 [pdf, other]

Multi-view Hybrid Embedding: A Divide-and-Conquer Approach

Authors: Jiamiao Xu, Shujian Yu, Xinge You, Mengjun Leng, Xiao-Yuan **g, C. L. Philip Chen

Abstract: We present a novel cross-view classification algorithm where the gallery and probe data come from different views. A popular approach to tackle this problem is the multi-view subspace learning (MvSL) that aims to learn a latent subspace shared by multi-view data. Despite promising results obtained on some applications, the performance of existing methods deteriorates dramatically when the multi-vi… ▽ More We present a novel cross-view classification algorithm where the gallery and probe data come from different views. A popular approach to tackle this problem is the multi-view subspace learning (MvSL) that aims to learn a latent subspace shared by multi-view data. Despite promising results obtained on some applications, the performance of existing methods deteriorates dramatically when the multi-view data is sampled from nonlinear manifolds or suffers from heavy outliers. To circumvent this drawback, motivated by the Divide-and-Conquer strategy, we propose Multi-view Hybrid Embedding (MvHE), a unique method of dividing the problem of cross-view classification into three subproblems and building one model for each subproblem. Specifically, the first model is designed to remove view discrepancy, whereas the second and third models attempt to discover the intrinsic nonlinear structure and to increase discriminability in intra-view and inter-view samples respectively. The kernel extension is conducted to further boost the representation power of MvHE. Extensive experiments are conducted on four benchmark datasets. Our methods demonstrate overwhelming advantages against the state-of-the-art MvSL based cross-view classification approaches in terms of classification accuracy and robustness. △ Less

Submitted 21 January, 2019; v1 submitted 19 April, 2018; originally announced April 2018.

Comments: This paper has been accepted by IEEE Transactions on Cybernetics

arXiv:1803.01097 [pdf, other]

doi 10.1109/TEVC.2018.2874465

A Many-Objective Evolutionary Algorithm With Two Interacting Processes: Cascade Clustering and Reference Point Incremental Learning

Authors: Hongwei Ge, Mingde Zhao, Liang Sun, Zhen Wang, Guozhen Tan, Qiang Zhang, C. L. Philip Chen

Abstract: Researches have shown difficulties in obtaining proximity while maintaining diversity for many-objective optimization problems. Complexities of the true Pareto front pose challenges for the reference vector-based algorithms for their insufficient adaptability to the diverse characteristics with no priori. This paper proposes a many-objective optimization algorithm with two interacting processes: c… ▽ More Researches have shown difficulties in obtaining proximity while maintaining diversity for many-objective optimization problems. Complexities of the true Pareto front pose challenges for the reference vector-based algorithms for their insufficient adaptability to the diverse characteristics with no priori. This paper proposes a many-objective optimization algorithm with two interacting processes: cascade clustering and reference point incremental learning (CLIA). In the population selection process based on cascade clustering (CC), using the reference vectors provided by the process based on incremental learning, the nondominated and the dominated individuals are clustered and sorted with different manners in a cascade style and are selected by round-robin for better proximity and diversity. In the reference vector adaptation process based on reference point incremental learning, using the feedbacks from the process based on CC, proper distribution of reference points is gradually obtained by incremental learning. Experimental studies on several benchmark problems show that CLIA is competitive compared with the state-of-the-art algorithms and has impressive efficiency and versatility using only the interactions between the two processes without incurring extra evaluations. △ Less

Submitted 20 August, 2019; v1 submitted 2 March, 2018; originally announced March 2018.

Report number: IEEE Transactions on Evolutionary Computation (Volume: 23, Issue: 4, Aug. 2019)

Journal ref: IEEE Transactions on Evolutionary Computation, 2019

arXiv:1705.10596 [pdf, ps, other]

doi 10.1109/ICCSS.2016.7586421

Approximation learning methods of Harmonic Map**s in relation to Hardy Spaces

Authors: Zhulin Liu, C. L. Philip Chen

Abstract: A new Hardy space Hardy space approach of Dirichlet type problem based on Tikhonov regularization and Reproducing Hilbert kernel space is discussed in this paper, which turns out to be a typical extremal problem located on the upper upper-high complex plane. If considering this in the Hardy space, the optimization operator of this problem will be highly simplified and an efficient algorithm is pos… ▽ More A new Hardy space Hardy space approach of Dirichlet type problem based on Tikhonov regularization and Reproducing Hilbert kernel space is discussed in this paper, which turns out to be a typical extremal problem located on the upper upper-high complex plane. If considering this in the Hardy space, the optimization operator of this problem will be highly simplified and an efficient algorithm is possible. This is mainly realized by the help of reproducing properties of the functions in the Hardy space of upper-high complex plane, and the detail algorithm is proposed. Moreover, harmonic map**s, which is a significant geometric transformation, are commonly used in many applications such as image processing, since it describes the energy minimization map**s between individual manifolds. Particularly, when we focus on the planer map**s between two Euclid planer regions, the harmonic map**s are exist and unique, which is guaranteed solidly by the existence of harmonic function. This property is attractive and simulation results are shown in this paper to ensure the capability of applications such as planer shape distortion and surface registration. △ Less

Submitted 24 May, 2017; originally announced May 2017.

Comments: 2016 3rd International Conference on Informative and Cybernetics for Computational Social Systems (ICCSS)

arXiv:1204.2310 [pdf, other]

doi 10.1016/j.ins.2013.11.027

A Novel Latin Square Image Cipher

Authors: Yue Wu, Yicong Zhou, Joseph P. Noonan, Sos Agaian, C. L. Philip Chen

Abstract: In this paper, we introduce a symmetric-key Latin square image cipher (LSIC) for grayscale and color images. Our contributions to the image encryption community include 1) we develop new Latin square image encryption primitives including Latin Square Whitening, Latin Square S-box and Latin Square P-box ; 2) we provide a new way of integrating probabilistic encryption in image encryption by embeddi… ▽ More In this paper, we introduce a symmetric-key Latin square image cipher (LSIC) for grayscale and color images. Our contributions to the image encryption community include 1) we develop new Latin square image encryption primitives including Latin Square Whitening, Latin Square S-box and Latin Square P-box ; 2) we provide a new way of integrating probabilistic encryption in image encryption by embedding random noise in the least significant image bit-plane; and 3) we construct LSIC with these Latin square image encryption primitives all on one keyed Latin square in a new loom-like substitution-permutation network. Consequently, the proposed LSIC achieve many desired properties of a secure cipher including a large key space, high key sensitivities, uniformly distributed ciphertext, excellent confusion and diffusion properties, semantically secure, and robustness against channel noise. Theoretical analysis show that the LSIC has good resistance to many attack models including brute-force attacks, ciphertext-only attacks, known-plaintext attacks and chosen-plaintext attacks. Experimental analysis under extensive simulation results using the complete USC-SIPI Miscellaneous image dataset demonstrate that LSIC outperforms or reach state of the art suggested by many peer algorithms. All these analysis and results demonstrate that the LSIC is very suitable for digital image encryption. Finally, we open source the LSIC MATLAB code under webpage https://sites.google.com/site/tuftsyuewu/source-code. △ Less

Submitted 10 April, 2012; originally announced April 2012.

Comments: 26 pages, 17 figures, and 7 tables

Journal ref: Information Sciences 264 (2014): 317-339

arXiv:cond-mat/0607805 [pdf, ps, other]

doi 10.1103/PhysRevLett.98.036404

Inelastic X-Ray Scattering Study of Exciton Properties in an Organic Molecular crystal

Authors: K. Yang, L. P. Chen, Y. Q. Cai, N. Hiraoka, S. Li, J. F. Zhao, D. W. Shen, H. F. Song, H. Tian, L. H. Bai, Z. H. Chen, Z. G. Shuai, D. L. Feng

Abstract: Excitons in a complex organic molecular crystal were studied by inelastic x-ray scattering (IXS) for the first time. The dynamic dielectric response function is measured over a large momentum transfer region, from which an exciton dispersion of 130 meV is observed. Semiempirical quantum chemical calculations reproduce well the momentum dependence of the measured dynamic dielectric responses, and… ▽ More Excitons in a complex organic molecular crystal were studied by inelastic x-ray scattering (IXS) for the first time. The dynamic dielectric response function is measured over a large momentum transfer region, from which an exciton dispersion of 130 meV is observed. Semiempirical quantum chemical calculations reproduce well the momentum dependence of the measured dynamic dielectric responses, and thus unambiguously indicate that the lowest Frenkel exciton is confined within a fraction of the complex molecule. Our results demonstrate that IXS is a powerful tool for studying excitons in complex organic molecular systems. Besides the energy position, the IXS spectra provide a stringent test on the validity of the theoretically calculated exciton wave functions. △ Less

Submitted 21 February, 2007; v1 submitted 31 July, 2006; originally announced July 2006.

Comments: 4 pages, 4 figures

Journal ref: Physical review Letters 98, 036404 (2007)

Showing 1–37 of 37 results for author: Chen, L P