Search | arXiv e-print repository

Symmetric Multi-Similarity Loss for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2024

Authors: Xiaoqi Wang, Yi Wang, Lap-Pui Chau

Abstract: In this report, we present our champion solution for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge in CVPR 2024. Essentially, this challenge differs from traditional visual-text retrieval tasks by providing a correlation matrix that acts as a set of soft labels for video-text clip combinations. However, existing loss functions have not fully exploited this information. Motivated by this, we… ▽ More In this report, we present our champion solution for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge in CVPR 2024. Essentially, this challenge differs from traditional visual-text retrieval tasks by providing a correlation matrix that acts as a set of soft labels for video-text clip combinations. However, existing loss functions have not fully exploited this information. Motivated by this, we propose a novel loss function, Symmetric Multi-Similarity Loss, which offers a more precise learning objective. Together with tricks and ensemble learning, the model achieves 63.76% average mAP and 74.25% average nDCG on the public leaderboard, demonstrating the effectiveness of our approach. Our code will be released at: https://github.com/xqwang14/SMS-Loss/tree/main △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: The champion solution for Epic-Kitchen-100 Multi-Instance Retrieval Challenge

arXiv:2405.05173 [pdf, other]

A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective

Authors: Huaiyuan Xu, Junliang Chen, Shiyu Meng, Yi Wang, Lap-Pui Chau

Abstract: 3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, this technology is emerging as a trend in autonomous driving perception systems, and is attracting significant attention from both industry and academia. Similar to traditional bird's-eye view (BEV) perception, 3D occupancy perception ha… ▽ More 3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, this technology is emerging as a trend in autonomous driving perception systems, and is attracting significant attention from both industry and academia. Similar to traditional bird's-eye view (BEV) perception, 3D occupancy perception has the nature of multi-source input and the necessity for information fusion. However, the difference is that it captures vertical structures that are ignored by 2D BEV. In this survey, we review the most recent works on 3D occupancy perception, and provide in-depth analyses of methodologies with various input modalities. Specifically, we summarize general network pipelines, highlight information fusion techniques, and discuss effective network training. We evaluate and analyze the occupancy perception performance of the state-of-the-art on the most popular datasets. Furthermore, challenges and future research directions are discussed. We hope this paper will inspire the community and encourage more research work on 3D occupancy perception. A comprehensive list of studies in this survey is publicly available in an active repository that continuously collects the latest work: https://github.com/HuaiyuanXu/3D-Occupancy-Perception. △ Less

Submitted 18 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

arXiv:2404.04669 [pdf, other]

Domain Generalisation via Imprecise Learning

Authors: Anurag Singh, Siu Lun Chau, Shahine Bouabid, Krikamol Muandet

Abstract: Out-of-distribution (OOD) generalisation is challenging because it involves not only learning from empirical data, but also deciding among various notions of generalisation, e.g., optimising the average-case risk, worst-case risk, or interpolations thereof. While this choice should in principle be made by the model operator like medical doctors, this information might not always be available at tr… ▽ More Out-of-distribution (OOD) generalisation is challenging because it involves not only learning from empirical data, but also deciding among various notions of generalisation, e.g., optimising the average-case risk, worst-case risk, or interpolations thereof. While this choice should in principle be made by the model operator like medical doctors, this information might not always be available at training time. The institutional separation between machine learners and model operators leads to arbitrary commitments to specific generalisation strategies by machine learners due to these deployment uncertainties. We introduce the Imprecise Domain Generalisation framework to mitigate this, featuring an imprecise risk optimisation that allows learners to stay imprecise by optimising against a continuous spectrum of generalisation strategies during training, and a model framework that allows operators to specify their generalisation preference at deployment. Supported by both theoretical and empirical evidence, our work showcases the benefits of integrating imprecision into domain generalisation. △ Less

Submitted 30 May, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

arXiv:2403.15751 [pdf, other]

AOCIL: Exemplar-free Analytic Online Class Incremental Learning with Low Time and Resource Consumption

Authors: Hui** Zhuang, Yuchen Liu, Run He, Kai Tong, Ziqian Zeng, Cen Chen, Yi Wang, Lap-Pui Chau

Abstract: Online Class Incremental Learning (OCIL) aims to train the model in a task-by-task manner, where data arrive in mini-batches at a time while previous data are not accessible. A significant challenge is known as Catastrophic Forgetting, i.e., loss of the previous knowledge on old data. To address this, replay-based methods show competitive results but invade data privacy, while exemplar-free method… ▽ More Online Class Incremental Learning (OCIL) aims to train the model in a task-by-task manner, where data arrive in mini-batches at a time while previous data are not accessible. A significant challenge is known as Catastrophic Forgetting, i.e., loss of the previous knowledge on old data. To address this, replay-based methods show competitive results but invade data privacy, while exemplar-free methods protect data privacy but struggle for accuracy. In this paper, we proposed an exemplar-free approach -- Analytic Online Class Incremental Learning (AOCIL). Instead of back-propagation, we design the Analytic Classifier (AC) updated by recursive least square, cooperating with a frozen backbone. AOCIL simultaneously achieves high accuracy, low resource consumption and data privacy protection. We conduct massive experiments on four existing benchmark datasets, and the results demonstrate the strong capability of handling OCIL scenarios. Codes will be ready. △ Less

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2311.14760 [pdf, other]

SinSR: Diffusion-Based Image Super-Resolution in a Single Step

Authors: Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen

Abstract: While super-resolution (SR) methods based on diffusion models exhibit promising results, their practical application is hindered by the substantial number of required inference steps. Recent methods utilize degraded images in the initial state, thereby shortening the Markov chain. Nevertheless, these solutions either rely on a precise formulation of the degradation process or still necessitate a r… ▽ More While super-resolution (SR) methods based on diffusion models exhibit promising results, their practical application is hindered by the substantial number of required inference steps. Recent methods utilize degraded images in the initial state, thereby shortening the Markov chain. Nevertheless, these solutions either rely on a precise formulation of the degradation process or still necessitate a relatively lengthy generation path (e.g., 15 iterations). To enhance inference speed, we propose a simple yet effective method for achieving single-step SR generation, named SinSR. Specifically, we first derive a deterministic sampling process from the most recent state-of-the-art (SOTA) method for accelerating diffusion-based SR. This allows the map** between the input random noise and the generated high-resolution image to be obtained in a reduced and acceptable number of inference steps during training. We show that this deterministic map** can be distilled into a student model that performs SR within only one inference step. Additionally, we propose a novel consistency-preserving loss to simultaneously leverage the ground-truth image during the distillation process, ensuring that the performance of the student model is not solely bound by the feature manifold of the teacher model, resulting in further performance improvement. Extensive experiments conducted on synthetic and real-world datasets demonstrate that the proposed method can achieve comparable or even superior performance compared to both previous SOTA methods and the teacher model, in just one sampling step, resulting in a remarkable up to x10 speedup for inference. Our code will be released at https://github.com/wyf0912/SinSR △ Less

Submitted 23 November, 2023; originally announced November 2023.

arXiv:2310.17273 [pdf, other]

Loo** in the Human Collaborative and Explainable Bayesian Optimization

Authors: Masaki Adachi, Brady Planden, David A. Howey, Michael A. Osborne, Sebastian Orbell, Natalia Ares, Krikamol Muandet, Siu Lun Chau

Abstract: Like many optimizers, Bayesian optimization often falls short of gaining user trust due to opacity. While attempts have been made to develop human-centric optimizers, they typically assume user knowledge is well-specified and error-free, employing users mainly as supervisors of the optimization process. We relax these assumptions and propose a more balanced human-AI partnership with our Collaborat… ▽ More Like many optimizers, Bayesian optimization often falls short of gaining user trust due to opacity. While attempts have been made to develop human-centric optimizers, they typically assume user knowledge is well-specified and error-free, employing users mainly as supervisors of the optimization process. We relax these assumptions and propose a more balanced human-AI partnership with our Collaborative and Explainable Bayesian Optimization (CoExBO) framework. Instead of explicitly requiring a user to provide a knowledge model, CoExBO employs preference learning to seamlessly integrate human insights into the optimization, resulting in algorithmic suggestions that resonate with user preference. CoExBO explains its candidate selection every iteration to foster trust, empowering users with a clearer grasp of the optimization. Furthermore, CoExBO offers a no-harm guarantee, allowing users to make mistakes; even with extreme adversarial interventions, the algorithm converges asymptotically to a vanilla Bayesian optimization. We validate CoExBO's efficacy through human-AI teaming experiments in lithium-ion battery design, highlighting substantial improvements over conventional methods. Code is available https://github.com/ma921/CoExBO. △ Less

Submitted 29 February, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted at AISTATS 2024, 24 pages, 11 figures

MSC Class: 62C10; 62F15

arXiv:2309.13890 [pdf, other]

Bitstream-Corrupted Video Recovery: A Novel Benchmark Dataset and Method

Authors: Tianyi Liu, Kejun Wu, Yi Wang, Wenyang Liu, Kim-Hui Yap, Lap-Pui Chau

Abstract: The past decade has witnessed great strides in video recovery by specialist technologies, like video inpainting, completion, and error concealment. However, they typically simulate the missing content by manual-designed error masks, thus failing to fill in the realistic video loss in video communication (e.g., telepresence, live streaming, and internet video) and multimedia forensics. To address t… ▽ More The past decade has witnessed great strides in video recovery by specialist technologies, like video inpainting, completion, and error concealment. However, they typically simulate the missing content by manual-designed error masks, thus failing to fill in the realistic video loss in video communication (e.g., telepresence, live streaming, and internet video) and multimedia forensics. To address this, we introduce the bitstream-corrupted video (BSCV) benchmark, the first benchmark dataset with more than 28,000 video clips, which can be used for bitstream-corrupted video recovery in the real world. The BSCV is a collection of 1) a proposed three-parameter corruption model for video bitstream, 2) a large-scale dataset containing rich error patterns, multiple corruption levels, and flexible dataset branches, and 3) a plug-and-play module in video recovery framework that serves as a benchmark. We evaluate state-of-the-art video inpainting methods on the BSCV dataset, demonstrating existing approaches' limitations and our framework's advantages in solving the bitstream-corrupted video recovery problem. The benchmark and dataset are released at https://github.com/LIUTIGHE/BSCV-Dataset. △ Less

Submitted 26 September, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: Accepted by NeurIPS Dataset and Benchmark Track 2023

arXiv:2308.16262 [pdf, other]

Causal Strategic Learning with Competitive Selection

Authors: Kiet Q. H. Vo, Muneeb Aadil, Siu Lun Chau, Krikamol Muandet

Abstract: We study the problem of agent selection in causal strategic learning under multiple decision makers and address two key challenges that come with it. Firstly, while much of prior work focuses on studying a fixed pool of agents that remains static regardless of their evaluations, we consider the impact of selection procedure by which agents are not only evaluated, but also selected. When each decis… ▽ More We study the problem of agent selection in causal strategic learning under multiple decision makers and address two key challenges that come with it. Firstly, while much of prior work focuses on studying a fixed pool of agents that remains static regardless of their evaluations, we consider the impact of selection procedure by which agents are not only evaluated, but also selected. When each decision maker unilaterally selects agents by maximising their own utility, we show that the optimal selection rule is a trade-off between selecting the best agents and providing incentives to maximise the agents' improvement. Furthermore, this optimal selection rule relies on incorrect predictions of agents' outcomes. Hence, we study the conditions under which a decision maker's optimal selection rule will not lead to deterioration of agents' outcome nor cause unjust reduction in agents' selection chance. To that end, we provide an analytical form of the optimal selection rule and a mechanism to retrieve the causal parameters from observational data, under certain assumptions on agents' behaviour. Secondly, when there are multiple decision makers, the interference between selection rules introduces another source of biases in estimating the underlying causal parameters. To address this problem, we provide a cooperative protocol which all decision makers must collectively adopt to recover the true causal parameters. Lastly, we complement our theoretical results with simulation studies. Our results highlight not only the importance of causal modeling as a strategy to mitigate the effect of gaming, as suggested by previous work, but also the need of a benevolent regulator to enable it. △ Less

Submitted 3 February, 2024; v1 submitted 30 August, 2023; originally announced August 2023.

Comments: Added more discussions on assumptions and the algorithm, and expand the Conclusion

arXiv:2307.07710 [pdf, other]

ExposureDiffusion: Learning to Expose for Low-light Image Enhancement

Authors: Yufei Wang, Yi Yu, Wenhan Yang, Lanqing Guo, Lap-Pui Chau, Alex C. Kot, Bihan Wen

Abstract: Previous raw image-based low-light image enhancement methods predominantly relied on feed-forward neural networks to learn deterministic map**s from low-light to normally-exposed images. However, they failed to capture critical distribution information, leading to visually undesirable results. This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure… ▽ More Previous raw image-based low-light image enhancement methods predominantly relied on feed-forward neural networks to learn deterministic map**s from low-light to normally-exposed images. However, they failed to capture critical distribution information, leading to visually undesirable results. This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure model. Different from a vanilla diffusion model that has to perform Gaussian denoising, with the injected physics-based exposure model, our restoration process can directly start from a noisy image instead of pure noise. As such, our method obtains significantly improved performance and reduced inference time compared with vanilla diffusion models. To make full use of the advantages of different intermediate steps, we further propose an adaptive residual layer that effectively screens out the side-effect in the iterative refinement when the intermediate results have been already well-exposed. The proposed framework can work with both real-paired datasets, SOTA noise models, and different backbone networks. Note that, the proposed framework is compatible with real-paired datasets, real/synthetic noise models, and different backbone networks. We evaluate the proposed method on various public benchmarks, achieving promising results with consistent improvements using different exposure models and backbones. Besides, the proposed method achieves better generalization capacity for unseen amplifying ratios and better performance than a larger feedforward neural model when few parameters are adopted. △ Less

Submitted 15 August, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

Comments: accepted by ICCV2023

arXiv:2307.05616 [pdf, other]

Image Reconstruction using Enhanced Vision Transformer

Authors: Nikhil Verma, Deepkamal Kaur, Lydia Chau

Abstract: Removing noise from images is a challenging and fundamental problem in the field of computer vision. Images captured by modern cameras are inevitably degraded by noise which limits the accuracy of any quantitative measurements on those images. In this project, we propose a novel image reconstruction framework which can be used for tasks such as image denoising, deblurring or inpainting. The model… ▽ More Removing noise from images is a challenging and fundamental problem in the field of computer vision. Images captured by modern cameras are inevitably degraded by noise which limits the accuracy of any quantitative measurements on those images. In this project, we propose a novel image reconstruction framework which can be used for tasks such as image denoising, deblurring or inpainting. The model proposed in this project is based on Vision Transformer (ViT) that takes 2D images as input and outputs embeddings which can be used for reconstructing denoised images. We incorporate four additional optimization techniques in the framework to improve the model reconstruction capability, namely Locality Sensitive Attention (LSA), Shifted Patch Tokenization (SPT), Rotary Position Embeddings (RoPE) and adversarial loss function inspired from Generative Adversarial Networks (GANs). LSA, SPT and RoPE enable the transformer to learn from the dataset more efficiently, while the adversarial loss function enhances the resolution of the reconstructed images. Based on our experiments, the proposed architecture outperforms the benchmark U-Net model by more than 3.5\% structural similarity (SSIM) for the reconstruction tasks of image denoising and inpainting. The proposed enhancements further show an improvement of \textasciitilde5\% SSIM over the benchmark for both tasks. △ Less

Submitted 10 July, 2023; originally announced July 2023.

arXiv:2306.12111 [pdf, other]

A Comprehensive Study on the Robustness of Image Classification and Object Detection in Remote Sensing: Surveying and Benchmarking

Authors: Shaohui Mei, Jiawei Lian, Xiaofei Wang, Yuru Su, Mingyang Ma, Lap-Pui Chau

Abstract: Deep neural networks (DNNs) have found widespread applications in interpreting remote sensing (RS) imagery. However, it has been demonstrated in previous works that DNNs are vulnerable to different types of noises, particularly adversarial noises. Surprisingly, there has been a lack of comprehensive studies on the robustness of RS tasks, prompting us to undertake a thorough survey and benchmark on… ▽ More Deep neural networks (DNNs) have found widespread applications in interpreting remote sensing (RS) imagery. However, it has been demonstrated in previous works that DNNs are vulnerable to different types of noises, particularly adversarial noises. Surprisingly, there has been a lack of comprehensive studies on the robustness of RS tasks, prompting us to undertake a thorough survey and benchmark on the robustness of image classification and object detection in RS. To our best knowledge, this study represents the first comprehensive examination of both natural robustness and adversarial robustness in RS tasks. Specifically, we have curated and made publicly available datasets that contain natural and adversarial noises. These datasets serve as valuable resources for evaluating the robustness of DNNs-based models. To provide a comprehensive assessment of model robustness, we conducted meticulous experiments with numerous different classifiers and detectors, encompassing a wide range of mainstream methods. Through rigorous evaluation, we have uncovered insightful and intriguing findings, which shed light on the relationship between adversarial noise crafting and model training, yielding a deeper understanding of the susceptibility and limitations of various models, and providing guidance for the development of more resilient and robust models △ Less

Submitted 15 September, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

arXiv:2306.12058 [pdf, other]

Beyond Learned Metadata-based Raw Image Reconstruction

Authors: Yufei Wang, Yi Yu, Wenhan Yang, Lanqing Guo, Lap-Pui Chau, Alex C. Kot, Bihan Wen

Abstract: While raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels, they are not widely adopted by general users due to their substantial storage requirements. Very recent studies propose to compress raw images by designing sampling masks within the pixel space of the raw image. However, these approaches often leave space for pursuing more effective im… ▽ More While raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels, they are not widely adopted by general users due to their substantial storage requirements. Very recent studies propose to compress raw images by designing sampling masks within the pixel space of the raw image. However, these approaches often leave space for pursuing more effective image representations and compact metadata. In this work, we propose a novel framework that learns a compact representation in the latent space, serving as metadata, in an end-to-end manner. Compared with lossy image compression, we analyze the intrinsic difference of the raw image reconstruction task caused by rich information from the sRGB image. Based on the analysis, a novel backbone design with asymmetric and hybrid spatial feature resolutions is proposed, which significantly improves the rate-distortion performance. Besides, we propose a novel design of the context model, which can better predict the order masks of encoding/decoding based on both the sRGB image and the masks of already processed features. Benefited from the better modeling of the correlation between order masks, the already processed information can be better utilized. Moreover, a novel sRGB-guided adaptive quantization precision strategy, which dynamically assigns varying levels of quantization precision to different regions, further enhances the representation ability of the model. Finally, based on the iterative properties of the proposed context model, we propose a novel strategy to achieve variable bit rates using a single model. This strategy allows for the continuous convergence of a wide range of bit rates. Extensive experimental results demonstrate that the proposed method can achieve better reconstruction quality with a smaller metadata size. △ Less

Submitted 21 June, 2023; originally announced June 2023.

arXiv:2305.15167 [pdf, other]

Explaining the Uncertain: Stochastic Shapley Values for Gaussian Process Models

Authors: Siu Lun Chau, Krikamol Muandet, Dino Sejdinovic

Abstract: We present a novel approach for explaining Gaussian processes (GPs) that can utilize the full analytical covariance structure present in GPs. Our method is based on the popular solution concept of Shapley values extended to stochastic cooperative games, resulting in explanations that are random variables. The GP explanations generated using our approach satisfy similar favorable axioms to standard… ▽ More We present a novel approach for explaining Gaussian processes (GPs) that can utilize the full analytical covariance structure present in GPs. Our method is based on the popular solution concept of Shapley values extended to stochastic cooperative games, resulting in explanations that are random variables. The GP explanations generated using our approach satisfy similar favorable axioms to standard Shapley values and possess a tractable covariance function across features and data observations. This covariance allows for quantifying explanation uncertainties and studying the statistical dependencies between explanations. We further extend our framework to the problem of predictive explanation, and propose a Shapley prior over the explanation function to predict Shapley values for new data based on previously computed ones. Our extensive illustrations demonstrate the effectiveness of the proposed approach. △ Less

Submitted 24 May, 2023; originally announced May 2023.

Comments: 26 pages, 6 figures

arXiv:2304.06983 [pdf, other]

A Byte Sequence is Worth an Image: CNN for File Fragment Classification Using Bit Shift and n-Gram Embeddings

Authors: Wenyang Liu, Yi Wang, Kejun Wu, Kim-Hui Yap, Lap-Pui Chau

Abstract: File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and utilize the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-byte information, is seldom considered. This is inherently inapt for classifying variable-length codin… ▽ More File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and utilize the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-byte information, is seldom considered. This is inherently inapt for classifying variable-length coding files whose symbols are represented as the variable number of bits. Conversely, we propose Byte2Image, a novel data augmentation technique, to introduce the neglected intra-byte information into file fragments and re-treat them as 2d gray-scale images, which allows us to capture both inter-byte and intra-byte correlations simultaneously through powerful convolutional neural networks (CNNs). Specifically, to convert file fragments to 2d images, we employ a sliding byte window to expose the neglected intra-byte information and stack their n-gram features row by row. We further propose a byte sequence \& image fusion network as a classifier, which can jointly model the raw 1d byte sequence and the converted 2d image to perform FFC. Experiments on FFT-75 dataset validate that our proposed method can achieve notable accuracy improvements over state-of-the-art methods in nearly all scenarios. The code will be released at https://github.com/wenyang001/Byte2Image. △ Less

Submitted 14 April, 2023; originally announced April 2023.

Comments: Accepted by AICAS 2023

arXiv:2304.06976 [pdf, other]

Bitstream-Corrupted JPEG Images are Restorable: Two-stage Compensation and Alignment Framework for Image Restoration

Authors: Wenyang Liu, Yi Wang, Kim-Hui Yap, Lap-Pui Chau

Abstract: In this paper, we study a real-world JPEG image restoration problem with bit errors on the encrypted bitstream. The bit errors bring unpredictable color casts and block shifts on decoded image contents, which cannot be resolved by existing image restoration methods mainly relying on pre-defined degradation models in the pixel domain. To address these challenges, we propose a robust JPEG decoder, f… ▽ More In this paper, we study a real-world JPEG image restoration problem with bit errors on the encrypted bitstream. The bit errors bring unpredictable color casts and block shifts on decoded image contents, which cannot be resolved by existing image restoration methods mainly relying on pre-defined degradation models in the pixel domain. To address these challenges, we propose a robust JPEG decoder, followed by a two-stage compensation and alignment framework to restore bitstream-corrupted JPEG images. Specifically, the robust JPEG decoder adopts an error-resilient mechanism to decode the corrupted JPEG bitstream. The two-stage framework is composed of the self-compensation and alignment (SCA) stage and the guided-compensation and alignment (GCA) stage. The SCA adaptively performs block-wise image color compensation and alignment based on the estimated color and block offsets via image content similarity. The GCA leverages the extracted low-resolution thumbnail from the JPEG header to guide full-resolution pixel-wise image restoration in a coarse-to-fine manner. It is achieved by a coarse-guided pix2pix network and a refine-guided bi-directional Laplacian pyramid fusion network. We conduct experiments on three benchmarks with varying degrees of bit error rates. Experimental results and ablation studies demonstrate the superiority of our proposed method. The code will be released at https://github.com/wenyang001/Two-ACIR. △ Less

Submitted 14 April, 2023; originally announced April 2023.

Comments: Accepted by CVPR 2023

arXiv:2302.12995 [pdf, other]

Raw Image Reconstruction with Learned Compact Metadata

Authors: Yufei Wang, Yi Yu, Wenhan Yang, Lanqing Guo, Lap-Pui Chau, Alex Kot, Bihan Wen

Abstract: While raw images exhibit advantages over sRGB images (e.g., linearity and fine-grained quantization level), they are not widely used by common users due to the large storage requirements. Very recent works propose to compress raw images by designing the sampling masks in the raw image pixel space, leading to suboptimal image representations and redundant metadata. In this paper, we propose a novel… ▽ More While raw images exhibit advantages over sRGB images (e.g., linearity and fine-grained quantization level), they are not widely used by common users due to the large storage requirements. Very recent works propose to compress raw images by designing the sampling masks in the raw image pixel space, leading to suboptimal image representations and redundant metadata. In this paper, we propose a novel framework to learn a compact representation in the latent space serving as the metadata in an end-to-end manner. Furthermore, we propose a novel sRGB-guided context model with improved entropy estimation strategies, which leads to better reconstruction quality, smaller size of metadata, and faster speed. We illustrate how the proposed raw image compression scheme can adaptively allocate more bits to image regions that are important from a global perspective. The experimental results show that the proposed method can achieve superior raw image reconstruction results using a smaller size of the metadata on both uncompressed sRGB images and JPEG images. △ Less

Submitted 27 February, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

Comments: Accepted by CVPR 2023

arXiv:2302.11919 [pdf, other]

doi 10.1109/TITS.2023.3311633

PEM: Perception Error Model for Virtual Testing of Autonomous Vehicles

Authors: Andrea Piazzoni, Jim Cherian, Justin Dauwels, Lap-Pui Chau

Abstract: Even though virtual testing of Autonomous Vehicles (AVs) has been well recognized as essential for safety assessment, AV simulators are still undergoing active development. One particularly challenging question is to effectively include the Sensing and Perception (S&P) subsystem into the simulation loop. In this article, we define Perception Error Models (PEM), a virtual simulation component that… ▽ More Even though virtual testing of Autonomous Vehicles (AVs) has been well recognized as essential for safety assessment, AV simulators are still undergoing active development. One particularly challenging question is to effectively include the Sensing and Perception (S&P) subsystem into the simulation loop. In this article, we define Perception Error Models (PEM), a virtual simulation component that can enable the analysis of the impact of perception errors on AV safety, without the need to model the sensors themselves. We propose a generalized data-driven procedure towards parametric modeling and evaluate it using Apollo, an open-source driving software, and nuScenes, a public AV dataset. Additionally, we implement PEMs in SVL, an open-source vehicle simulator. Furthermore, we demonstrate the usefulness of PEM-based virtual tests, by evaluating camera, LiDAR, and camera-LiDAR setups. Our virtual tests highlight limitations in the current evaluation metrics, and the proposed approach can help study the impact of perception errors on AV safety. △ Less

Submitted 27 February, 2024; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: 12 pages, 10 figures. This is a preprint, and version 2 only updates the title and the reference to the final published article, which can be found at DOI: 10.1109/TITS.2023.3311633

ACM Class: C.4; I.2; I.6

Journal ref: IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 1, pp. 670-681, Jan. 2024

arXiv:2302.05746 [pdf, other]

Removing Image Artifacts From Scratched Lens Protectors

Authors: Yufei Wang, Renjie Wan, Wenhan Yang, Bihan Wen, Lap-Pui Chau, Alex C. Kot

Abstract: A protector is placed in front of the camera lens for mobile devices to avoid damage, while the protector itself can be easily scratched accidentally, especially for plastic ones. The artifacts appear in a wide variety of patterns, making it difficult to see through them clearly. Removing image artifacts from the scratched lens protector is inherently challenging due to the occasional flare artifa… ▽ More A protector is placed in front of the camera lens for mobile devices to avoid damage, while the protector itself can be easily scratched accidentally, especially for plastic ones. The artifacts appear in a wide variety of patterns, making it difficult to see through them clearly. Removing image artifacts from the scratched lens protector is inherently challenging due to the occasional flare artifacts and the co-occurring interference within mixed artifacts. Though different methods have been proposed for some specific distortions, they seldom consider such inherent challenges. In our work, we consider the inherent challenges in a unified framework with two cooperative modules, which facilitate the performance boost of each other. We also collect a new dataset from the real world to facilitate training and evaluation purposes. The experimental results demonstrate that our method outperforms the baselines qualitatively and quantitatively. The code and datasets will be released after acceptance. △ Less

Submitted 14 February, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

Comments: Accepted by ISCAS 2023

arXiv:2301.04137 [pdf]

A Global Radio Remote Sensing Network for Observing Space Weather Dynamics

Authors: Ryan Volz, Philip J. Erickson, Scott E. Palo, Jorge L. Chau, Juha Vierinen, Thomas Y. Chen

Abstract: Our current sampling of the near-Earth space environment is wholly insufficient to measure the highly variable processes therein and make predictions on par with lower atmospheric weather. We sketch out the scientific rationale for a network of radio instruments delivering dense observations of the near-Earth space environment and the broad steps necessary to implement wide-scale coverage in the n… ▽ More Our current sampling of the near-Earth space environment is wholly insufficient to measure the highly variable processes therein and make predictions on par with lower atmospheric weather. We sketch out the scientific rationale for a network of radio instruments delivering dense observations of the near-Earth space environment and the broad steps necessary to implement wide-scale coverage in the next 30 years. △ Less

Submitted 9 January, 2023; originally announced January 2023.

Comments: Heliophysics 2050 Workshop

arXiv:2211.11175 [pdf, other]

doi 10.1109/ITSC55140.2022.9921807

CoPEM: Cooperative Perception Error Models for Autonomous Driving

Authors: Andrea Piazzoni, Jim Cherian, Roshan Vijay, Lap-Pui Chau, Justin Dauwels

Abstract: In this paper, we introduce the notion of Cooperative Perception Error Models (coPEMs) towards achieving an effective and efficient integration of V2X solutions within a virtual test environment. We focus our analysis on the occlusion problem in the (onboard) perception of Autonomous Vehicles (AV), which can manifest as misdetection errors on the occluded objects. Cooperative perception (CP) solut… ▽ More In this paper, we introduce the notion of Cooperative Perception Error Models (coPEMs) towards achieving an effective and efficient integration of V2X solutions within a virtual test environment. We focus our analysis on the occlusion problem in the (onboard) perception of Autonomous Vehicles (AV), which can manifest as misdetection errors on the occluded objects. Cooperative perception (CP) solutions based on Vehicle-to-Everything (V2X) communications aim to avoid such issues by cooperatively leveraging additional points of view for the world around the AV. This approach usually requires many sensors, mainly cameras and LiDARs, to be deployed simultaneously in the environment either as part of the road infrastructure or on other traffic vehicles. However, implementing a large number of sensor models in a virtual simulation pipeline is often prohibitively computationally expensive. Therefore, in this paper, we rely on extending Perception Error Models (PEMs) to efficiently implement such cooperative perception solutions along with the errors and uncertainties associated with them. We demonstrate the approach by comparing the safety achievable by an AV challenged with a traffic scenario where occlusion is the primary cause of a potential collision. △ Less

Submitted 21 November, 2022; v1 submitted 20 November, 2022; originally announced November 2022.

Comments: Accepted at 2022 IEEE International Conference on Intelligent Transportation Systems - ITSC2022 6 pages, 6 figures

ACM Class: C.4; I.2; I.6

arXiv:2210.05598 [pdf, other]

Enriching Biomedical Knowledge for Low-resource Language Through Large-Scale Translation

Authors: Long Phan, Tai Dang, Hieu Tran, Trieu H. Trinh, Vy Phan, Lam D. Chau, Minh-Thang Luong

Abstract: Biomedical data and benchmarks are highly valuable yet very limited in low-resource languages other than English such as Vietnamese. In this paper, we make use of a state-of-the-art translation model in English-Vietnamese to translate and produce both pretrained as well as supervised data in the biomedical domains. Thanks to such large-scale translation, we introduce ViPubmedT5, a pretrained Encod… ▽ More Biomedical data and benchmarks are highly valuable yet very limited in low-resource languages other than English such as Vietnamese. In this paper, we make use of a state-of-the-art translation model in English-Vietnamese to translate and produce both pretrained as well as supervised data in the biomedical domains. Thanks to such large-scale translation, we introduce ViPubmedT5, a pretrained Encoder-Decoder Transformer model trained on 20 million translated abstracts from the high-quality public PubMed corpus. ViPubMedT5 demonstrates state-of-the-art results on two different biomedical benchmarks in summarization and acronym disambiguation. Further, we release ViMedNLI - a new NLP task in Vietnamese translated from MedNLI using the recently public En-vi translation model and carefully refined by human experts, with evaluations of existing methods against ViPubmedT5. △ Less

Submitted 29 January, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

arXiv:2206.12444 [pdf, other]

Gated Domain Units for Multi-source Domain Generalization

Authors: Simon Föll, Alina Dubatovka, Eugen Ernst, Siu Lun Chau, Martin Maritsch, Patrik Okanovic, Gudrun Thäter, Joachim M. Buhmann, Felix Wortmann, Krikamol Muandet

Abstract: The phenomenon of distribution shift (DS) occurs when a dataset at test time differs from the dataset at training time, which can significantly impair the performance of a machine learning model in practical settings due to a lack of knowledge about the data's distribution at test time. To address this problem, we postulate that real-world distributions are composed of latent Invariant Elementary… ▽ More The phenomenon of distribution shift (DS) occurs when a dataset at test time differs from the dataset at training time, which can significantly impair the performance of a machine learning model in practical settings due to a lack of knowledge about the data's distribution at test time. To address this problem, we postulate that real-world distributions are composed of latent Invariant Elementary Distributions (I.E.D) across different domains. This assumption implies an invariant structure in the solution space that enables knowledge transfer to unseen domains. To exploit this property for domain generalization, we introduce a modular neural network layer consisting of Gated Domain Units (GDUs) that learn a representation for each latent elementary distribution. During inference, a weighted ensemble of learning machines can be created by comparing new observations with the representations of each elementary distribution. Our flexible framework also accommodates scenarios where explicit domain information is not present. Extensive experiments on image, text, and graph data show consistent performance improvement on out-of-training target domains. These findings support the practicality of the I.E.D assumption and the effectiveness of GDUs for domain generalisation. △ Less

Submitted 16 May, 2023; v1 submitted 24 June, 2022; originally announced June 2022.

arXiv:2205.13662 [pdf, other]

Explaining Preferences with Shapley Values

Authors: Robert Hu, Siu Lun Chau, Jaime Ferrando Huertas, Dino Sejdinovic

Abstract: While preference modelling is becoming one of the pillars of machine learning, the problem of preference explanation remains challenging and underexplored. In this paper, we propose \textsc{Pref-SHAP}, a Shapley value-based model explanation framework for pairwise comparison data. We derive the appropriate value functions for preference models and further extend the framework to model and explain… ▽ More While preference modelling is becoming one of the pillars of machine learning, the problem of preference explanation remains challenging and underexplored. In this paper, we propose \textsc{Pref-SHAP}, a Shapley value-based model explanation framework for pairwise comparison data. We derive the appropriate value functions for preference models and further extend the framework to model and explain \emph{context specific} information, such as the surface type in a tennis game. To demonstrate the utility of \textsc{Pref-SHAP}, we apply our method to a variety of synthetic and real-world datasets and show that richer and more insightful explanations can be obtained over the baseline. △ Less

Submitted 8 November, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2202.08517 [pdf, other]

TAFNet: A Three-Stream Adaptive Fusion Network for RGB-T Crowd Counting

Authors: Haihan Tang, Yi Wang, Lap-Pui Chau

Abstract: In this paper, we propose a three-stream adaptive fusion network named TAFNet, which uses paired RGB and thermal images for crowd counting. Specifically, TAFNet is divided into one main stream and two auxiliary streams. We combine a pair of RGB and thermal images to constitute the input of main stream. Two auxiliary streams respectively exploit RGB image and thermal image to extract modality-speci… ▽ More In this paper, we propose a three-stream adaptive fusion network named TAFNet, which uses paired RGB and thermal images for crowd counting. Specifically, TAFNet is divided into one main stream and two auxiliary streams. We combine a pair of RGB and thermal images to constitute the input of main stream. Two auxiliary streams respectively exploit RGB image and thermal image to extract modality-specific features. Besides, we propose an Information Improvement Module (IIM) to fuse the modality-specific features into the main stream adaptively. Experiment results on RGBT-CC dataset show that our method achieves more than 20% improvement on mean average error and root mean squared error compared with state-of-the-art method. The source code will be publicly available at https://github.com/TANGHAIHAN/TAFNet. △ Less

Submitted 17 February, 2022; originally announced February 2022.

Comments: This work has been accepted by IEEE International Symposium on Circuits and Systems (ISCAS) 2022

arXiv:2202.01085 [pdf, other]

Giga-scale Kernel Matrix Vector Multiplication on GPU

Authors: Robert Hu, Siu Lun Chau, Dino Sejdinovic, Joan Alexis Glaunès

Abstract: Kernel matrix-vector multiplication (KMVM) is a foundational operation in machine learning and scientific computing. However, as KMVM tends to scale quadratically in both memory and time, applications are often limited by these computational constraints. In this paper, we propose a novel approximation procedure coined \textit{Faster-Fast and Free Memory Method} ($\fthreem$) to address these scalin… ▽ More Kernel matrix-vector multiplication (KMVM) is a foundational operation in machine learning and scientific computing. However, as KMVM tends to scale quadratically in both memory and time, applications are often limited by these computational constraints. In this paper, we propose a novel approximation procedure coined \textit{Faster-Fast and Free Memory Method} ($\fthreem$) to address these scaling issues of KMVM for tall~($10^8\sim 10^9$) and skinny~($D\leq7$) data. Extensive experiments demonstrate that $\fthreem$ has empirical \emph{linear time and memory} complexity with a relative error of order $10^{-3}$ and can compute a full KMVM for a billion points \emph{in under a minute} on a high-end GPU, leading to a significant speed-up in comparison to existing CPU methods. We demonstrate the utility of our procedure by applying it as a drop-in for the state-of-the-art GPU-based linear solver FALKON, \emph{improving speed 1.5-5.5 times} at the cost of $<1\%$ drop in accuracy. We further demonstrate competitive results on \emph{Gaussian Process regression} coupled with significant speedups on a variety of real-world datasets. △ Less

Submitted 12 October, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

arXiv:2110.09167 [pdf, other]

RKHS-SHAP: Shapley Values for Kernel Methods

Authors: Siu Lun Chau, Robert Hu, Javier Gonzalez, Dino Sejdinovic

Abstract: Feature attribution for kernel methods is often heuristic and not individualised for each prediction. To address this, we turn to the concept of Shapley values~(SV), a coalition game theoretical framework that has previously been applied to different machine learning model interpretation tasks, such as linear models, tree ensembles and deep networks. By analysing SVs from a functional perspective,… ▽ More Feature attribution for kernel methods is often heuristic and not individualised for each prediction. To address this, we turn to the concept of Shapley values~(SV), a coalition game theoretical framework that has previously been applied to different machine learning model interpretation tasks, such as linear models, tree ensembles and deep networks. By analysing SVs from a functional perspective, we propose \textsc{RKHS-SHAP}, an attribution method for kernel machines that can efficiently compute both \emph{Interventional} and \emph{Observational Shapley values} using kernel mean embeddings of distributions. We show theoretically that our method is robust with respect to local perturbations - a key yet often overlooked desideratum for consistent model interpretation. Further, we propose \emph{Shapley regulariser}, applicable to a general empirical risk minimisation framework, allowing learning while controlling the level of specific feature's contributions to the model. We demonstrate that the Shapley regulariser enables learning which is robust to covariate shift of a given feature and fair learning which controls the SVs of sensitive features. △ Less

Submitted 26 May, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

arXiv:2109.05923 [pdf, other]

Low-Light Image Enhancement with Normalizing Flow

Authors: Yufei Wang, Renjie Wan, Wenhan Yang, Haoliang Li, Lap-Pui Chau, Alex C. Kot

Abstract: To enhance low-light images to normally-exposed ones is highly ill-posed, namely that the map** relationship between them is one-to-many. Previous works based on the pixel-wise reconstruction losses and deterministic processes fail to capture the complex conditional distribution of normally exposed images, which results in improper brightness, residual noise, and artifacts. In this paper, we inv… ▽ More To enhance low-light images to normally-exposed ones is highly ill-posed, namely that the map** relationship between them is one-to-many. Previous works based on the pixel-wise reconstruction losses and deterministic processes fail to capture the complex conditional distribution of normally exposed images, which results in improper brightness, residual noise, and artifacts. In this paper, we investigate to model this one-to-many relationship via a proposed normalizing flow model. An invertible network that takes the low-light images/features as the condition and learns to map the distribution of normally exposed images into a Gaussian distribution. In this way, the conditional distribution of the normally exposed images can be well modeled, and the enhancement process, i.e., the other inference direction of the invertible network, is equivalent to being constrained by a loss function that better describes the manifold structure of natural images during the training. The experimental results on the existing benchmark datasets show our method achieves better quantitative and qualitative results, obtaining better-exposed illumination, less noise and artifact, and richer colors. △ Less

Submitted 13 September, 2021; originally announced September 2021.

arXiv:2109.05826 [pdf, other]

Variational Disentanglement for Domain Generalization

Authors: Yufei Wang, Haoliang Li, Hao Cheng, Bihan Wen, Lap-Pui Chau, Alex C. Kot

Abstract: Domain generalization aims to learn an invariant model that can generalize well to the unseen target domain. In this paper, we propose to tackle the problem of domain generalization by delivering an effective framework named Variational Disentanglement Network (VDN), which is capable of disentangling the domain-specific features and task-specific features, where the task-specific features are expe… ▽ More Domain generalization aims to learn an invariant model that can generalize well to the unseen target domain. In this paper, we propose to tackle the problem of domain generalization by delivering an effective framework named Variational Disentanglement Network (VDN), which is capable of disentangling the domain-specific features and task-specific features, where the task-specific features are expected to be better generalized to unseen but related test data. We further show the rationale of our proposed method by proving that our proposed framework is equivalent to minimize the evidence upper bound of the divergence between the distribution of task-specific features and its invariant ground truth derived from variational inference. We conduct extensive experiments to verify our method on three benchmarks, and both quantitative and qualitative results illustrate the effectiveness of our method. △ Less

Submitted 16 May, 2023; v1 submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted to TMLR 2022

arXiv:2107.08228 [pdf, other]

Weakly-supervised Part-Attention and Mentored Networks for Vehicle Re-Identification

Authors: Lisha Tang, Yi Wang, Lap-Pui Chau

Abstract: Vehicle re-identification (Re-ID) aims to retrieve images with the same vehicle ID across different cameras. Current part-level feature learning methods typically detect vehicle parts via uniform division, outside tools, or attention modeling. However, such part features often require expensive additional annotations and cause sub-optimal performance in case of unreliable part mask predictions. In… ▽ More Vehicle re-identification (Re-ID) aims to retrieve images with the same vehicle ID across different cameras. Current part-level feature learning methods typically detect vehicle parts via uniform division, outside tools, or attention modeling. However, such part features often require expensive additional annotations and cause sub-optimal performance in case of unreliable part mask predictions. In this paper, we propose a weakly-supervised Part-Attention Network (PANet) and Part-Mentored Network (PMNet) for Vehicle Re-ID. Firstly, PANet localizes vehicle parts via part-relevant channel recalibration and cluster-based mask generation without vehicle part supervisory information. Secondly, PMNet leverages teacher-student guided learning to distill vehicle part-specific features from PANet and performs multi-scale global-part feature extraction. During inference, PMNet can adaptively extract discriminative part features without part localization by PANet, preventing unstable part mask predictions. We address this Re-ID issue as a multi-task problem and adopt Homoscedastic Uncertainty to learn the optimal weighing of ID losses. Experiments are conducted on two public benchmarks, showing that our approach outperforms recent methods, which require no extra annotations by an average increase of 3.0% in CMC@5 on VehicleID and over 1.4% in mAP on VeRi776. Moreover, our method can extend to the occluded vehicle Re-ID task and exhibits good generalization ability. △ Less

Submitted 11 July, 2022; v1 submitted 17 July, 2021; originally announced July 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2107.06211 [pdf, other]

doi 10.1109/TIP.2022.3160070

Attention-Guided Progressive Neural Texture Fusion for High Dynamic Range Image Restoration

Authors: Jie Chen, Zaifeng Yang, Tsz Nam Chan, Hui Li, Junhui Hou, Lap-Pui Chau

Abstract: High Dynamic Range (HDR) imaging via multi-exposure fusion is an important task for most modern imaging platforms. In spite of recent developments in both hardware and algorithm innovations, challenges remain over content association ambiguities caused by saturation, motion, and various artifacts introduced during multi-exposure fusion such as ghosting, noise, and blur. In this work, we propose an… ▽ More High Dynamic Range (HDR) imaging via multi-exposure fusion is an important task for most modern imaging platforms. In spite of recent developments in both hardware and algorithm innovations, challenges remain over content association ambiguities caused by saturation, motion, and various artifacts introduced during multi-exposure fusion such as ghosting, noise, and blur. In this work, we propose an Attention-guided Progressive Neural Texture Fusion (APNT-Fusion) HDR restoration model which aims to address these issues within one framework. An efficient two-stream structure is proposed which separately focuses on texture feature transfer over saturated regions and multi-exposure tonal and texture feature fusion. A neural feature transfer mechanism is proposed which establishes spatial correspondence between different exposures based on multi-scale VGG features in the masked saturated HDR domain for discriminative contextual clues over the ambiguous image areas. A progressive texture blending module is designed to blend the encoded two-stream features in a multi-scale and progressive manner. In addition, we introduce several novel attention mechanisms, i.e., the motion attention module detects and suppresses the content discrepancies among the reference images; the saturation attention module facilitates differentiating the misalignment caused by saturation from those caused by motion; and the scale attention module ensures texture blending consistency between different coder/decoder scales. We carry out comprehensive qualitative and quantitative evaluations and ablation studies, which validate that these novel modules work coherently under the same framework and outperform state-of-the-art methods. △ Less

Submitted 13 July, 2021; originally announced July 2021.

arXiv:2107.06176 [pdf]

An Efficient Shock Advice Algorithm based on K-nearest Neighbors for Automated External Defibrillators

Authors: Dao Thanh Hai, Nguyen Minh Tuan, Nguyen Thi Thu Hang, Le Hai Chau

Abstract: Shockable rhythms, namely ventricular fibrillation and ventricular tachycardia, are the main cause of sudden cardiac arrests, which can be detected quickly by the automated external defibrillator (AED) devices. In this paper, a simple but effective algorithm is proposed as the shock advice algorithm applied in AED. The proposed algorithm consists of K-nearest neighbor classifier and an optimal set… ▽ More Shockable rhythms, namely ventricular fibrillation and ventricular tachycardia, are the main cause of sudden cardiac arrests, which can be detected quickly by the automated external defibrillator (AED) devices. In this paper, a simple but effective algorithm is proposed as the shock advice algorithm applied in AED. The proposed algorithm consists of K-nearest neighbor classifier and an optimal set of 36 features, which are extracted from original ECG and shockable, non-shockable signals using modified variational mode decomposition technique. Cross-validation procedure and sequential forward feature selection are carefully applied to select an optimal set from entire feature space. The performance results show that the MVMD is the key element for SCA detection performance, and the proposed algorithm is simpler while remaining relatively high detection performance compared to previous publications. △ Less

Submitted 25 September, 2021; v1 submitted 29 June, 2021; originally announced July 2021.

Comments: 5 pages, 5 figures, 3 tables, accepted version to IEEE ATC 2021

arXiv:2107.02629 [pdf, other]

Embracing the Dark Knowledge: Domain Generalization Using Regularized Knowledge Distillation

Authors: Yufei Wang, Haoliang Li, Lap-pui Chau, Alex C. Kot

Abstract: Though convolutional neural networks are widely used in different tasks, lack of generalization capability in the absence of sufficient and representative data is one of the challenges that hinder their practical application. In this paper, we propose a simple, effective, and plug-and-play training strategy named Knowledge Distillation for Domain Generalization (KDDG) which is built upon a knowled… ▽ More Though convolutional neural networks are widely used in different tasks, lack of generalization capability in the absence of sufficient and representative data is one of the challenges that hinder their practical application. In this paper, we propose a simple, effective, and plug-and-play training strategy named Knowledge Distillation for Domain Generalization (KDDG) which is built upon a knowledge distillation framework with the gradient filter as a novel regularization term. We find that both the ``richer dark knowledge" from the teacher network, as well as the gradient filter we proposed, can reduce the difficulty of learning the map** which further improves the generalization ability of the model. We also conduct experiments extensively to show that our framework can significantly improve the generalization capability of deep neural networks in different tasks including image classification, segmentation, reinforcement learning by comparing our method with existing state-of-the-art domain generalization techniques. Last but not the least, we propose to adopt two metrics to analyze our proposed method in order to better understand how our proposed method benefits the generalization capability of deep neural networks. △ Less

Submitted 6 July, 2021; originally announced July 2021.

Comments: Accepted by ACM MM, 2021

arXiv:2106.03477 [pdf, other]

BayesIMP: Uncertainty Quantification for Causal Data Fusion

Authors: Siu Lun Chau, Jean-François Ton, Javier González, Yee Whye Teh, Dino Sejdinovic

Abstract: While causal models are becoming one of the mainstays of machine learning, the problem of uncertainty quantification in causal inference remains challenging. In this paper, we study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable. As data arises from multiple sources and can vary in quali… ▽ More While causal models are becoming one of the mainstays of machine learning, the problem of uncertainty quantification in causal inference remains challenging. In this paper, we study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable. As data arises from multiple sources and can vary in quality and quantity, principled uncertainty quantification becomes essential. To that end, we introduce Bayesian Interventional Mean Processes, a framework which combines ideas from probabilistic integration and kernel mean embeddings to represent interventional distributions in the reproducing kernel Hilbert space, while taking into account the uncertainty within each causal graph. To demonstrate the utility of our uncertainty estimation, we apply our method to the Causal Bayesian Optimisation task and show improvements over state-of-the-art methods. △ Less

Submitted 7 June, 2021; originally announced June 2021.

Comments: 10 pages main text, 10 pages supplementary materials

arXiv:2106.01538 [pdf, other]

PDPGD: Primal-Dual Proximal Gradient Descent Adversarial Attack

Authors: Alexander Matyasko, Lap-Pui Chau

Abstract: State-of-the-art deep neural networks are sensitive to small input perturbations. Since the discovery of this intriguing vulnerability, many defence methods have been proposed that attempt to improve robustness to adversarial noise. Fast and accurate attacks are required to compare various defence methods. However, evaluating adversarial robustness has proven to be extremely challenging. Existing… ▽ More State-of-the-art deep neural networks are sensitive to small input perturbations. Since the discovery of this intriguing vulnerability, many defence methods have been proposed that attempt to improve robustness to adversarial noise. Fast and accurate attacks are required to compare various defence methods. However, evaluating adversarial robustness has proven to be extremely challenging. Existing norm minimisation adversarial attacks require thousands of iterations (e.g. Carlini & Wagner attack), are limited to the specific norms (e.g. Fast Adaptive Boundary), or produce sub-optimal results (e.g. Brendel & Bethge attack). On the other hand, PGD attack, which is fast, general and accurate, ignores the norm minimisation penalty and solves a simpler perturbation-constrained problem. In this work, we introduce a fast, general and accurate adversarial attack that optimises the original non-convex constrained minimisation problem. We interpret optimising the Lagrangian of the adversarial attack optimisation problem as a two-player game: the first player minimises the Lagrangian wrt the adversarial noise; the second player maximises the Lagrangian wrt the regularisation penalty. Our attack algorithm simultaneously optimises primal and dual variables to find the minimal adversarial perturbation. In addition, for non-smooth $l_p$-norm minimisation, such as $l_{\infty}$-, $l_1$-, and $l_0$-norms, we introduce primal-dual proximal gradient descent attack. We show in the experiments that our attack outperforms current state-of-the-art $l_{\infty}$-, $l_2$-, $l_1$-, and $l_0$-attacks on MNIST, CIFAR-10 and Restricted ImageNet datasets against unregularised and adversarially trained models. △ Less

Submitted 2 June, 2021; originally announced June 2021.

arXiv:2105.12909 [pdf, other]

Deconditional Downscaling with Gaussian Processes

Authors: Siu Lun Chau, Shahine Bouabid, Dino Sejdinovic

Abstract: Refining low-resolution (LR) spatial fields with high-resolution (HR) information, often known as statistical downscaling, is challenging as the diversity of spatial datasets often prevents direct matching of observations. Yet, when LR samples are modeled as aggregate conditional means of HR samples with respect to a mediating variable that is globally observed, the recovery of the underlying fine… ▽ More Refining low-resolution (LR) spatial fields with high-resolution (HR) information, often known as statistical downscaling, is challenging as the diversity of spatial datasets often prevents direct matching of observations. Yet, when LR samples are modeled as aggregate conditional means of HR samples with respect to a mediating variable that is globally observed, the recovery of the underlying fine-grained field can be framed as taking an "inverse" of the conditional expectation, namely a deconditioning problem. In this work, we propose a Bayesian formulation of deconditioning which naturally recovers the initial reproducing kernel Hilbert space formulation from Hsu and Ramos (2019). We extend deconditioning to a downscaling setup and devise efficient conditional mean embedding estimator for multiresolution data. By treating conditional expectations as inter-domain features of the underlying field, a posterior for the latent field can be established as a solution to the deconditioning problem. Furthermore, we show that this solution can be viewed as a two-staged vector-valued kernel ridge regressor and show that it has a minimax optimal convergence rate under mild assumptions. Lastly, we demonstrate its proficiency in a synthetic and a real-world atmospheric field downscaling problem, showing substantial improvements over existing methods. △ Less

Submitted 25 October, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

arXiv:2105.07901 [pdf, other]

Multi-object Tracking with Tracked Object Bounding Box Association

Authors: Nanyang Yang, Yi Wang, Lap-Pui Chau

Abstract: The CenterTrack tracking algorithm achieves state-of-the-art tracking performance using a simple detection model and single-frame spatial offsets to localize objects and predict their associations in a single network. However, this joint detection and tracking method still suffers from high identity switches due to the inferior association method. To reduce the high number of identity switches and… ▽ More The CenterTrack tracking algorithm achieves state-of-the-art tracking performance using a simple detection model and single-frame spatial offsets to localize objects and predict their associations in a single network. However, this joint detection and tracking method still suffers from high identity switches due to the inferior association method. To reduce the high number of identity switches and improve the tracking accuracy, in this paper, we propose to incorporate a simple tracked object bounding box and overlap** prediction based on the current frame onto the CenterTrack algorithm. Specifically, we propose an Intersection over Union (IOU) distance cost matrix in the association step instead of simple point displacement distance. We evaluate our proposed tracker on the MOT17 test dataset, showing that our proposed method can reduce identity switches significantly by 22.6% and obtain a notable improvement of 1.5% in IDF1 compared to the original CenterTrack's under the same tracklet lifetime. The source code is released at https://github.com/Nanyangny/CenterTrack-IOU. △ Less

Submitted 17 May, 2021; originally announced May 2021.

Comments: 6 pages, accepted paper at ICME workshop 2021

arXiv:2105.01447 [pdf, other]

Moving Towards Centers: Re-ranking with Attention and Memory for Re-identification

Authors: Yunhao Zhou, Yi Wang, Lap-Pui Chau

Abstract: Re-ranking utilizes contextual information to optimize the initial ranking list of person or vehicle re-identification (re-ID), which boosts the retrieval performance at post-processing steps. This paper proposes a re-ranking network to predict the correlations between the probe and top-ranked neighbor samples. Specifically, all the feature embeddings of query and gallery images are expanded and e… ▽ More Re-ranking utilizes contextual information to optimize the initial ranking list of person or vehicle re-identification (re-ID), which boosts the retrieval performance at post-processing steps. This paper proposes a re-ranking network to predict the correlations between the probe and top-ranked neighbor samples. Specifically, all the feature embeddings of query and gallery images are expanded and enhanced by a linear combination of their neighbors, with the correlation prediction serving as discriminative combination weights. The combination process is equivalent to moving independent embeddings toward the identity centers, improving cluster compactness. For correlation prediction, we first aggregate the contextual information for probe's k-nearest neighbors via the Transformer encoder. Then, we distill and refine the probe-related features into the Contextual Memory cell via attention mechanism. Like humans that retrieve images by not only considering probe images but also memorizing the retrieved ones, the Contextual Memory produces multi-view descriptions for each instance. Finally, the neighbors are reconstructed with features fetched from the Contextual Memory, and a binary classifier predicts their correlations with the probe. Experiments on six widely-used person and vehicle re-ID benchmarks demonstrate the effectiveness of the proposed method. Especially, our method surpasses the state-of-the-art re-ranking approaches on large-scale datasets by a significant margin, i.e., with an average 4.83% CMC@1 and 14.83% mAP improvements on VERI-Wild, MSMT17, and VehicleID datasets. △ Less

Submitted 21 March, 2022; v1 submitted 4 May, 2021; originally announced May 2021.

Comments: 13 pages. Accepted for Publication at IEEE Transactions on Multimedia

arXiv:2104.12505 [pdf, other]

Dense Point Prediction: A Simple Baseline for Crowd Counting and Localization

Authors: Yi Wang, Xinyu Hou, Lap-Pui Chau

Abstract: In this paper, we propose a simple yet effective crowd counting and localization network named SCALNet. Unlike most existing works that separate the counting and localization tasks, we consider those tasks as a pixel-wise dense prediction problem and integrate them into an end-to-end framework. Specifically, for crowd counting, we adopt a counting head supervised by the Mean Square Error (MSE) los… ▽ More In this paper, we propose a simple yet effective crowd counting and localization network named SCALNet. Unlike most existing works that separate the counting and localization tasks, we consider those tasks as a pixel-wise dense prediction problem and integrate them into an end-to-end framework. Specifically, for crowd counting, we adopt a counting head supervised by the Mean Square Error (MSE) loss. For crowd localization, the key insight is to recognize the keypoint of people, i.e., the center point of heads. We propose a localization head to distinguish dense crowds trained by two loss functions, i.e., Negative-Suppressed Focal (NSF) loss and False-Positive (FP) loss, which balances the positive/negative examples and handles the false-positive predictions. Experiments on the recent and large-scale benchmark, NWPU-Crowd, show that our approach outperforms the state-of-the-art methods by more than 5% and 10% improvement in crowd localization and counting tasks, respectively. The code is publicly available at https://github.com/WangyiNTU/SCALNet. △ Less

Submitted 26 April, 2021; originally announced April 2021.

Comments: 6 pages. Accepted at ICME Workshop 2021

arXiv:2011.14936 [pdf, other]

doi 10.1109/TITS.2021.3087158

Rethinking and Designing a High-performing Automatic License Plate Recognition Approach

Authors: Yi Wang, Zhen-Peng Bian, Yunhao Zhou, Lap-Pui Chau

Abstract: In this paper, we propose a real-time and accurate automatic license plate recognition (ALPR) approach. Our study illustrates the outstanding design of ALPR with four insights: (1) the resampling-based cascaded framework is beneficial to both speed and accuracy; (2) the highly efficient license plate recognition should abundant additional character segmentation and recurrent neural network (RNN),… ▽ More In this paper, we propose a real-time and accurate automatic license plate recognition (ALPR) approach. Our study illustrates the outstanding design of ALPR with four insights: (1) the resampling-based cascaded framework is beneficial to both speed and accuracy; (2) the highly efficient license plate recognition should abundant additional character segmentation and recurrent neural network (RNN), but adopt a plain convolutional neural network (CNN); (3) in the case of CNN, taking advantage of vertex information on license plates improves the recognition performance; and (4) the weight-sharing character classifier addresses the lack of training images in small-scale datasets. Based on these insights, we propose a novel ALPR approach, termed VSNet. Specifically, VSNet includes two CNNs, i.e., VertexNet for license plate detection and SCR-Net for license plate recognition, integrated in a resampling-based cascaded manner. In VertexNet, we propose an efficient integration block to extract the spatial features of license plates. With vertex supervisory information, we propose a vertex-estimation branch in VertexNet such that license plates can be rectified as the input images of SCR-Net. In SCR-Net, we introduce a horizontal encoding technique for left-to-right feature extraction and propose a weight-sharing classifier for character recognition. Experimental results show that the proposed VSNet outperforms state-of-the-art methods by more than 50% relative improvement on error rate, achieving > 99% recognition accuracy on CCPD and AOLP datasets with 149 FPS inference speed. Moreover, our method illustrates an outstanding generalization capability when evaluated on the unseen PKUData and CLPD datasets. △ Less

Submitted 17 June, 2021; v1 submitted 30 November, 2020; originally announced November 2020.

Comments: 13 pages. Accepted for Publication at IEEE Transactions on Intelligent Transportation Systems

arXiv:2009.06996 [pdf, other]

Light Can Hack Your Face! Black-box Backdoor Attack on Face Recognition Systems

Authors: Haoliang Li, Yufei Wang, Xiaofei Xie, Yang Liu, Shiqi Wang, Renjie Wan, Lap-Pui Chau, Alex C. Kot

Abstract: Deep neural networks (DNN) have shown great success in many computer vision applications. However, they are also known to be susceptible to backdoor attacks. When conducting backdoor attacks, most of the existing approaches assume that the targeted DNN is always available, and an attacker can always inject a specific pattern to the training data to further fine-tune the DNN model. However, in prac… ▽ More Deep neural networks (DNN) have shown great success in many computer vision applications. However, they are also known to be susceptible to backdoor attacks. When conducting backdoor attacks, most of the existing approaches assume that the targeted DNN is always available, and an attacker can always inject a specific pattern to the training data to further fine-tune the DNN model. However, in practice, such attack may not be feasible as the DNN model is encrypted and only available to the secure enclave. In this paper, we propose a novel black-box backdoor attack technique on face recognition systems, which can be conducted without the knowledge of the targeted DNN model. To be specific, we propose a backdoor attack with a novel color stripe pattern trigger, which can be generated by modulating LED in a specialized waveform. We also use an evolutionary computing strategy to optimize the waveform for backdoor attack. Our backdoor attack can be conducted in a very mild condition: 1) the adversary cannot manipulate the input in an unnatural way (e.g., injecting adversarial noise); 2) the adversary cannot access the training database; 3) the adversary has no knowledge of the training model as well as the training set used by the victim party. We show that the backdoor trigger can be quite effective, where the attack success rate can be up to $88\%$ based on our simulation study and up to $40\%$ based on our physical-domain study by considering the task of face recognition and verification based on at most three-time attempts during authentication. Finally, we evaluate several state-of-the-art potential defenses towards backdoor attacks, and find that our attack can still be effective. We highlight that our study revealed a new physical backdoor attack, which calls for the attention of the security issue of the existing face recognition/verification techniques. △ Less

Submitted 15 September, 2020; originally announced September 2020.

Comments: First two authors contributed equally

arXiv:2008.10065 [pdf, other]

doi 10.1109/TSIPN.2021.3059995

Kernel-based Graph Learning from Smooth Signals: A Functional Viewpoint

Authors: Xingyue Pu, Siu Lun Chau, Xiaowen Dong, Dino Sejdinovic

Abstract: The problem of graph learning concerns the construction of an explicit topological structure revealing the relationship between nodes representing data entities, which plays an increasingly important role in the success of many graph-based representations and algorithms in the field of machine learning and graph signal processing. In this paper, we propose a novel graph learning framework that inc… ▽ More The problem of graph learning concerns the construction of an explicit topological structure revealing the relationship between nodes representing data entities, which plays an increasingly important role in the success of many graph-based representations and algorithms in the field of machine learning and graph signal processing. In this paper, we propose a novel graph learning framework that incorporates the node-side and observation-side information, and in particular the covariates that help to explain the dependency structures in graph signals. To this end, we consider graph signals as functions in the reproducing kernel Hilbert space associated with a Kronecker product kernel, and integrate functional learning with smoothness-promoting graph learning to learn a graph representing the relationship between nodes. The functional learning increases the robustness of graph learning against missing and incomplete information in the graph signals. In addition, we develop a novel graph-based regularisation method which, when combined with the Kronecker product kernel, enables our model to capture both the dependency explained by the graph and the dependency due to graph signals observed under different but related circumstances, e.g. different points in time. The latter means the graph signals are free from the i.i.d. assumptions required by the classical graph learning models. Experiments on both synthetic and real-world data show that our methods outperform the state-of-the-art models in learning a meaningful graph topology from graph signals, in particular under heavy noise, missing values, and multiple dependency. △ Less

Submitted 23 August, 2020; originally announced August 2020.

Comments: 13 pages, with extra 3-page appendices

Journal ref: IEEE Transactions on Signal and Information Processing over Networks, 2021

arXiv:2007.12831 [pdf, other]

doi 10.1109/TIP.2021.3055632

A Self-Training Approach for Point-Supervised Object Detection and Counting in Crowds

Authors: Yi Wang, Junhui Hou, Xinyu Hou, Lap-Pui Chau

Abstract: In this paper, we propose a novel self-training approach named Crowd-SDNet that enables a typical object detector trained only with point-level annotations (i.e., objects are labeled with points) to estimate both the center points and sizes of crowded objects. Specifically, during training, we utilize the available point annotations to supervise the estimation of the center points of objects direc… ▽ More In this paper, we propose a novel self-training approach named Crowd-SDNet that enables a typical object detector trained only with point-level annotations (i.e., objects are labeled with points) to estimate both the center points and sizes of crowded objects. Specifically, during training, we utilize the available point annotations to supervise the estimation of the center points of objects directly. Based on a locally-uniform distribution assumption, we initialize pseudo object sizes from the point-level supervisory information, which are then leveraged to guide the regression of object sizes via a crowdedness-aware loss. Meanwhile, we propose a confidence and order-aware refinement scheme to continuously refine the initial pseudo object sizes such that the ability of the detector is increasingly boosted to detect and count objects in crowds simultaneously. Moreover, to address extremely crowded scenes, we propose an effective decoding method to improve the detector's representation ability. Experimental results on the WiderFace benchmark show that our approach significantly outperforms state-of-the-art point-supervised methods under both detection and counting tasks, i.e., our method improves the average precision by more than 10% and reduces the counting error by 31.2%. Besides, our method obtains the best results on the crowd counting and localization datasets (i.e., ShanghaiTech and NWPU-Crowd) and vehicle counting datasets (i.e., CARPK and PUCPR+) compared with state-of-the-art counting-by-detection methods. The code will be publicly available at https://github.com/WangyiNTU/Point-supervised-crowd-detection. △ Less

Submitted 18 February, 2021; v1 submitted 24 July, 2020; originally announced July 2020.

Comments: 12 pages. Accepted for Publication at IEEE Transactions on Image Processing

arXiv:2007.11882 [pdf, other]

Deep Spatial-angular Regularization for Compressive Light Field Reconstruction over Coded Apertures

Authors: Mantang Guo, Junhui Hou, **g **, Jie Chen, Lap-Pui Chau

Abstract: Coded aperture is a promising approach for capturing the 4-D light field (LF), in which the 4-D data are compressively modulated into 2-D coded measurements that are further decoded by reconstruction algorithms. The bottleneck lies in the reconstruction algorithms, resulting in rather limited reconstruction quality. To tackle this challenge, we propose a novel learning-based framework for the reco… ▽ More Coded aperture is a promising approach for capturing the 4-D light field (LF), in which the 4-D data are compressively modulated into 2-D coded measurements that are further decoded by reconstruction algorithms. The bottleneck lies in the reconstruction algorithms, resulting in rather limited reconstruction quality. To tackle this challenge, we propose a novel learning-based framework for the reconstruction of high-quality LFs from acquisitions via learned coded apertures. The proposed method incorporates the measurement observation into the deep learning framework elegantly to avoid relying entirely on data-driven priors for LF reconstruction. Specifically, we first formulate the compressive LF reconstruction as an inverse problem with an implicit regularization term. Then, we construct the regularization term with an efficient deep spatial-angular convolutional sub-network to comprehensively explore the signal distribution free from the limited representation ability and inefficiency of deterministic mathematical modeling. Experimental results show that the reconstructed LFs not only achieve much higher PSNR/SSIM but also preserve the LF parallax structure better, compared with state-of-the-art methods on both real and synthetic LF benchmarks. In addition, experiments show that our method is efficient and robust to noise, which is an essential advantage for a real camera system. The code is publicly available at \url{https://github.com/angmt2008/LFCA} △ Less

Submitted 23 July, 2020; originally announced July 2020.

arXiv:2006.03847 [pdf, other]

Learning Inconsistent Preferences with Gaussian Processes

Authors: Siu Lun Chau, Javier González, Dino Sejdinovic

Abstract: We revisit widely used preferential Gaussian processes by Chu et al.(2005) and challenge their modelling assumption that imposes rankability of data items via latent utility function values. We propose a generalisation of pgp which can capture more expressive latent preferential structures in the data and thus be used to model inconsistent preferences, i.e. where transitivity is violated, or to di… ▽ More We revisit widely used preferential Gaussian processes by Chu et al.(2005) and challenge their modelling assumption that imposes rankability of data items via latent utility function values. We propose a generalisation of pgp which can capture more expressive latent preferential structures in the data and thus be used to model inconsistent preferences, i.e. where transitivity is violated, or to discover clusters of comparable items via spectral decomposition of the learned preference functions. We also consider the properties of associated covariance kernel functions and its reproducing kernel Hilbert Space (RKHS), giving a simple construction that satisfies universality in the space of preference functions. Finally, we provide an extensive set of numerical experiments on simulated and real-world datasets showcasing the competitiveness of our proposed method with state-of-the-art. Our experimental findings support the conjecture that violations of rankability are ubiquitous in real-world preferential data. △ Less

Submitted 27 January, 2022; v1 submitted 6 June, 2020; originally announced June 2020.

arXiv:2005.04035 [pdf, other]

Spectral Ranking with Covariates

Authors: Siu Lun Chau, Mihai Cucuringu, Dino Sejdinovic

Abstract: We consider spectral approaches to the problem of ranking n players given their incomplete and noisy pairwise comparisons, but revisit this classical problem in light of player covariate information. We propose three spectral ranking methods that incorporate player covariates and are based on seriation, low-rank structure assumption and canonical correlation, respectively. Extensive numerical simu… ▽ More We consider spectral approaches to the problem of ranking n players given their incomplete and noisy pairwise comparisons, but revisit this classical problem in light of player covariate information. We propose three spectral ranking methods that incorporate player covariates and are based on seriation, low-rank structure assumption and canonical correlation, respectively. Extensive numerical simulations on both synthetic and real-world data sets demonstrated that our proposed methods compare favorably to existing state-of-the-art covariate-based ranking algorithms. △ Less

Submitted 6 April, 2022; v1 submitted 8 May, 2020; originally announced May 2020.

arXiv:2004.08796 [pdf, other]

When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks

Authors: Zhiyu Zhu, Zhen-Peng Bian, Junhui Hou, Yi Wang, Lap-Pui Chau

Abstract: Various architectures (such as GoogLeNets, ResNets, and DenseNets) have been proposed. However, the existing networks usually suffer from either redundancy of convolutional layers or insufficient utilization of parameters. To handle these challenging issues, we propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations. Specifically, residual le… ▽ More Various architectures (such as GoogLeNets, ResNets, and DenseNets) have been proposed. However, the existing networks usually suffer from either redundancy of convolutional layers or insufficient utilization of parameters. To handle these challenging issues, we propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations. Specifically, residual learning aims to efficiently retrieve features from different convolutional blocks, while the micro-dense aggregation is able to enhance each block and avoid redundancy of convolutional layers by lessening residual aggregations. Moreover, the proposed micro-dense architecture has two characteristics: pyramidal multi-level feature learning which can widen the deeper layer in a block progressively, and dimension cardinality adaptive convolution which can balance each layer using linearly increasing dimension cardinality. The experimental results over three datasets (i.e., CIFAR-10, CIFAR-100, and ImageNet-1K) demonstrate that the proposed Micro-Dense Net with only 4M parameters can achieve higher classification accuracy than state-of-the-art networks, while being 12.1$\times$ smaller depends on the number of parameters. In addition, our micro-dense block can be integrated with neural architecture search based models to boost their performance, validating the advantage of our architecture. We believe our design and findings will be beneficial to the DNN community. △ Less

Submitted 24 April, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

arXiv:1909.11862 [pdf, other]

Convolutional Neural Networks with Dynamic Regularization

Authors: Yi Wang, Zhen-Peng Bian, Junhui Hou, Lap-Pui Chau

Abstract: Regularization is commonly used for alleviating overfitting in machine learning. For convolutional neural networks (CNNs), regularization methods, such as DropBlock and Shake-Shake, have illustrated the improvement in the generalization performance. However, these methods lack a self-adaptive ability throughout training. That is, the regularization strength is fixed to a predefined schedule, and m… ▽ More Regularization is commonly used for alleviating overfitting in machine learning. For convolutional neural networks (CNNs), regularization methods, such as DropBlock and Shake-Shake, have illustrated the improvement in the generalization performance. However, these methods lack a self-adaptive ability throughout training. That is, the regularization strength is fixed to a predefined schedule, and manual adjustments are required to adapt to various network architectures. In this paper, we propose a dynamic regularization method for CNNs. Specifically, we model the regularization strength as a function of the training loss. According to the change of the training loss, our method can dynamically adjust the regularization strength in the training procedure, thereby balancing the underfitting and overfitting of CNNs. With dynamic regularization, a large-scale model is automatically regularized by the strong perturbation, and vice versa. Experimental results show that the proposed method can improve the generalization capability on off-the-shelf network architectures and outperform state-of-the-art regularization methods. △ Less

Submitted 30 December, 2020; v1 submitted 25 September, 2019; originally announced September 2019.

Comments: 7 pages. Accepted for Publication at IEEE TNNLS

arXiv:1903.02688 [pdf, other]

Stratified Labeling for Surface Consistent Parallax Correction and Occlusion Completion

Authors: Jie Chen, Lap-Pui Chau, Junhui Hou

Abstract: The light field faithfully records the spatial and angular configurations of the scene, which facilitates a wide range of imaging possibilities. In this work, we propose an LF synthesis algorithm which renders high quality novel LF views far outside the range of angular baselines of the given references. A stratified synthesis strategy is adopted which parses the scene content based on stratified… ▽ More The light field faithfully records the spatial and angular configurations of the scene, which facilitates a wide range of imaging possibilities. In this work, we propose an LF synthesis algorithm which renders high quality novel LF views far outside the range of angular baselines of the given references. A stratified synthesis strategy is adopted which parses the scene content based on stratified disparity layers and across a varying range of spatial granularities. Such a stratified methodology proves to help preserve scene structures over large perspective shifts, and it provides informative clues for inferring the textures of occluded regions. A generative-adversarial network model is further adopted for parallax correction and occlusion completion conditioned on the stratified synthesis features. Experiments show that our proposed model can provide more reliable novel view synthesis quality at large baseline extension ratios. Over 3dB quality improvement has been achieved against state-of-the-art LF view synthesis algorithms. △ Less

Submitted 23 March, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

arXiv:1810.12576 [pdf, other]

Improved Network Robustness with Adversary Critic

Authors: Alexander Matyasko, Lap-Pui Chau

Abstract: Ideally, what confuses neural network should be confusing to humans. However, recent experiments have shown that small, imperceptible perturbations can change the network prediction. To address this gap in perception, we propose a novel approach for learning robust classifier. Our main idea is: adversarial examples for the robust classifier should be indistinguishable from the regular data of the… ▽ More Ideally, what confuses neural network should be confusing to humans. However, recent experiments have shown that small, imperceptible perturbations can change the network prediction. To address this gap in perception, we propose a novel approach for learning robust classifier. Our main idea is: adversarial examples for the robust classifier should be indistinguishable from the regular data of the adversarial target. We formulate a problem of learning robust classifier in the framework of Generative Adversarial Networks (GAN), where the adversarial attack on classifier acts as a generator, and the critic network learns to distinguish between regular and adversarial images. The classifier cost is augmented with the objective that its adversarial examples should confuse the adversary critic. To improve the stability of the adversarial map**, we introduce adversarial cycle-consistency constraint which ensures that the adversarial map** of the adversarial examples is close to the original. In the experiments, we show the effectiveness of our defense. Our method surpasses in terms of robustness networks trained with adversarial training. Additionally, we verify in the experiments with human annotators on MTurk that adversarial examples are indeed visually confusing. Codes for the project are available at https://github.com/aam-at/adversary_critic. △ Less

Submitted 30 October, 2018; originally announced October 2018.

arXiv:1808.06207 [pdf, other]

Haze Density Estimation via Modeling of Scattering Coefficients of Iso-depth Regions

Authors: Jie Chen, Cheen-Hau Tan, Lap-Pui Chau

Abstract: Vision based haze density estimation is of practical implications for the purpose of precaution alarm and emergency reactions toward disastrous hazy weathers. In this paper, we introduce a haze density estimation framework based on modeling of scattering coefficients of iso-depth regions. A haze density metric of Normalized Scattering Coefficient (NSC) is proposed to measure current haze density l… ▽ More Vision based haze density estimation is of practical implications for the purpose of precaution alarm and emergency reactions toward disastrous hazy weathers. In this paper, we introduce a haze density estimation framework based on modeling of scattering coefficients of iso-depth regions. A haze density metric of Normalized Scattering Coefficient (NSC) is proposed to measure current haze density level with reference to two reference scales. Iso-depth regions are determined via superpixel segmentation. Efficient searching and matching of iso-depth units could be carried out for measurements via unstationary cameras. A robust dark SP selection method is used to produce reliable predictions for most out-door scenarios. △ Less

Submitted 19 August, 2018; originally announced August 2018.

Showing 1–50 of 72 results for author: Chau, L