Search | arXiv e-print repository

Giving each task what it needs -- leveraging structured sparsity for tailored multi-task learning

Authors: Richa Upadhyay, Ronald Phlypo, Rajkumar Saini, Marcus Liwicki

Abstract: Every task demands distinct feature representations, ranging from low-level to high-level attributes, so it is vital to address the specific needs of each task, especially in the Multi-task Learning (MTL) framework. This work, therefore, introduces Layer-Optimized Multi-Task (LOMT) models that utilize structured sparsity to refine feature selection for individual tasks and enhance the performance… ▽ More Every task demands distinct feature representations, ranging from low-level to high-level attributes, so it is vital to address the specific needs of each task, especially in the Multi-task Learning (MTL) framework. This work, therefore, introduces Layer-Optimized Multi-Task (LOMT) models that utilize structured sparsity to refine feature selection for individual tasks and enhance the performance of all tasks in a multi-task scenario. Structured or group sparsity systematically eliminates parameters from trivial channels and, eventually, entire layers within a convolution neural network during training. Consequently, the remaining layers provide the most optimal features for a given task. In this two-step approach, we subsequently leverage this sparsity-induced optimal layer information to build the LOMT models by connecting task-specific decoders to these strategically identified layers, deviating from conventional approaches that uniformly connect decoders at the end of the network. This tailored architecture optimizes the network, focusing on essential features while reducing redundancy. We validate the efficacy of the proposed approach on two datasets, ie NYU-v2 and CelebAMask-HD datasets, for multiple heterogeneous tasks. A detailed performance analysis of the LOMT models, in contrast to the conventional MTL models, reveals that the LOMT models outperform for most task combinations. The excellent qualitative and quantitative outcomes highlight the effectiveness of employing structured sparsity for optimal layer (or feature) selection. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.14874 [pdf, other]

Investigating Robustness of Open-Vocabulary Foundation Object Detectors under Distribution Shifts

Authors: Prakash Chandra Chhipa, Kanjar De, Meenakshi Subhash Chippa, Rajkumar Saini, Marcus Liwicki

Abstract: The challenge of Out-Of-Distribution (OOD) robustness remains a critical hurdle towards deploying deep vision models. Open-vocabulary object detection extends the capabilities of traditional object detection frameworks to recognize and classify objects beyond predefined categories. Investigating OOD robustness in open-vocabulary object detection is essential to increase the trustworthiness of thes… ▽ More The challenge of Out-Of-Distribution (OOD) robustness remains a critical hurdle towards deploying deep vision models. Open-vocabulary object detection extends the capabilities of traditional object detection frameworks to recognize and classify objects beyond predefined categories. Investigating OOD robustness in open-vocabulary object detection is essential to increase the trustworthiness of these models. This study presents a comprehensive robustness evaluation of zero-shot capabilities of three recent open-vocabulary foundation object detection models, namely OWL-ViT, YOLO World, and Grounding DINO. Experiments carried out on the COCO-O and COCO-C benchmarks encompassing distribution shifts highlight the challenges of the models' robustness. Source code shall be made available to the research community on GitHub. △ Less

Submitted 1 June, 2024; v1 submitted 1 April, 2024; originally announced May 2024.

Comments: 13 + 3 single column pages

arXiv:2405.02296 [pdf, other]

Möbius Transform for Mitigating Perspective Distortions in Representation Learning

Authors: Prakash Chandra Chhipa, Meenakshi Subhash Chippa, Kanjar De, Rajkumar Saini, Marcus Liwicki, Mubarak Shah

Abstract: Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is a challenging task that prevents synthesizing perspective distortion. Non-availability of dedicated training data poses a critical barrier to develo** robust computer vision me… ▽ More Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is a challenging task that prevents synthesizing perspective distortion. Non-availability of dedicated training data poses a critical barrier to develo** robust computer vision methods. Additionally, distortion correction methods make other computer vision tasks a multi-step approach and lack performance. In this work, we propose mitigating perspective distortion (MPD) by employing a fine-grained parameter control on a specific family of Möbius transform to model real-world distortion without estimating camera intrinsic and extrinsic parameters and without the need for actual distorted data. Also, we present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, to benchmark the robustness of deep learning models against this new dataset. The proposed method outperforms on existing benchmarks, ImageNet-E and ImageNet-X. Additionally, it significantly improves performance on ImageNet-PD while consistently performing on standard data distribution. Further, our method shows improved performance on three PD-affected real-world applications: crowd counting, fisheye image recognition, and person re-identification. We will release source code, dataset, and models for foster further research. △ Less

Submitted 7 March, 2024; originally announced May 2024.

arXiv:2403.15017 [pdf, other]

Vehicle Detection Performance in Nordic Region

Authors: Hamam Mokayed, Rajkumar Saini, Oluwatosin Adewumi, Lama Alkhaled, Bjorn Backe, Palaiahnakote Shivakumara, Olle Hagner, Yan Chai Hum

Abstract: This paper addresses the critical challenge of vehicle detection in the harsh winter conditions in the Nordic regions, characterized by heavy snowfall, reduced visibility, and low lighting. Due to their susceptibility to environmental distortions and occlusions, traditional vehicle detection methods have struggled in these adverse conditions. The advanced proposed deep learning architectures broug… ▽ More This paper addresses the critical challenge of vehicle detection in the harsh winter conditions in the Nordic regions, characterized by heavy snowfall, reduced visibility, and low lighting. Due to their susceptibility to environmental distortions and occlusions, traditional vehicle detection methods have struggled in these adverse conditions. The advanced proposed deep learning architectures brought promise, yet the unique difficulties of detecting vehicles in Nordic winters remain inadequately addressed. This study uses the Nordic Vehicle Dataset (NVD), which has UAV images from northern Sweden, to evaluate the performance of state-of-the-art vehicle detection algorithms under challenging weather conditions. Our methodology includes a comprehensive evaluation of single-stage, two-stage, and transformer-based detectors against the NVD. We propose a series of enhancements tailored to each detection framework, including data augmentation, hyperparameter tuning, transfer learning, and novel strategies designed explicitly for the DETR model. Our findings not only highlight the limitations of current detection systems in the Nordic environment but also offer promising directions for enhancing these algorithms for improved robustness and accuracy in vehicle detection amidst the complexities of winter landscapes. The code and the dataset are available at https://nvd.ltu-ai.dev △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: submitted to ICPR2024

arXiv:2308.12114 [pdf, other]

Less is More -- Towards parsimonious multi-task models using structured sparsity

Authors: Richa Upadhyay, Ronald Phlypo, Rajkumar Saini, Marcus Liwicki

Abstract: Model sparsification in deep learning promotes simpler, more interpretable models with fewer parameters. This not only reduces the model's memory footprint and computational needs but also shortens inference time. This work focuses on creating sparse models optimized for multiple tasks with fewer parameters. These parsimonious models also possess the potential to match or outperform dense models i… ▽ More Model sparsification in deep learning promotes simpler, more interpretable models with fewer parameters. This not only reduces the model's memory footprint and computational needs but also shortens inference time. This work focuses on creating sparse models optimized for multiple tasks with fewer parameters. These parsimonious models also possess the potential to match or outperform dense models in terms of performance. In this work, we introduce channel-wise l1/l2 group sparsity in the shared convolutional layers parameters (or weights) of the multi-task learning model. This approach facilitates the removal of extraneous groups i.e., channels (due to l1 regularization) and also imposes a penalty on the weights, further enhancing the learning efficiency for all tasks (due to l2 regularization). We analyzed the results of group sparsity in both single-task and multi-task settings on two widely-used Multi-Task Learning (MTL) datasets: NYU-v2 and CelebAMask-HQ. On both datasets, which consist of three different computer vision tasks each, multi-task models with approximately 70% sparsity outperform their dense equivalents. We also investigate how changing the degree of sparsification influences the model's performance, the overall sparsity percentage, the patterns of sparsity, and the inference time. △ Less

Submitted 30 November, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: accepted at First Conference on Parsimony and Learning (CPAL 2024)

arXiv:2308.08818 [pdf, ps, other]

User-Pair Selection for QoS-Aware Secrecy Rate Maximization in Untrusted NOMA

Authors: Sapna Thapar, Deepak Mishra, Ravikant Saini, Zhiguo Ding

Abstract: Non-orthogonal multiple access (NOMA) has been recognized as one of the key enabling technologies for future generation wireless networks. Sharing the same time-frequency resource among users imposes secrecy challenges in NOMA in the presence of untrusted users. This paper characterizes the impact of user-pair selection on the secrecy performance of an untrusted NOMA system. In this regard, an opt… ▽ More Non-orthogonal multiple access (NOMA) has been recognized as one of the key enabling technologies for future generation wireless networks. Sharing the same time-frequency resource among users imposes secrecy challenges in NOMA in the presence of untrusted users. This paper characterizes the impact of user-pair selection on the secrecy performance of an untrusted NOMA system. In this regard, an optimization problem is formulated to maximize the secrecy rate of the strong user while satisfying the quality of service (QoS) demands of the user with poorer channel conditions. To solve this problem, we first obtain optimal power allocation in a two-user NOMA system, and then investigate the user-pair selection problem in a more generalized four user NOMA system. Extensive performance evaluations are conducted to validate the accuracy of the proposed results and present valuable insights on the impact of various system parameters on the secrecy performance of the NOMA communication system. △ Less