
Exploring Camera Encoder Designs for Autonomous Driving Perception

Authors: Barath Lakshmanan, Joshua Chen, Shiyi Lan, Maying Shen, Zhiding Yu, Jose M. Alvarez

Abstract: The cornerstone of autonomous vehicles (AV) is a solid perception system, where camera encoders play a crucial role. Existing works usually leverage pre-trained Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) designed for general vision tasks, such as image classification, segmentation, and 2D detection. Although those well-known architectures have achieved state-of-the-art accuracy in AV-related tasks, e.g., 3D object detection, there remains significant potential for improvement in network design due to the nuanced complexities of industrial-level AV datasets. Moreover, existing public AV benchmarks usually contain insufficient data, which might lead to inaccurate evaluation of those architectures. To reveal AV-specific model insights, we start from a standard general-purpose encoder, ConvNeXt, and progressively transform the design. We adjust different design parameters, including the width and depth of the model, stage compute ratio, attention mechanisms, and input resolution, supported by a systematic analysis of each modification. This customization yields an architecture optimized for AV camera encoders, achieving an 8.79% mAP improvement over the baseline. We believe our effort could become a sweet cookbook of image encoders for AV and pave the way to the next-level drive system.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2406.12079 [pdf, other]

Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint

Authors: Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose Alvarez

Abstract: As we push the boundaries of performance in various vision tasks, the models grow in size correspondingly. To keep up with this growth, we need very aggressive pruning techniques for efficient inference and deployment on edge devices. Existing pruning approaches are limited to channel pruning and struggle with aggressive parameter reductions. In this paper, we propose a novel multi-dimensional pruning framework that jointly optimizes pruning across channels, layers, and blocks while adhering to latency constraints. We develop a latency modeling technique that accurately captures model-wide latency variations during pruning, which is crucial for achieving optimal latency-accuracy trade-offs at high pruning ratios. We reformulate pruning as a Mixed-Integer Nonlinear Program (MINLP) to efficiently determine the optimal pruned structure with only a single pass. Our extensive results demonstrate substantial improvements over previous methods, particularly at large pruning ratios. In classification, our method significantly outperforms the prior art HALP with a Top-1 accuracy of 70.0 (vs. 68.6) and an FPS of 5262 im/s (vs. 4101 im/s). In 3D object detection, we establish a new state of the art by pruning StreamPETR at a 45% pruning ratio, achieving higher FPS (37.3 vs. 31.7) and mAP (0.451 vs. 0.449) than the dense baseline.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Under Review

arXiv:2202.08341 [pdf, other]

Anomalib: A Deep Learning Library for Anomaly Detection

Authors: Samet Akcay, Dick Ameln, Ashwin Vaidya, Barath Lakshmanan, Nilesh Ahuja, Utku Genc

Abstract: This paper introduces anomalib, a novel library for unsupervised anomaly detection and localization. With reproducibility and modularity in mind, this open-source library provides algorithms from the literature and a set of tools to design custom anomaly detection algorithms via a plug-and-play approach. Anomalib comprises state-of-the-art anomaly detection algorithms that achieve top performance on the benchmarks and that can be used off-the-shelf. In addition, the library provides components to design custom algorithms that could be tailored towards specific needs. Additional tools, including experiment trackers, visualizers, and hyper-parameter optimizers, make it simple to design and implement anomaly detection models. The library also supports OpenVINO model optimization and quantization for real-time deployment. Overall, anomalib is an extensive library for the design, implementation, and deployment of unsupervised anomaly detection models from data to the edge.

Submitted 16 February, 2022; originally announced February 2022.

arXiv:2010.03189 [pdf]

Theedhum Nandrum@Dravidian-CodeMix-FIRE2020: A Sentiment Polarity Classifier for YouTube Comments with Code-switching between Tamil, Malayalam and English

Authors: BalaSundaraRaman Lakshmanan, Sanjeeth Kumar Ravindranath

Abstract: Theedhum Nandrum is a sentiment polarity detection system using two approaches: a Stochastic Gradient Descent (SGD) based classifier and a Long Short-term Memory (LSTM) based classifier. Our approach utilises language features like use of emoji, choice of scripts and code mixing, which appeared quite marked in the datasets specified for the Dravidian Codemix - FIRE 2020 task. The hyperparameters for the SGD were tuned using GridSearchCV. Our system was ranked 4th in Tamil-English with a weighted average F1 score of 0.62 and 9th in Malayalam-English with a score of 0.65. We achieved a weighted average F1 score of 0.77 for Tamil-English using a Logistic Regression based model after the task deadline. This performance betters the top-ranked classifier on this dataset by a wide margin. Our use of language-specific Soundex to harmonise the spelling variants in code-mixed data appears to be a novel application of Soundex. Our complete code is published on GitHub at https://github.com/oligoglot/theedhum-nandrum.

Submitted 13 October, 2020; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: FIRE 2020, December 16-20, 2020, Hyderabad, India
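
The abstract above mentions using Soundex to harmonise spelling variants in code-mixed text. As a rough illustration of the idea only, here is a minimal sketch of the classic American (English) Soundex in Python. It is not the language-specific Soundex the authors describe, and the function name `soundex`, the `CODES` table, and the example words are assumptions introduced here, not taken from the paper or its repository.

```python
# Classic American Soundex (illustrative only; the paper uses a
# language-specific variant that is not reproduced here).
CODES = {}
for digit, group in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                     "4": "l", "5": "mn", "6": "r"}.items():
    for ch in group:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex key of an ASCII word, or "" if none."""
    # Keep ASCII letters only; text in other scripts is dropped in this sketch.
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    key = [first.upper()]
    prev = CODES.get(first, "")      # code of the previous significant letter
    for ch in letters[1:]:
        if ch in "hw":
            continue                 # h and w do not separate equal codes
        code = CODES.get(ch, "")     # vowels (and y) map to ""
        if code and code != prev:
            key.append(code)
        prev = code                  # a vowel resets prev, so repeats count again
    return ("".join(key) + "000")[:4]   # pad or truncate to letter + 3 digits


# Hypothetical romanised spelling variants collapsing to one key.
variants = ["semma", "sema", "padam", "padham"]
keys = {w: soundex(w) for w in variants}
```

In this sketch, spelling variants that share a key could be mapped to a single token before feature extraction, which is one plausible way to read "harmonise" in the abstract.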

arXiv:1812.09127 [pdf, other]

A Smart Security System with Face Recognition

Authors: Trung Nguyen, Barth Lakshmanan, Weihua Sheng

Abstract: Web-based technology has improved drastically in the past decade. As a result, security technology has become a major help in protecting our daily life. In this paper, we propose a robust security system based on face recognition (SoF). In particular, we develop this system to grant authenticated users access to a home. The classifier is trained using a new adaptive learning method. The training data are initially collected from social networks. The accuracy of the classifier is incrementally improved as the user starts using the system. A novel method has been introduced to improve the classifier model through human interaction and social media. Because the system is built on a deep learning framework, TensorFlow, it will be easy to reuse and adapt it to many devices and applications.

Submitted 3 December, 2018; originally announced December 2018.

arXiv:1308.4317 [pdf]

doi 10.1016/j.electacta.2011.10.020

Quantifying Oxidation Rates of Carbon Monoxide on a Pt/C Electrode

Authors: Siva Balasubramanian, Balasubramanian Lakshmanan, Christine E. Hetzke, Vijay A. Sethuraman, John W. Weidner

Abstract: The electrochemical oxidation of carbon monoxide adsorbed (COad) on platinum-on-carbon electrodes was studied via a methodology in which pre-adsorbed CO was partially oxidized by applying potentiostatic pulses for certain durations. The residual COad was analyzed using stripping voltammetry that involved the deconvolution of COad oxidation peaks of voltammograms to quantify the weakly and strongly bound species of COad. The data obtained for various potentials and temperatures were fit to a model based on a nucleation and growth mechanism. The resulting fit produced potential- and temperature-dependent rate parameters that provided insight into the oxidation mechanism of the two COad species. Irrespective of the applied potential or temperature, the concentration of weakly bound COad species decreased exponentially with time. In contrast, the strongly bound COad species showed a gradual transition of mechanisms, from progressive nucleation at relatively low potentials to exponential decay at high potentials.

Submitted 20 August, 2013; originally announced August 2013.

Comments: 15 pages, 9 figures

Journal ref: Electrochimica Acta, 58, 723-728, 2011
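
The two kinetic regimes named in the abstract above can be written compactly. This is a sketch under stated assumptions, not the paper's own model: the symbols \theta_w and \theta_s (coverages of weakly and strongly bound COad), the rate parameters k_w and k_s, and the Avrami-type exponent n are introduced here for illustration only.

```latex
\begin{align}
  \theta_w(t) &= \theta_w(0)\, e^{-k_w t}, \\
  \theta_s(t) &= \theta_s(0)\, \exp\!\left(-k_s\, t^{\,n}\right).
\end{align}
```

With n = 1 the second expression reduces to the same simple exponential decay as the first. Values of n > 1 give the delayed, sigmoidal decay typical of nucleation-and-growth kinetics (n = 3 in the classical two-dimensional progressive-nucleation case), so a shift of n toward 1 with increasing potential would be one way to express the gradual transition the abstract describes.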

Showing 1–6 of 6 results for author: Lakshmanan, B