Search | arXiv e-print repository

OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration

Authors: Febin Sunny, Amin Shafiee, Abhishek Balasubramaniam, Mahdi Nikdast, Sudeep Pasricha

Abstract: Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional Von Neumann architectures to their limits due to the high latency and energy consumption costs associated with data movement between the processor and memory for these workloads. One of the solutions to overcome this bottleneck is to perform computation within the main memory through processing-in-memory (PIM), thereby limiting data movement and the costs associated with it. However, DRAM-based PIM struggles to achieve high throughput and energy efficiency due to internal data movement bottlenecks and the need for frequent refresh operations. In this work, we introduce OPIMA, a PIM-based ML accelerator, architected within an optical main memory. OPIMA has been designed to leverage the inherent massive parallelism within main memory while performing high-speed, low-energy optical computation to accelerate ML models based on convolutional neural networks. We present a comprehensive analysis of OPIMA to guide design choices and operational mechanisms. Additionally, we evaluate the performance and energy consumption of OPIMA, comparing it with conventional electronic computing systems and emerging photonic PIM architectures. The experimental results show that OPIMA can achieve 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2303.02191 [pdf]

R-TOSS: A Framework for Real-Time Object Detection using Semi-Structured Pruning

Authors: Abhishek Balasubramaniam, Febin P Sunny, Sudeep Pasricha

Abstract: Object detectors used in autonomous vehicles can have high memory and computational overheads. In this paper, we introduce a novel semi-structured pruning framework called R-TOSS that overcomes the shortcomings of state-of-the-art model pruning techniques. Experimental results on the JetsonTX2 show that R-TOSS has a compression rate of 4.4x on the YOLOv5 object detector with a 2.15x speedup in inference time and 57.01% decrease in energy usage. R-TOSS also enables 2.89x compression on RetinaNet with a 1.86x speedup in inference time and 56.31% decrease in energy usage. We also demonstrate significant improvements compared to various state-of-the-art pruning techniques.

Submitted 3 March, 2023; originally announced March 2023.

arXiv:2205.03702 [pdf, other]

Keratoconus Classifier for Smartphone-based Corneal Topographer

Authors: Siddhartha Gairola, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Nipun Kwatra, Mohit Jain

Abstract: Keratoconus is a severe eye disease that leads to deformation of the cornea. It impacts people aged 10-25 years and is the leading cause of blindness in that demography. Corneal topography is the gold standard for keratoconus diagnosis. It is a non-invasive process performed using expensive and bulky medical devices called corneal topographers. This makes it inaccessible to large populations, especially in the Global South. Low-cost smartphone-based corneal topographers, such as SmartKC, have been proposed to make keratoconus diagnosis accessible. Similar to medical-grade topographers, SmartKC outputs curvature heatmaps and quantitative metrics that need to be evaluated by doctors for keratoconus diagnosis. An automatic scheme for evaluation of these heatmaps and quantitative values can play a crucial role in screening keratoconus in areas where doctors are not available. In this work, we propose a dual-head convolutional neural network (CNN) for classifying keratoconus on the heatmaps generated by SmartKC. Since SmartKC is a new device and only had a small dataset (114 samples), we developed a 2-stage transfer learning strategy -- using historical data collected from a medical-grade topographer and a subset of SmartKC data -- to satisfactorily train our network. This, combined with our domain-specific data augmentations, achieved a sensitivity of 91.3% and a specificity of 94.2%.

Submitted 7 May, 2022; originally announced May 2022.

Comments: 4 pages

arXiv:2201.07706 [pdf]

Object Detection in Autonomous Vehicles: Status and Open Challenges

Authors: Abhishek Balasubramaniam, Sudeep Pasricha

Abstract: Object detection is a computer vision task that has become an integral part of many consumer applications today such as surveillance and security systems, mobile text recognition, and diagnosing diseases from MRI/CT scans. Object detection is also one of the critical components to support autonomous driving. Autonomous vehicles rely on the perception of their surroundings to ensure safe and robust driving performance. This perception system uses object detection algorithms to accurately determine objects such as pedestrians, vehicles, traffic signs, and barriers in the vehicle's vicinity. Deep learning-based object detectors play a vital role in finding and localizing these objects in real-time. This article discusses the state-of-the-art in object detectors and open challenges for their integration into autonomous vehicles.

Submitted 19 January, 2022; originally announced January 2022.

arXiv:2111.03592 [pdf, other]

Nonnegative Matrix Factorization to understand Spatio-Temporal Traffic Pattern Variations during COVID-19: A Case Study

Authors: Anandkumar Balasubramaniam, Thirunavukarasu Balasubramaniam, Rathinaraja Jeyaraj, Anand Paul, Richi Nayak

Abstract: Due to rapid developments in Intelligent Transportation Systems (ITS) and the increasing number of vehicles on the road, abundant road traffic data is generated and available. Understanding spatio-temporal traffic patterns from this data is crucial and has been effectively helping in traffic planning, road construction, etc. However, understanding traffic patterns during the COVID-19 pandemic is quite challenging and important, as there is a huge difference in terms of people's and vehicles' travel behavioural patterns. In this paper, a case study is conducted to understand the variations in spatio-temporal traffic patterns during COVID-19. We apply nonnegative matrix factorization (NMF) to elicit patterns. The NMF model outputs are analysed based on the spatio-temporal pattern behaviours observed during the years 2019 and 2020, which are before the pandemic and during the pandemic, respectively, in Great Britain. The outputs of the analysed spatio-temporal traffic pattern variation behaviours will be useful in the fields of traffic management in Intelligent Transportation Systems and management in various stages of a pandemic or unavoidable scenarios in relation to road traffic.

Submitted 5 November, 2021; originally announced November 2021.

Comments: Accepted in the 19th Australasian Data Mining Conference 2021

arXiv:2111.01354 [pdf, other]

SmartKC: Smartphone-based Corneal Topographer for Keratoconus Detection

Authors: Siddhartha Gairola, Murtuza Bohra, Nadeem Shaheer, Navya Jayaprakash, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Nipun Kwatra, Mohit Jain

Abstract: Keratoconus is a severe eye disease affecting the cornea (the clear, dome-shaped outer surface of the eye), causing it to become thin and develop a conical bulge. The diagnosis of keratoconus requires sophisticated ophthalmic devices which are non-portable and very expensive. This makes early detection of keratoconus inaccessible to large populations in low- and middle-income countries, making it a leading cause for partial/complete blindness among such populations. We propose SmartKC, a low-cost, smartphone-based keratoconus diagnosis system comprising a 3D-printed placido's disc attachment, an LED light strip, and an intelligent smartphone app to capture the reflection of the placido rings on the cornea. An image processing pipeline analyzes the corneal image and uses the smartphone's camera parameters, the placido rings' 3D location, the pixel location of the reflected placido rings and the setup's working distance to construct the corneal surface, via the Arc-Step method and Zernike polynomials based surface fitting. In a clinical study with 101 distinct eyes, we found that SmartKC achieves a sensitivity of 94.1% and a specificity of 100.0%. Moreover, the quantitative curvature estimates (sim-K) strongly correlate with a gold-standard medical device (Pearson correlation coefficient = 0.78). Our results indicate that SmartKC has the potential to be used as a keratoconus screening tool under real-world medical settings.

Submitted 21 January, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: Change Log: + Fixed sim-K computation (updated Section 5.5.3); re-ran our pipeline with the updated sim-K values (updated Figure 7); + Conducted the comparative evaluation with doctors again (total 4 doctors), and got improved results (updated Section 7.2 and Table 2); [Note: This is an updated version of the paper that was accepted for publication in IMWUT 2021.]

arXiv:2106.16006 [pdf, other]

Improving the Efficiency of Transformers for Resource-Constrained Devices

Authors: Hamid Tabani, Ajay Balasubramaniam, Shabbir Marzban, Elahe Arani, Bahram Zonooz

Abstract: Transformers provide promising accuracy and have become popular and used in various domains such as natural language processing and computer vision. However, due to their massive number of model parameters, memory and computation requirements, they are not suitable for resource-constrained low-power devices. Even with high-performance and specialized devices, the memory bandwidth can become a performance-limiting bottleneck. In this paper, we present a performance analysis of state-of-the-art vision transformers on several devices. We propose to reduce the overall memory footprint and memory transfers by clustering the model parameters. We show that by using only 64 clusters to represent model parameters, it is possible to reduce the data transfer from the main memory by more than 4x, achieve up to 22% speedup and 39% energy savings on mobile devices with less than 0.1% accuracy loss.

Submitted 30 June, 2021; originally announced June 2021.

Comments: This paper is accepted as a full paper at 24th Euromicro Conference on Digital System Design (DSD)

arXiv:2105.02613 [pdf, other]

Challenges and Obstacles Towards Deploying Deep Learning Models on Mobile Devices

Authors: Hamid Tabani, Ajay Balasubramaniam, Elahe Arani, Bahram Zonooz

Abstract: From computer vision and speech recognition to forecasting trajectories in autonomous vehicles, deep learning approaches are at the forefront of so many domains. Deep learning models are developed using a plethora of high-level, generic frameworks and libraries. Running those models on mobile devices requires hardware-aware optimizations and, in most cases, converting the models to other formats or using a third-party framework. In reality, most of the developed models need to undergo a process of conversion, adaptation, and, in some cases, full retraining to match the requirements and features of the framework that is deploying the model on the target platform. A variety of hardware platforms with heterogeneous computing elements, from wearable devices to high-performance GPU clusters, are used to run deep learning models. In this paper, we present the existing challenges, obstacles, and practical solutions towards deploying deep learning models on mobile devices.

Submitted 6 May, 2021; originally announced May 2021.

arXiv:2006.04532 [pdf, other]

Detecting Problem Statements in Peer Assessments

Authors: Yunkai Xiao, Gabriel Zingle, Qin** Jia, Harsh R. Shah, Yi Zhang, Tianyi Li, Mohsin Karovaliya, Weixiang Zhao, Yang Song, Jie Ji, Ashwin Balasubramaniam, Harshit Patel, Priyankha Bhalasubbramanian, Vikram Patel, Edward F. Gehringer

Abstract: Effective peer assessment requires students to be attentive to the deficiencies in the work they rate. Thus, their reviews should identify problems. But what ways are there to check that they do? We attempt to automate the process of deciding whether a review comment detects a problem. We use over 18,000 review comments that were labeled by the reviewees as either detecting or not detecting a problem with the work. We deploy several traditional machine-learning models, as well as neural-network models using GloVe and BERT embeddings. We find that the best performer is the Hierarchical Attention Network classifier, followed by the Bidirectional Gated Recurrent Units (GRU) Attention and Capsule model with scores of 93.1% and 90.5% respectively. The best non-neural network model was the support vector machine with a score of 89.71%. This is followed by the Stochastic Gradient Descent model and the Logistic Regression model with 89.70% and 88.98%.

Submitted 29 May, 2020; originally announced June 2020.

Comments: 8 pages, 9 images. Extended version of a paper published at EDM 2020, 13th International Conference on Educational Data Mining

ACM Class: I.2.7

arXiv:2006.02106 [pdf, other]

doi 10.1093/mnras/stab1356

The GALAH survey: A census of lithium-rich giant stars

Authors: Sarah Martell, Jeffrey Simpson, Adithya Balasubramaniam, Sven Buder, Sanjib Sharma, Marc Hon, Dennis Stello, Yuan-Sen Ting, Martin Asplund, Joss Bland-Hawthorn, Gayandhi De Silva, Ken Freeman, Michael Hayden, Janez Kos, Geraint Lewis, Karin Lind, Daniel Zucker, Tomaz Zwitter, Simon Campbell, Klemen Cotar, Jonathan Horner, Benjamin Montet, Rob Wittenmyer

Abstract: We investigate the properties of 1262 red giant stars with high photospheric abundances of lithium observed by the GALAH and K2-HERMES surveys, and discuss them in the context of proposed mechanisms for lithium enrichment and re-depletion in giant stars. We confirm that Li-rich giants are rare, making up only 1.2 per cent of our giant star sample. We use stellar parameters from the third public data release from the GALAH survey and a Bayesian isochrone analysis to divide the sample into first-ascent red giant branch and red clump stars, and confirm these classifications using asteroseismic data from K2. We find that red clump stars are 2.5 times as likely to be lithium-rich as red giant branch stars, in agreement with other recent work. The probability for a star to be lithium-rich is affected by a number of factors, though the causality in those correlations is not entirely clear. We show for the first time that primary and secondary red clump stars have distinctly different lithium enrichment patterns. The data set discussed here is large and heterogeneous in terms of evolutionary phase, metallicity, rotation rate and mass. We expect that if the various mechanisms that have been proposed for lithium enrichment in evolved stars are in fact active, they should all contribute to this sample of lithium-rich giants at some level.

Submitted 10 May, 2021; v1 submitted 3 June, 2020; originally announced June 2020.

Comments: 17 pages, 12 figures, MNRAS accepted

Showing 1–10 of 10 results for author: Balasubramaniam, A