-
OPIMA: Optical Processing-In-Memory for Convolutional Neural Network Acceleration
Authors:
Febin Sunny,
Amin Shafiee,
Abhishek Balasubramaniam,
Mahdi Nikdast,
Sudeep Pasricha
Abstract:
Recent advances in machine learning (ML) have spotlighted the pressing need for computing architectures that bridge the gap between memory bandwidth and processing power. The advent of deep neural networks has pushed traditional Von Neumann architectures to their limits due to the high latency and energy consumption costs associated with data movement between the processor and memory for these workloads. One of the solutions to overcome this bottleneck is to perform computation within the main memory through processing-in-memory (PIM), thereby limiting data movement and the costs associated with it. However, DRAM-based PIM struggles to achieve high throughput and energy efficiency due to internal data movement bottlenecks and the need for frequent refresh operations. In this work, we introduce OPIMA, a PIM-based ML accelerator, architected within an optical main memory. OPIMA has been designed to leverage the inherent massive parallelism within main memory while performing high-speed, low-energy optical computation to accelerate ML models based on convolutional neural networks. We present a comprehensive analysis of OPIMA to guide design choices and operational mechanisms. Additionally, we evaluate the performance and energy consumption of OPIMA, comparing it with conventional electronic computing systems and emerging photonic PIM architectures. The experimental results show that OPIMA can achieve 2.98x higher throughput and 137x better energy efficiency than the best-known prior work.
Submitted 11 July, 2024;
originally announced July 2024.
-
R-TOSS: A Framework for Real-Time Object Detection using Semi-Structured Pruning
Authors:
Abhishek Balasubramaniam,
Febin P Sunny,
Sudeep Pasricha
Abstract:
Object detectors used in autonomous vehicles can have high memory and computational overheads. In this paper, we introduce a novel semi-structured pruning framework called R-TOSS that overcomes the shortcomings of state-of-the-art model pruning techniques. Experimental results on the JetsonTX2 show that R-TOSS has a compression rate of 4.4x on the YOLOv5 object detector with a 2.15x speedup in inference time and 57.01% decrease in energy usage. R-TOSS also enables 2.89x compression on RetinaNet with a 1.86x speedup in inference time and 56.31% decrease in energy usage. We also demonstrate significant improvements compared to various state-of-the-art pruning techniques.
Submitted 3 March, 2023;
originally announced March 2023.
-
Keratoconus Classifier for Smartphone-based Corneal Topographer
Authors:
Siddhartha Gairola,
Pallavi Joshi,
Anand Balasubramaniam,
Kaushik Murali,
Nipun Kwatra,
Mohit Jain
Abstract:
Keratoconus is a severe eye disease that leads to deformation of the cornea. It impacts people aged 10-25 years and is the leading cause of blindness in that demographic. Corneal topography is the gold standard for keratoconus diagnosis. It is a non-invasive process performed using expensive and bulky medical devices called corneal topographers. This makes it inaccessible to large populations, especially in the Global South. Low-cost smartphone-based corneal topographers, such as SmartKC, have been proposed to make keratoconus diagnosis accessible. Similar to medical-grade topographers, SmartKC outputs curvature heatmaps and quantitative metrics that need to be evaluated by doctors for keratoconus diagnosis. An automatic scheme for evaluation of these heatmaps and quantitative values can play a crucial role in screening keratoconus in areas where doctors are not available. In this work, we propose a dual-head convolutional neural network (CNN) for classifying keratoconus on the heatmaps generated by SmartKC. Since SmartKC is a new device and only had a small dataset (114 samples), we developed a 2-stage transfer learning strategy -- using historical data collected from a medical-grade topographer and a subset of SmartKC data -- to satisfactorily train our network. This, combined with our domain-specific data augmentations, achieved a sensitivity of 91.3% and a specificity of 94.2%.
Submitted 7 May, 2022;
originally announced May 2022.
-
Object Detection in Autonomous Vehicles: Status and Open Challenges
Authors:
Abhishek Balasubramaniam,
Sudeep Pasricha
Abstract:
Object detection is a computer vision task that has become an integral part of many consumer applications today such as surveillance and security systems, mobile text recognition, and diagnosing diseases from MRI/CT scans. Object detection is also one of the critical components to support autonomous driving. Autonomous vehicles rely on the perception of their surroundings to ensure safe and robust driving performance. This perception system uses object detection algorithms to accurately determine objects such as pedestrians, vehicles, traffic signs, and barriers in the vehicle's vicinity. Deep learning-based object detectors play a vital role in finding and localizing these objects in real-time. This article discusses the state-of-the-art in object detectors and open challenges for their integration into autonomous vehicles.
Submitted 19 January, 2022;
originally announced January 2022.
-
Nonnegative Matrix Factorization to understand Spatio-Temporal Traffic Pattern Variations during COVID-19: A Case Study
Authors:
Anandkumar Balasubramaniam,
Thirunavukarasu Balasubramaniam,
Rathinaraja Jeyaraj,
Anand Paul,
Richi Nayak
Abstract:
Due to rapid developments in Intelligent Transportation Systems (ITS) and the increasing number of vehicles on the road, an abundance of road traffic data is generated and available. Understanding spatio-temporal traffic patterns from this data is crucial and has been effectively helping in traffic planning, road construction, etc. However, understanding traffic patterns during the COVID-19 pandemic is quite challenging and important, as there is a huge difference in terms of people's and vehicles' travel behavioural patterns. In this paper, a case study is conducted to understand the variations in spatio-temporal traffic patterns during COVID-19. We apply nonnegative matrix factorization (NMF) to elicit patterns. The NMF model outputs are analysed based on the spatio-temporal pattern behaviours observed during the years 2019 and 2020, which are before and during the pandemic respectively, in Great Britain. The outputs of the analysed spatio-temporal traffic pattern variation behaviours will be useful in the fields of traffic management in Intelligent Transportation Systems and management in various stages of a pandemic or unavoidable scenarios in relation to road traffic.
Submitted 5 November, 2021;
originally announced November 2021.
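To illustrate the kind of factorization the abstract above refers to, here is a minimal sketch of NMF using multiplicative updates. The abstract does not say which NMF algorithm or data layout the authors used, so the update rule, the function names (`nmf`, `frobenius_error`), and the tiny location-by-time-slot matrix are all assumptions made for illustration only.

```python
import random


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Frobenius norm of V - W @ H."""
    WH = matmul(W, H)
    return sum((v - wh) ** 2 for rv, rwh in zip(V, WH) for v, wh in zip(rv, rwh)) ** 0.5


def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V (n x m) into W (n x rank) and H (rank x m).

    Uses multiplicative updates, one common NMF scheme (an assumption here,
    not necessarily the method used in the paper).
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H


if __name__ == "__main__":
    # Hypothetical data: rows are road locations, columns are time slots.
    V = [[0, 8, 2, 1, 7, 1],
         [3, 3, 3, 3, 3, 3],
         [3, 19, 7, 5, 17, 5],
         [6, 10, 7, 6.5, 9.5, 6.5]]
    W, H = nmf(V, rank=2)
    print("reconstruction error:", frobenius_error(V, W, H))
```

In this layout each row of `H` can be read as a temporal pattern and each row of `W` as how strongly a location expresses each pattern; comparing factors fitted on different years is one way such an analysis might proceed.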
-
SmartKC: Smartphone-based Corneal Topographer for Keratoconus Detection
Authors:
Siddhartha Gairola,
Murtuza Bohra,
Nadeem Shaheer,
Navya Jayaprakash,
Pallavi Joshi,
Anand Balasubramaniam,
Kaushik Murali,
Nipun Kwatra,
Mohit Jain
Abstract:
Keratoconus is a severe eye disease affecting the cornea (the clear, dome-shaped outer surface of the eye), causing it to become thin and develop a conical bulge. The diagnosis of keratoconus requires sophisticated ophthalmic devices which are non-portable and very expensive. This makes early detection of keratoconus inaccessible to large populations in low- and middle-income countries, making it a leading cause of partial/complete blindness among such populations. We propose SmartKC, a low-cost, smartphone-based keratoconus diagnosis system comprising a 3D-printed placido's disc attachment, an LED light strip, and an intelligent smartphone app to capture the reflection of the placido rings on the cornea. An image processing pipeline analyzes the corneal image and uses the smartphone's camera parameters, the placido rings' 3D location, the pixel location of the reflected placido rings and the setup's working distance to construct the corneal surface, via the Arc-Step method and Zernike polynomials based surface fitting. In a clinical study with 101 distinct eyes, we found that SmartKC achieves a sensitivity of 94.1% and a specificity of 100.0%. Moreover, the quantitative curvature estimates (sim-K) strongly correlate with a gold-standard medical device (Pearson correlation coefficient = 0.78). Our results indicate that SmartKC has the potential to be used as a keratoconus screening tool under real-world medical settings.
Submitted 21 January, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Improving the Efficiency of Transformers for Resource-Constrained Devices
Authors:
Hamid Tabani,
Ajay Balasubramaniam,
Shabbir Marzban,
Elahe Arani,
Bahram Zonooz
Abstract:
Transformers provide promising accuracy and have become popular and used in various domains such as natural language processing and computer vision. However, due to their massive number of model parameters, memory and computation requirements, they are not suitable for resource-constrained low-power devices. Even with high-performance and specialized devices, the memory bandwidth can become a performance-limiting bottleneck. In this paper, we present a performance analysis of state-of-the-art vision transformers on several devices. We propose to reduce the overall memory footprint and memory transfers by clustering the model parameters. We show that by using only 64 clusters to represent model parameters, it is possible to reduce the data transfer from the main memory by more than 4x, achieve up to 22% speedup and 39% energy savings on mobile devices with less than 0.1% accuracy loss.
Submitted 30 June, 2021;
originally announced June 2021.
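The parameter-clustering idea in the abstract above can be sketched as follows: each weight is replaced by an index into a small codebook of shared values. This is a minimal illustration that assumes 1-D k-means as the clustering method and 32-bit weights as the baseline storage format; the abstract specifies neither, and the names `kmeans_1d`, `index_bits`, and `compression_ratio` are hypothetical.

```python
import random


def kmeans_1d(values, k, iters=10, seed=0):
    """Cluster scalars with Lloyd's algorithm; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # initialise from k distinct positions

    def nearest(v):
        return min(range(k), key=lambda c: abs(v - centroids[c]))

    for _ in range(iters):
        assign = [nearest(v) for v in values]
        sums, counts = [0.0] * k, [0] * k
        for v, a in zip(values, assign):
            sums[a] += v
            counts[a] += 1
        # Keep the old centroid if a cluster ended up empty.
        centroids = [sums[c] / counts[c] if counts[c] else centroids[c]
                     for c in range(k)]
    # Final assignment against the last centroid update.
    return centroids, [nearest(v) for v in values]


def index_bits(k):
    """Bits needed to address one of k codebook entries."""
    return max(1, (k - 1).bit_length())


def compression_ratio(n_weights, k, weight_bits=32):
    """Original size divided by (per-weight indices + codebook) size, in bits."""
    original = n_weights * weight_bits
    clustered = n_weights * index_bits(k) + k * weight_bits
    return original / clustered


if __name__ == "__main__":
    rng = random.Random(1)
    weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for a layer
    centroids, idx = kmeans_1d(weights, k=64)
    reconstructed = [centroids[i] for i in idx]
    print("ratio for 10,000 weights:", compression_ratio(10_000, 64))
```

Under these assumptions, 64 clusters need 6-bit indices, so the stored size falls to roughly 6/32 of the original plus a small codebook. That is consistent in spirit with the "more than 4x" reduction in data transfer the abstract reports, though the authors' exact accounting may differ.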
-
Challenges and Obstacles Towards Deploying Deep Learning Models on Mobile Devices
Authors:
Hamid Tabani,
Ajay Balasubramaniam,
Elahe Arani,
Bahram Zonooz
Abstract:
From computer vision and speech recognition to forecasting trajectories in autonomous vehicles, deep learning approaches are at the forefront of many domains. Deep learning models are developed using a plethora of high-level, generic frameworks and libraries. Running those models on mobile devices requires hardware-aware optimizations and, in most cases, converting the models to other formats or using a third-party framework. In reality, most of the developed models need to undergo a process of conversion, adaptation, and, in some cases, full retraining to match the requirements and features of the framework that is deploying the model on the target platform. A variety of hardware platforms with heterogeneous computing elements, from wearable devices to high-performance GPU clusters, are used to run deep learning models. In this paper, we present the existing challenges, obstacles, and practical solutions towards deploying deep learning models on mobile devices.
Submitted 6 May, 2021;
originally announced May 2021.
-
Detecting Problem Statements in Peer Assessments
Authors:
Yunkai Xiao,
Gabriel Zingle,
Qinjin Jia,
Harsh R. Shah,
Yi Zhang,
Tianyi Li,
Mohsin Karovaliya,
Weixiang Zhao,
Yang Song,
Jie Ji,
Ashwin Balasubramaniam,
Harshit Patel,
Priyankha Bhalasubbramanian,
Vikram Patel,
Edward F. Gehringer
Abstract:
Effective peer assessment requires students to be attentive to the deficiencies in the work they rate. Thus, their reviews should identify problems. But what ways are there to check that they do? We attempt to automate the process of deciding whether a review comment detects a problem. We use over 18,000 review comments that were labeled by the reviewees as either detecting or not detecting a problem with the work. We deploy several traditional machine-learning models, as well as neural-network models using GloVe and BERT embeddings. We find that the best performer is the Hierarchical Attention Network classifier, followed by the Bidirectional Gated Recurrent Units (GRU) Attention and Capsule model with scores of 93.1% and 90.5% respectively. The best non-neural network model was the support vector machine with a score of 89.71%. This is followed by the Stochastic Gradient Descent model and the Logistic Regression model with 89.70% and 88.98%.
Submitted 29 May, 2020;
originally announced June 2020.
-
The GALAH survey: A census of lithium-rich giant stars
Authors:
Sarah Martell,
Jeffrey Simpson,
Adithya Balasubramaniam,
Sven Buder,
Sanjib Sharma,
Marc Hon,
Dennis Stello,
Yuan-Sen Ting,
Martin Asplund,
Joss Bland-Hawthorn,
Gayandhi De Silva,
Ken Freeman,
Michael Hayden,
Janez Kos,
Geraint Lewis,
Karin Lind,
Daniel Zucker,
Tomaz Zwitter,
Simon Campbell,
Klemen Cotar,
Jonathan Horner,
Benjamin Montet,
Rob Wittenmyer
Abstract:
We investigate the properties of 1262 red giant stars with high photospheric abundances of lithium observed by the GALAH and K2-HERMES surveys, and discuss them in the context of proposed mechanisms for lithium enrichment and re-depletion in giant stars. We confirm that Li-rich giants are rare, making up only 1.2 per cent of our giant star sample. We use stellar parameters from the third public data release from the GALAH survey and a Bayesian isochrone analysis to divide the sample into first-ascent red giant branch and red clump stars, and confirm these classifications using asteroseismic data from K2. We find that red clump stars are 2.5 times as likely to be lithium-rich as red giant branch stars, in agreement with other recent work. The probability for a star to be lithium-rich is affected by a number of factors, though the causality in those correlations is not entirely clear. We show for the first time that primary and secondary red clump stars have distinctly different lithium enrichment patterns. The data set discussed here is large and heterogeneous in terms of evolutionary phase, metallicity, rotation rate and mass. We expect that if the various mechanisms that have been proposed for lithium enrichment in evolved stars are in fact active, they should all contribute to this sample of lithium-rich giants at some level.
Submitted 10 May, 2021; v1 submitted 3 June, 2020;
originally announced June 2020.