
OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset

Authors: Allen Roush, Yusuf Shabazz, Arvind Balaji, Peter Zhang, Stefano Mezza, Markus Zhang, Sanjay Basu, Sriram Vishwanath, Mehdi Fatemi, Ravid Shwartz-Ziv

Abstract: We introduce OpenDebateEvidence, a comprehensive dataset for argument mining and summarization sourced from the American Competitive Debate community. This dataset includes over 3.5 million documents with rich metadata, making it one of the most extensive collections of debate evidence. OpenDebateEvidence captures the complexity of arguments in high school and college debates, providing valuable resources for training and evaluation. Our extensive experiments demonstrate the efficacy of fine-tuning state-of-the-art large language models for argumentative abstractive summarization across various methods, models, and datasets. By providing this comprehensive resource, we aim to advance computational argumentation and support practical applications for debaters, educators, and researchers. OpenDebateEvidence is publicly available to support further research and innovation in computational argumentation. Access it here: https://huggingface.co/datasets/Yusuf5/OpenCaselist

Submitted 5 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted for Publication to ARGMIN 2024 at ACL2024

arXiv:2404.10689 [pdf, other]

Network architecture search of X-ray based scientific applications

Authors: Adarsha Balaji, Ramyad Hadidi, Gregory Kollmer, Mohammed E. Fouda, Prasanna Balaprakash

Abstract: X-ray and electron diffraction-based microscopy use Bragg peak detection and ptychography to perform 3-D imaging at an atomic resolution. Typically, these techniques are implemented using computationally complex tasks such as fitting a pseudo-Voigt function or solving a complex inverse problem. Recently, the use of deep neural networks has improved the existing state-of-the-art approaches. However, the design and development of the neural network models depend on time- and labor-intensive tuning of the model by application experts. To that end, we propose a hyperparameter search (HPS) and neural architecture search (NAS) approach to automate the design and optimization of the neural network models for model size, energy consumption and throughput. We demonstrate the improved performance of the auto-tuned models when compared to the manually tuned BraggNN and PtychoNN benchmarks. We study and demonstrate the importance of exploring the search space of tunable hyperparameters in enhancing the performance of Bragg peak detection and ptychographic reconstruction. Our NAS and HPS of (1) BraggNN achieves a 31.03% improvement in Bragg peak detection accuracy with an 87.57% reduction in model size, and (2) PtychoNN achieves a 16.77% improvement in model accuracy and a 12.82% reduction in model size when compared to the baseline PtychoNN model. When run on the Orin-AGX edge platform, the optimized BraggNN and PtychoNN models demonstrate a 10.51% and 9.47% reduction in inference latency and a 44.18% and 15.34% reduction in energy consumption, respectively, compared to their baselines.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2402.09381 [pdf, other]

GraSSRep: Graph-Based Self-Supervised Learning for Repeat Detection in Metagenomic Assembly

Authors: Ali Azizpour, Advait Balaji, Todd J. Treangen, Santiago Segarra

Abstract: Repetitive DNA (repeats) poses significant challenges for accurate and efficient genome assembly and sequence alignment. This is particularly true for metagenomic data, where genome dynamics such as horizontal gene transfer, gene duplication, and gene loss/gain complicate accurate genome assembly from metagenomic communities. Detecting repeats is a crucial first step in overcoming these challenges. To address this issue, we propose GraSSRep, a novel approach that leverages the assembly graph's structure through graph neural networks (GNNs) within a self-supervised learning framework to classify DNA sequences into repetitive and non-repetitive categories. Specifically, we frame this problem as a node classification task within a metagenomic assembly graph. In a self-supervised fashion, we rely on a high-precision (but low-recall) heuristic to generate pseudo-labels for a small proportion of the nodes. We then use those pseudo-labels to train a GNN embedding and a random forest classifier to propagate the labels to the remaining nodes. In this way, GraSSRep combines sequencing features with pre-defined and learned graph features to achieve state-of-the-art performance in repeat detection. We evaluate our method using simulated and synthetic metagenomic datasets. The results on the simulated data highlight GraSSRep's robustness to repeat attributes, demonstrating its effectiveness in handling the complexity of repeated sequences. Additionally, our experiments with synthetic metagenomic datasets reveal that incorporating the graph structure and the GNN enhances our detection performance. Finally, in comparative analyses, GraSSRep outperforms existing repeat detection tools with respect to precision and recall.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2311.08101 [pdf, other]

Non-monotonic emergence of order from chaos in turbulent thermo-acoustic fluid systems

Authors: Aswin Balaji, Shruti Tandon, Norbert Marwan, Jürgen Kurths, R. I. Sujith

Abstract: Self-sustained order can emerge in complex systems due to internal feedback between coupled subsystems. Here, we present our discovery of a non-monotonic emergence of order amidst chaos in a turbulent thermo-acoustic fluid system. Fluctuations play a vital role in determining the dynamical state and transitions in a system. In this work, we use complex networks to encode jumps in amplitude scales owing to fluctuations as links between nodes representing amplitude bins. The number of possible amplitude transitions at a fixed timescale reflects the complexity of dynamics at that timescale. The network entropy quantifies the number of and uncertainty associated with such transitions. Using network entropy, we show that the uncertainty in fluctuations first increases and then decreases as the system transitions from chaos via intermittency to order. The competition between turbulence and nonlinear interactions leads to such non-monotonic emergence of order amidst chaos in turbulent thermo-acoustic fluid systems.

Submitted 6 July, 2024; v1 submitted 14 November, 2023; originally announced November 2023.
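
The transition-network idea in the abstract above can be illustrated with a minimal sketch. It assumes equal-width amplitude bins and takes "network entropy" to be the Shannon entropy of the observed bin-to-bin transition frequencies at a fixed lag; the paper's exact definition may differ. The function name `transition_entropy` and its parameters are hypothetical, chosen only for this illustration.

```python
import math
from collections import Counter
from typing import Sequence


def transition_entropy(x: Sequence[float], n_bins: int, lag: int = 1) -> float:
    """Shannon entropy (in nats) of amplitude-bin transitions at a given lag.

    Assumption: equal-width bins over [min(x), max(x)]; each observed
    (bin at t, bin at t + lag) pair counts as one link in the network.
    """
    if lag < 1 or lag >= len(x):
        raise ValueError("lag must satisfy 1 <= lag < len(x)")
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for a constant signal
    # Map each sample to a bin index; the maximum value falls in the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in x]
    # Count directed links between amplitude bins separated by `lag` samples.
    links = Counter(zip(bins[:-lag], bins[lag:]))
    total = sum(links.values())
    return -sum((c / total) * math.log(c / total) for c in links.values())
```

Under this definition a constant signal yields zero entropy (a single self-loop), while a signal that visits many bins in an irregular order yields a larger value.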

arXiv:2203.05311 [pdf, other]

Design-Technology Co-Optimization for NVM-based Neuromorphic Processing Elements

Authors: Shihao Song, Adarsha Balaji, Anup Das, Nagarajan Kandasamy

Abstract: Neuromorphic hardware platforms can significantly lower the energy overhead of a machine learning inference task. We present a design-technology tradeoff analysis to implement such inference tasks on the processing elements (PEs) of a Non-Volatile Memory (NVM)-based neuromorphic hardware. Through detailed circuit-level simulations at scaled process technology nodes, we show the negative impact of technology scaling on the information-processing latency, which impacts the quality-of-service (QoS) of an embedded ML system. At a finer granularity, the latency inside a PE depends on 1) the delay introduced by parasitic components on its current paths, and 2) the varying delay to sense different resistance states of its NVM cells. Based on these two observations, we make the following three contributions. First, on the technology front, we propose an optimization scheme where the NVM resistance state that takes the longest time to sense is set on current paths having the least delay, and vice versa, reducing the average PE latency, which improves the QoS. Second, on the architecture front, we introduce isolation transistors within each PE to partition it into regions that can be individually power-gated, reducing both latency and energy. Finally, on the system-software front, we propose a mechanism to leverage the proposed technological and architectural enhancements when implementing a machine-learning inference task on neuromorphic PEs of the hardware. Evaluations with a recent neuromorphic hardware architecture show that our proposed design-technology co-optimization approach improves both performance and energy efficiency of machine-learning inference tasks without incurring high cost-per-bit.

Submitted 10 March, 2022; originally announced March 2022.

Comments: Accepted for publication at ACM TECS

arXiv:2202.08897 [pdf, other]

Implementing Spiking Neural Networks on Neuromorphic Architectures: A Review

Authors: Phu Khanh Huynh, M. Lakshmi Varshika, Ankita Paul, Murat Isik, Adarsha Balaji, Anup Das

Abstract: Recently, both industry and academia have proposed several different neuromorphic systems to execute machine learning applications that are designed using Spiking Neural Networks (SNNs). With the growing complexity on design and technology fronts, programming such systems to admit and execute a machine learning application is becoming increasingly challenging. Additionally, neuromorphic systems are required to guarantee real-time performance, consume lower energy, and provide tolerance to logic and memory failures. Consequently, there is a clear need for system software frameworks that can implement machine learning applications on current and emerging neuromorphic systems, and simultaneously address performance, energy, and reliability. Here, we provide a comprehensive overview of such frameworks proposed for both platform-based design and hardware-software co-design. We highlight challenges and opportunities that the future holds in the area of system software technology for neuromorphic computing.

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2111.11838 [pdf, other]

Design of Many-Core Big Little μBrain for Energy-Efficient Embedded Neuromorphic Computing

Authors: M. Lakshmi Varshika, Adarsha Balaji, Federico Corradi, Anup Das, Jan Stuijt, Francky Catthoor

Abstract: As spiking-based deep learning inference applications are increasing in embedded systems, these systems tend to integrate neuromorphic accelerators such as μBrain to improve energy efficiency. We propose a μBrain-based scalable many-core neuromorphic hardware design to accelerate the computations of spiking deep convolutional neural networks (SDCNNs). To increase energy efficiency, cores are designed to be heterogeneous in terms of their neuron and synapse capacity (big cores have higher capacity than the little ones), and they are interconnected using a parallel segmented bus interconnect, which leads to lower latency and energy compared to a traditional mesh-based Network-on-Chip (NoC). We propose a system software framework called SentryOS to map SDCNN inference applications to the proposed design. SentryOS consists of a compiler and a run-time manager. The compiler compiles an SDCNN application into sub-networks by exploiting the internal architecture of big and little μBrain cores. The run-time manager schedules these sub-networks onto cores and pipelines their execution to improve throughput. We evaluate the proposed big little many-core neuromorphic design and the system software framework with five commonly used SDCNN inference applications and show that the proposed solution reduces energy (between 37% and 98%), reduces latency (between 9% and 25%), and increases application throughput (between 20% and 36%). We also show that SentryOS can be easily extended for other spiking neuromorphic accelerators.

Submitted 23 November, 2021; originally announced November 2021.

Comments: Accepted for publication at DATE 2022

arXiv:2108.02023 [pdf, other]

DFSynthesizer: Dataflow-based Synthesis of Spiking Neural Networks to Neuromorphic Hardware

Authors: Shihao Song, Harry Chong, Adarsha Balaji, Anup Das, James Shackleford, Nagarajan Kandasamy

Abstract: Spiking Neural Networks (SNNs) are an emerging computation model, which uses event-driven activation and bio-inspired learning algorithms. SNN-based machine-learning programs are typically executed on tile-based neuromorphic hardware platforms, where each tile consists of a computation unit called a crossbar, which maps neurons and synapses of the program. However, synthesizing such programs on off-the-shelf neuromorphic hardware is challenging. This is because of the inherent resource and latency limitations of the hardware, which impact both model performance, e.g., accuracy, and hardware performance, e.g., throughput. We propose DFSynthesizer, an end-to-end framework for synthesizing SNN-based machine learning programs to neuromorphic hardware. The proposed framework works in four steps. First, it analyzes a machine-learning program and generates an SNN workload using representative data. Second, it partitions the SNN workload and generates clusters that fit on crossbars of the target neuromorphic hardware. Third, it exploits the rich semantics of Synchronous Dataflow Graphs (SDFGs) to represent a clustered SNN program, allowing for performance analysis in terms of key hardware constraints such as number of crossbars, dimension of each crossbar, buffer space on tiles, and tile communication bandwidth. Finally, it uses a novel scheduling algorithm to execute clusters on crossbars of the hardware, guaranteeing hardware performance. We evaluate DFSynthesizer with 10 commonly used machine-learning programs. Our results demonstrate that DFSynthesizer provides a much tighter performance guarantee compared to current mapping approaches.

Submitted 4 August, 2021; originally announced August 2021.

Comments: Accepted for publication at ACM Transactions on Embedded Computing

arXiv:2105.02038 [pdf, other]

Dynamic Reliability Management in Neuromorphic Computing

Authors: Shihao Song, Jui Hanamshet, Adarsha Balaji, Anup Das, Jeffrey L. Krichmar, Nikil D. Dutt, Nagarajan Kandasamy, Francky Catthoor

Abstract: Neuromorphic computing systems use non-volatile memory (NVM) to implement high-density and low-energy synaptic storage. Elevated voltages and currents needed to operate NVMs cause aging of CMOS-based transistors in each neuron and synapse circuit in the hardware, drifting the transistors' parameters from their nominal values. Aggressive device scaling increases power density and temperature, which accelerates the aging, challenging the reliable operation of neuromorphic systems. Existing reliability-oriented techniques periodically de-stress all neuron and synapse circuits in the hardware at fixed intervals, assuming worst-case operating conditions, without actually tracking their aging at run time. To de-stress these circuits, normal operation must be interrupted, which introduces latency in spike generation and propagation, impacting the inter-spike interval and hence, performance, e.g., accuracy. We propose a new architectural technique to mitigate the aging-related reliability problems in neuromorphic systems, by designing an intelligent run-time manager (NCRTM), which dynamically de-stresses neuron and synapse circuits in response to the short-term aging in their CMOS transistors during the execution of machine learning workloads, with the objective of meeting a reliability target. NCRTM de-stresses these circuits only when it is absolutely necessary to do so, otherwise reducing the performance impact by scheduling de-stress operations off the critical path. We evaluate NCRTM with state-of-the-art machine learning workloads on neuromorphic hardware. Our results demonstrate that NCRTM significantly improves the reliability of neuromorphic hardware, with marginal impact on performance.

Submitted 5 May, 2021; originally announced May 2021.

Comments: Accepted in ACM JETC

arXiv:2105.01795 [pdf, other]

NeuroXplorer 1.0: An Extensible Framework for Architectural Exploration with Spiking Neural Networks

Authors: Adarsha Balaji, Shihao Song, Twisha Titirsha, Anup Das, Jeffrey Krichmar, Nikil Dutt, James Shackleford, Nagarajan Kandasamy, Francky Catthoor

Abstract: Recently, both industry and academia have proposed many different neuromorphic architectures to execute applications that are designed with Spiking Neural Networks (SNNs). Consequently, there is a growing need for an extensible simulation framework that can perform architectural explorations with SNNs, including both platform-based design of today's hardware, and hardware-software co-design and design-technology co-optimization of the future. We present NeuroXplorer, a fast and extensible framework that is based on a generalized template for modeling a neuromorphic architecture that can be infused with the specific details of a given hardware and/or technology. NeuroXplorer can perform both low-level cycle-accurate architectural simulations and high-level analysis with data-flow abstractions. NeuroXplorer's optimization engine can incorporate hardware-oriented metrics such as energy, throughput, and latency, as well as SNN-oriented metrics such as inter-spike interval distortion and spike disorder, which directly impact SNN performance. We demonstrate the architectural exploration capabilities of NeuroXplorer through case studies with many state-of-the-art machine learning models.

Submitted 4 May, 2021; originally announced May 2021.

arXiv:2103.12231 [pdf, other]

On the Role of System Software in Energy Management of Neuromorphic Computing

Authors: Twisha Titirsha, Shihao Song, Adarsha Balaji, Anup Das

Abstract: Neuromorphic computing systems such as DYNAPs and Loihi have recently been introduced to the computing community to improve performance and energy efficiency of machine learning programs, especially those that are implemented using Spiking Neural Networks (SNNs). The role of system software for neuromorphic systems is to cluster a large machine learning model (e.g., with many neurons and synapses) and map these clusters to the computing resources of the hardware. In this work, we formulate the energy consumption of neuromorphic hardware, considering the power consumed by neurons and synapses, and the energy consumed in communicating spikes on the interconnect. Based on this formulation, we first evaluate the role of system software in managing the energy consumption of neuromorphic systems. Next, we formulate a simple heuristic-based mapping approach to place the neurons and synapses onto the computing resources to reduce energy consumption. We evaluate our approach with 10 machine learning applications and demonstrate that the proposed mapping approach leads to a significant reduction of energy consumption of neuromorphic computing systems.

Submitted 22 March, 2021; originally announced March 2021.

Comments: To appear in 18th Computer Frontiers 2021

arXiv:2011.14133 [pdf, other]

Towards Fast and Light-Weight Restoration of Dark Images

Authors: Mohit Lamba, Atul Balaji, Kaushik Mitra

Abstract: The ability to capture good quality images in the dark and near-zero lux conditions has been a long-standing pursuit of the computer vision community. The seminal work by Chen et al. [5] has especially caused renewed interest in this area, resulting in methods that build on top of their work in a bid to improve the reconstruction. However, for practical utility and deployment of low-light enhancement algorithms on edge devices such as embedded systems, surveillance cameras, autonomous robots and smartphones, the solution must respect additional constraints such as limited GPU memory and processing power. With this in mind, we propose a deep neural network architecture that aims to strike a balance between the network latency, memory utilization, model parameters, and reconstruction quality. The key idea is to forbid computations in the High-Resolution (HR) space and limit them to a Low-Resolution (LR) space. However, doing the bulk of computations in the LR space causes artifacts in the restored image. We thus propose Pack and UnPack operations, which allow us to effectively transition between the HR and LR spaces without incurring many artifacts in the restored image. We show that we can enhance a full-resolution (2848 x 4256), extremely dark single image in roughly 3 seconds even on a CPU. We achieve this with 2-7x fewer model parameters, 2-3x lower memory utilization, and a 5-20x speed-up, while maintaining competitive image reconstruction quality compared to the state-of-the-art algorithms.

Submitted 28 November, 2020; originally announced November 2020.

arXiv:2011.13965 [pdf, ps, other]

Compiling Spiking Neural Networks to Mitigate Neuromorphic Hardware Constraints

Authors: Adarsha Balaji, Anup Das

Abstract: Spiking Neural Networks (SNNs) are efficient computation models to perform spatio-temporal pattern recognition on resource- and power-constrained platforms. SNNs executed on neuromorphic hardware can further reduce energy consumption of these platforms. With increasing model size and complexity, mapping SNN-based applications to tile-based neuromorphic hardware is becoming increasingly challenging. This is attributed to the limitations of neuro-synaptic cores, viz. a crossbar, to accommodate only a fixed number of pre-synaptic connections per post-synaptic neuron. For complex SNN-based models that have many neurons and pre-synaptic connections per neuron, (1) connections may need to be pruned after training to fit onto the crossbar resources, leading to a loss in model quality, e.g., accuracy, and (2) the neurons and synapses need to be partitioned and placed on the neuro-synaptic cores of the hardware, which could lead to increased latency and energy consumption. In this work, we propose (1) a novel unrolling technique that decomposes a neuron function with many pre-synaptic connections into a sequence of homogeneous neural units to significantly improve the crossbar utilization and retain all pre-synaptic connections, and (2) SpiNeMap, a novel methodology to map SNNs on neuromorphic hardware with an aim to minimize energy consumption and spike latency.

Submitted 27 November, 2020; originally announced November 2020.

arXiv:2011.07251 [pdf]

DebateSum: A large-scale argument mining and summarization dataset

Authors: Allen Roush, Arvind Balaji

Abstract: Prior work in Argument Mining frequently alludes to its potential applications in automatic debating systems. Despite this focus, almost no datasets or models exist which apply natural language processing techniques to problems found within competitive formal debate. To remedy this, we present the DebateSum dataset. DebateSum consists of 187,386 unique pieces of evidence with corresponding argumen… ▽ More Prior work in Argument Mining frequently alludes to its potential applications in automatic debating systems. Despite this focus, almost no datasets or models exist which apply natural language processing techniques to problems found within competitive formal debate. To remedy this, we present the DebateSum dataset. DebateSum consists of 187,386 unique pieces of evidence with corresponding argument and extractive summaries. DebateSum was made using data compiled by competitors within the National Speech and Debate Association over a 7-year period. We train several transformer summarization models to benchmark summarization performance on DebateSum. We also introduce a set of fasttext word-vectors trained on DebateSum called debate2vec. Finally, we present a search engine for this dataset which is utilized extensively by members of the National Speech and Debate Association today. The DebateSum search engine is available to the public here: http://www.debate.cards △ Less

Submitted 14 November, 2020; originally announced November 2020.

Comments: Accepted for oral presentation at the 7th Workshop on Argument Mining (ARGMIN 2020) held at The 28th International Conference on Computational Linguistics (COLING 2020)

arXiv:2009.09298 [pdf, other]

Enabling Resource-Aware Mapping of Spiking Neural Networks via Spatial Decomposition

Authors: Adarsha Balaji, Shihao Song, Anup Das, Jeffrey Krichmar, Nikil Dutt, James Shackleford, Nagarajan Kandasamy, Francky Catthoor

Abstract: With growing model complexity, mapping Spiking Neural Network (SNN)-based applications to tile-based neuromorphic hardware is becoming increasingly challenging. This is because the synaptic storage resources on a tile, viz. a crossbar, can accommodate only a fixed number of pre-synaptic connections per post-synaptic neuron. For complex SNN models that have many pre-synaptic connections per neuron, some connections may need to be pruned after training to fit onto the tile resources, leading to a loss in model quality, e.g., accuracy. In this work, we propose a novel unrolling technique that decomposes a neuron function with many pre-synaptic connections into a sequence of homogeneous neural units, where each neural unit is a function computation node, with two pre-synaptic connections. This spatial decomposition technique significantly improves crossbar utilization and retains all pre-synaptic connections, resulting in no loss of the model quality derived from connection pruning. We integrate the proposed technique within an existing SNN mapping framework and evaluate it using machine learning applications on the DYNAP-SE state-of-the-art neuromorphic hardware. Our results demonstrate an average 60% lower crossbar requirement, 9x higher synapse utilization, 62% lower wasted energy on the hardware, and between 0.8% and 4.6% increase in model quality.

Submitted 19 September, 2020; originally announced September 2020.

Comments: Accepted for publication in IEEE Embedded Systems Letters
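
The unrolling described in the abstract above can be illustrated structurally. This is only a sketch of the wiring pattern, assuming a simple left-to-right chain in which each unit takes the previous unit's output plus one original input; it does not model neuron dynamics, and the name `unroll_neuron` and the unit labels are hypothetical.

```python
from typing import List, Tuple

Unit = Tuple[str, str, str]  # (first input, second input, output name)


def unroll_neuron(inputs: List[str], out: str = "out") -> List[Unit]:
    """Decompose one many-input neuron into a chain of two-input units.

    Assumption: a linear chain; a neuron with n pre-synaptic connections
    becomes n - 1 units, and no connection is dropped.
    """
    if len(inputs) < 2:
        raise ValueError("need at least two pre-synaptic inputs")
    units: List[Unit] = []
    carry = inputs[0]
    for i, nxt in enumerate(inputs[1:], start=1):
        is_last = i == len(inputs) - 1
        name = out if is_last else f"u{i}"  # intermediate units get u1, u2, ...
        units.append((carry, nxt, name))
        carry = name  # the next unit consumes this unit's output
    return units


chain = unroll_neuron(["a", "b", "c", "d"])
```

Here `chain` holds three units, each with exactly two inputs, so every unit fits a crossbar that allows only two pre-synaptic connections per neuron.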

arXiv:2006.06777 [pdf, other]

Run-time Mapping of Spiking Neural Networks to Neuromorphic Hardware

Authors: Adarsha Balaji, Thibaut Marty, Anup Das, Francky Catthoor

Abstract: In this paper, we propose a design methodology to partition and map the neurons and synapses of online learning SNN-based applications to neuromorphic architectures at run-time. Our design methodology operates in two steps: step 1 is a layer-wise greedy approach to partition SNNs into clusters of neurons and synapses incorporating the constraints of the neuromorphic architecture, and step 2 is a hill-climbing optimization algorithm that minimizes the total spikes communicated between clusters, improving energy consumption on the shared interconnect of the architecture. We conduct experiments to evaluate the feasibility of our algorithm using synthetic and realistic SNN-based applications. We demonstrate that our algorithm reduces SNN mapping time by an average of 780x compared to a state-of-the-art design-time based SNN partitioning approach with only 6.25% lower solution quality.

Submitted 11 June, 2020; originally announced June 2020.

Comments: Accepted in Springer Journal of Signal Processing Systems
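
The two-step flow described in the abstract above (greedy layer-wise partitioning, then hill climbing on inter-cluster spikes) might look roughly like the sketch below. Everything here is an assumption made for illustration: the only hardware constraint modelled is a per-cluster neuron capacity, the hill-climbing neighbourhood is a pairwise swap of neurons in different clusters, and the function names are hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (pre-synaptic neuron, post-synaptic neuron)


def greedy_partition(layers: List[List[int]], capacity: int) -> Dict[int, int]:
    """Step 1 (sketch): walk the layers in order and fill clusters up to capacity."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    assignment: Dict[int, int] = {}
    cluster, used = 0, 0
    for layer in layers:
        for neuron in layer:
            if used == capacity:  # current cluster is full, open a new one
                cluster += 1
                used = 0
            assignment[neuron] = cluster
            used += 1
    return assignment


def inter_cluster_spikes(assignment: Dict[int, int], spikes: Dict[Edge, int]) -> int:
    """Total spikes on synapses whose endpoints sit in different clusters."""
    return sum(count for (pre, post), count in spikes.items()
               if assignment[pre] != assignment[post])


def hill_climb(assignment: Dict[int, int],
               spikes: Dict[Edge, int]) -> Tuple[Dict[int, int], int]:
    """Step 2 (sketch): accept any neuron swap that lowers inter-cluster spikes.

    Swapping keeps every cluster's size unchanged, so the capacity
    constraint from step 1 still holds. Stops at a local optimum.
    """
    best = dict(assignment)
    best_cost = inter_cluster_spikes(best, spikes)
    neurons = sorted(best)
    improved = True
    while improved:
        improved = False
        for i, a in enumerate(neurons):
            for b in neurons[i + 1:]:
                if best[a] == best[b]:
                    continue  # same cluster, swapping changes nothing
                best[a], best[b] = best[b], best[a]
                cost = inter_cluster_spikes(best, spikes)
                if cost < best_cost:
                    best_cost = cost
                    improved = True
                else:
                    best[a], best[b] = best[b], best[a]  # undo the swap
    return best, best_cost
```

As a usage example, with `layers = [[0, 1], [2, 3]]`, `capacity = 2` and spike counts `{(0, 2): 10, (1, 3): 10, (0, 3): 1, (1, 2): 1}`, the greedy step puts each layer in its own cluster, so all 22 spikes cross clusters; hill climbing then regroups neurons 0 with 2 and 1 with 3, leaving 2 inter-cluster spikes.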

arXiv:2004.03717 [pdf, other]

doi 10.1145/3372799.3394364

Compiling Spiking Neural Networks to Neuromorphic Hardware

Authors: Shihao Song, Adarsha Balaji, Anup Das, Nagarajan Kandasamy, James Shackleford

Abstract: Machine learning applications that are implemented with a spike-based computation model, e.g., Spiking Neural Network (SNN), have great potential to lower energy consumption when they are executed on neuromorphic hardware. However, compiling and mapping an SNN to the hardware is challenging, especially when compute and storage resources of the hardware (viz. crossbar) need to be shared among the neurons and synapses of the SNN. We propose an approach to analyze and compile SNNs on resource-constrained neuromorphic hardware, providing guarantees on key performance metrics such as execution time and throughput. Our approach makes the following three key contributions. First, we propose a greedy technique to partition an SNN into clusters of neurons and synapses such that each cluster can fit onto the resources of a crossbar. Second, we exploit the rich semantics and expressiveness of Synchronous Dataflow Graphs (SDFGs) to represent a clustered SNN and analyze its performance using Max-Plus Algebra, considering the available compute and storage capacities, buffer sizes, and communication bandwidth. Third, we propose a self-timed execution-based fast technique to compile and admit SNN-based applications to neuromorphic hardware at run-time, adapting dynamically to the available resources on the hardware. We evaluate our approach with standard SNN-based applications and demonstrate a significant performance improvement compared to current practices.

Submitted 12 May, 2020; v1 submitted 7 April, 2020; originally announced April 2020.

Comments: 10 pages, 17 figures, accepted at 21st ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES 2020)

arXiv:2003.09696 [pdf, other]

PyCARL: A PyNN Interface for Hardware-Software Co-Simulation of Spiking Neural Network

Authors: Adarsha Balaji, Prathyusha Adiraju, Hirak J. Kashyap, Anup Das, Jeffrey L. Krichmar, Nikil D. Dutt, Francky Catthoor

Abstract: We present PyCARL, a PyNN-based common Python programming interface for hardware-software co-simulation of spiking neural networks (SNNs). Through PyCARL, we make the following two key contributions. First, we provide an interface of PyNN to CARLsim, a computationally-efficient, GPU-accelerated and biophysically-detailed SNN simulator. PyCARL facilitates joint development of machine learning models and code sharing between CARLsim and PyNN users, promoting an integrated and larger neuromorphic community. Second, we integrate cycle-accurate models of state-of-the-art neuromorphic hardware such as TrueNorth, Loihi, and DynapSE in PyCARL, to accurately model hardware latencies that delay spikes between communicating neurons and degrade performance. PyCARL allows users to analyze and optimize the performance difference between software-only simulation and hardware-software co-simulation of their machine learning models. We show that system designers can also use PyCARL to perform design-space exploration early in the product development stage, facilitating faster time-to-deployment of neuromorphic products. We evaluate the memory usage and simulation time of PyCARL using functionality tests, synthetic SNNs, and realistic applications. Our results demonstrate that for large SNNs, PyCARL does not lead to any significant overhead compared to CARLsim. We also use PyCARL to analyze these SNNs for a state-of-the-art neuromorphic hardware and demonstrate a significant performance deviation from software-only simulations. PyCARL allows users to evaluate and minimize such differences early during model development.

Submitted 12 May, 2020; v1 submitted 21 March, 2020; originally announced March 2020.

Comments: 10 pages, 25 figures. Accepted for publication at International Joint Conference on Neural Networks (IJCNN) 2020

arXiv:2003.06080 [pdf]

doi 10.1016/j.artmed.2021.102072

Coronary Artery Segmentation from Intravascular Optical Coherence Tomography Using Deep Capsules

Authors: Arjun Balaji, Lachlan Kelsey, Kamran Majeed, Carl Schultz, Barry Doyle

Abstract: The segmentation and analysis of coronary arteries from intravascular optical coherence tomography (IVOCT) is an important aspect of diagnosing and managing coronary artery disease. Current image processing methods are hindered by the time needed to generate expert-labelled datasets and the potential for bias during the analysis. Therefore, automated, robust, unbiased and timely geometry extraction from IVOCT, using image processing, would be beneficial to clinicians. With clinical application in mind, we aim to develop a model with a small memory footprint that is fast at inference time without sacrificing segmentation quality. Using a large IVOCT dataset of 12,011 expert-labelled images from 22 patients, we construct a new deep learning method based on capsules which automatically produces lumen segmentations. Our dataset contains images with both blood and light artefacts (22.8%), as well as metallic (23.1%) and bioresorbable stents (2.5%). We split the dataset into a training (70%), validation (20%) and test (10%) set and rigorously investigate design variations with respect to upsampling regimes and input selection. We show that our developments lead to a model, DeepCap, that is on par with state-of-the-art machine learning methods in terms of segmentation quality and robustness, while using as little as 12% of the parameters. This enables DeepCap to have per image inference times up to 70% faster on GPU and up to 95% faster on CPU compared to other state-of-the-art models. DeepCap is a robust automated segmentation tool that can aid clinicians to extract unbiased geometrical data from IVOCT.

Submitted 7 April, 2021; v1 submitted 12 March, 2020; originally announced March 2020.

Comments: This version has been accepted in Artificial Intelligence in Medicine. Main paper: 28 pages, 9 figures, 4 tables. Supplementary Material: 3 pages, 3 figures

arXiv:1911.00548 [pdf, other]

A Framework to Explore Workload-Specific Performance and Lifetime Trade-offs in Neuromorphic Computing

Authors: Adarsha Balaji, Shihao Song, Anup Das, Nikil Dutt, Jeff Krichmar, Nagarajan Kandasamy, Francky Catthoor

Abstract: Neuromorphic hardware with non-volatile memory (NVM) can implement machine learning workload in an energy-efficient manner. Unfortunately, certain NVMs such as phase change memory (PCM) require high voltages for correct operation. These voltages are supplied from an on-chip charge pump. If the charge pump is activated too frequently, its internal CMOS devices do not recover from stress, accelerating their aging and leading to negative bias temperature instability (NBTI) generated defects. Forcefully discharging the stressed charge pump can lower the aging rate of its CMOS devices, but makes the neuromorphic hardware unavailable to perform computations while its charge pump is being discharged. This negatively impacts performance such as latency and accuracy of the machine learning workload being executed. In this paper, we propose a novel framework to exploit workload-specific performance and lifetime trade-offs in neuromorphic computing. Our framework first extracts the precise times at which a charge pump in the hardware is activated to support neural computations within a workload. This timing information is then used with a characterized NBTI reliability model to estimate the charge pump's aging during the workload execution. We use our framework to evaluate workload-specific performance and reliability impacts of using 1) different SNN mapping strategies and 2) different charge pump discharge strategies. We show that our framework can be used by system designers to explore performance and reliability trade-offs early in the design of neuromorphic hardware such that appropriate reliability-oriented design margins can be set.

Submitted 1 November, 2019; originally announced November 2019.

Comments: 4 pages, 5 figures, 13 references, accepted for publication at IEEE Computer Architecture Letters

arXiv:1909.01843 [pdf, other]

Mapping Spiking Neural Networks to Neuromorphic Hardware

Authors: Adarsha Balaji, Anup Das, Yuefeng Wu, Khanh Huynh, Francesco Dell'Anna, Giacomo Indiveri, Jeffrey L. Krichmar, Nikil Dutt, Siebren Schaafsma, Francky Catthoor

Abstract: Neuromorphic hardware platforms implement biological neurons and synapses to execute spiking neural networks (SNNs) in an energy-efficient manner. We present SpiNeMap, a design methodology to map SNNs to crossbar-based neuromorphic hardware, minimizing spike latency and energy consumption. SpiNeMap operates in two steps: SpiNeCluster and SpiNePlacer. SpiNeCluster is a heuristic-based clustering technique to partition SNNs into clusters of synapses, where intra-cluster local synapses are mapped within crossbars of the hardware and inter-cluster global synapses are mapped to the shared interconnect. SpiNeCluster minimizes the number of spikes on global synapses, which reduces spike congestion on the shared interconnect, improving application performance. SpiNePlacer then finds the best placement of local and global synapses on the hardware using a meta-heuristic-based approach to minimize energy consumption and spike latency. We evaluate SpiNeMap using synthetic and realistic SNNs on the DynapSE neuromorphic hardware. We show that SpiNeMap reduces average energy consumption by 45% and average spike latency by 21%, compared to state-of-the-art techniques.

Submitted 4 September, 2019; originally announced September 2019.

Comments: 14 pages, 14 images, 69 references, Accepted in IEEE Transactions on Very Large Scale Integration (VLSI) Systems

arXiv:1812.10636 [pdf]

Chart-Text: A Fully Automated Chart Image Descriptor

Authors: Abhijit Balaji, Thuvaarakkesh Ramanathan, Venkateshwarlu Sonathi

Abstract: Images greatly help in understanding, interpreting and visualizing data. Adding textual description to images is the first and foremost principle of web accessibility. Visually impaired users using screen readers will use these textual descriptions to get a better understanding of images present in digital content. In this paper, we propose Chart-Text, a novel fully automated system that creates textual descriptions of chart images. Given a PNG image of a chart, our Chart-Text system creates a complete textual description of it. First, the system classifies the type of chart and then it detects and classifies the labels and texts in the chart. Finally, it uses specific image processing algorithms to extract relevant information from the chart image. Our proposed system achieves an accuracy of 99.72% in classifying the charts and an accuracy of 78.9% in extracting the data and creating the corresponding textual description.

Submitted 27 December, 2018; originally announced December 2018.

arXiv:1808.06492 [pdf, other]

Benchmarking Automatic Machine Learning Frameworks

Authors: Adithya Balaji, Alexander Allen

Abstract: AutoML serves as the bridge between varying levels of expertise when designing machine learning systems and expedites the data science process. A wide range of techniques is taken to address this; however, there does not exist an objective comparison of these techniques. We present a benchmark of current open source AutoML solutions using open source datasets. We test auto-sklearn, TPOT, auto_ml, and H2O's AutoML solution against a compiled set of regression and classification datasets sourced from OpenML and find that auto-sklearn performs the best across classification datasets and TPOT performs the best across regression datasets.

Submitted 16 August, 2018; originally announced August 2018.

Comments: 9 pages, 8 figures, 5 tables

arXiv:1712.04978 [pdf, other]

Trajectory Optimization for Curvature Bounded Non-Holonomic Vehicles: Application to Autonomous Driving

Authors: Mithun Babu, Yash Oza, C. A. Balaji, Arun Kumar Singh, Bharath Gopakarishnan, K. Madhava Kirshna

Abstract: In this paper, we propose a trajectory optimization for computing smooth collision free trajectories for nonholonomic curvature bounded vehicles among static and dynamic obstacles. One of the key novelties of our formulation is a hierarchical optimization routine which alternately operates in the space of angular accelerations and linear velocities. That is, the optimization has a two layer structure wherein angular accelerations are optimized keeping the linear velocities fixed and vice versa. If the vehicle/obstacles are modeled as circles then the velocity optimization layer can be shown to have the computationally efficient difference of convex structure commonly observed for linear systems. This leads to a less conservative approximation as compared to that obtained by approximating each polygon individually through its circumscribing circle. On the other hand, it leads to an optimization problem with fewer constraints as compared to that obtained when approximating polygons as multiple overlapping circles. We use the proposed trajectory optimization as the basis for constructing a Model Predictive Control framework for navigating an autonomous car in complex scenarios like overtaking, lane changing and merging. Moreover, we also highlight the advantages provided by the alternating optimization routine. Specifically, we show it produces trajectories which have comparable arc lengths and smoothness as compared to those produced with joint simultaneous optimization in the space of angular accelerations and linear velocities. However, importantly, the alternating optimization provides some gain in computational time.

Submitted 8 March, 2018; v1 submitted 13 December, 2017; originally announced December 2017.

Comments: 7 pages

arXiv:1303.1913 [pdf]

Design and Development of Artificial Neural Networking (ANN) system using sigmoid activation function to predict annual rice production in Tamilnadu

Authors: S. Arun Balaji, K. Baskaran

Abstract: Prediction of annual rice production in all the 31 districts of Tamilnadu is an important decision for the Government of Tamilnadu. Rice production is a complex process and a non-linear problem involving soil, crop, weather, pest, disease, capital, labour and management parameters. ANN software was designed and developed with a Feed Forward Back Propagation (FFBP) network to predict rice production. The input layer has six independent variables, namely area of cultivation and rice production in three seasons: Kuruvai, Samba and Kodai. The popular sigmoid activation function was adopted to convert input data into sigmoid values. The hidden layer computes the summation of six sigmoid values with six sets of weightages. The final output was converted into sigmoid values using a sigmoid transfer function. ANN outputs are the predicted results. The error between the original data and the ANN output values was computed. A threshold value of 10^-9 was used to test whether the error is greater than the threshold level. If the error was greater than the threshold, the weights were updated and all summations were redone by back propagation. This process was repeated until the error was equal to zero. The predicted results were printed and were found to match the expected values exactly. It shows that the ANN prediction was 100% accurate.

Submitted 8 March, 2013; originally announced March 2013.

Comments: 19 pages, 7 figures, published in the International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.3, No.1, February 2013

Report number: International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.3, No.1, February 2013

MSC Class: 14J60 (Primary) 14F05; 14J26 (Secondary)

ACM Class: F.2.2; I.2.7

Journal ref: IJCSEIT, Vol.3, No.1, February 2013

Showing 1–25 of 25 results for author: Balaji, A