-
Scrutinizing Data from Sky: An Examination of Its Veracity in Area Based Traffic Contexts
Authors:
Yawar Ali,
Krishnan K N,
Debashis Ray Sarkar,
K. Ramachandra Rao,
Niladri Chatterjee,
Ashish Bhaskar
Abstract:
Traffic data collection has been an overwhelming task for researchers as well as authorities over the years. With advances in technology and the introduction of various tools for processing and extracting traffic data, the task has become significantly more convenient. Data from Sky (DFS) is one such tool, based on image processing and artificial intelligence (AI), that provides output for macroscopic as well as microscopic variables of traffic streams. The company claims 98 to 100 percent accuracy for data exported using the DFS tool. The tool is widely used in developed countries, where traffic is homogeneous and follows lane-based movement. In this study, the authors check the veracity of the DFS tool in the heterogeneous, area-based traffic that prevails in most developing countries. Validation uses three measures to verify the DFS claim: Classified Volume Counts (CVCs), Space Mean Speeds (SMSs) of individual vehicle classes, and the microscopic trajectory of a probe vehicle. The error in CVCs is estimated for each vehicle class present in the traffic stream. Mean Absolute Percentage Error (MAPE) values are calculated for the average speed of each vehicle class between manually extracted and DFS-extracted SMSs, and the microscopic trajectories are validated using a GPS-based tracker mounted on probe vehicles. The results are fairly accurate for data captured from a bird's-eye view, which yields the smallest errors. The other data collection configurations show significant errors, caused mainly by the varied traffic composition, the camera viewing angle, and the direction of traffic.
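The speed validation described above rests on MAPE. The following Python sketch shows that computation per vehicle class, treating the manually extracted SMS as the reference. The class names and speed values are hypothetical and are not the study's data.

```python
def mape(reference, estimate):
    """Mean Absolute Percentage Error (%) of `estimate` relative to `reference`.

    Assumes every reference value is nonzero (speeds are positive here).
    """
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(r - e) / abs(r) for r, e in zip(reference, estimate))
    return 100.0 * total / len(reference)


# Hypothetical space mean speeds (km/h), one value per observation interval.
manual_sms = {"car": [40.0, 50.0], "two_wheeler": [30.0, 35.0]}
dfs_sms = {"car": [42.0, 45.0], "two_wheeler": [30.0, 38.5]}

per_class_mape = {cls: mape(manual_sms[cls], dfs_sms[cls]) for cls in manual_sms}
print(per_class_mape)
```

Because each error is divided by the reference speed, the same absolute discrepancy weighs more for a slow vehicle class than for a fast one.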
Submitted 26 April, 2024;
originally announced April 2024.
-
Prescribed Fire Modeling using Knowledge-Guided Machine Learning for Land Management
Authors:
Somya Sharma Chatterjee,
Kelly Lindsay,
Neel Chatterjee,
Rohan Patil,
Ilkay Altintas De Callafon,
Michael Steinbach,
Daniel Giron,
Mai H. Nguyen,
Vipin Kumar
Abstract:
In recent years, the increasing threat of devastating wildfires has underscored the need for effective prescribed fire management. Process-based computer simulations have traditionally been employed to plan prescribed fires for wildfire prevention. However, even simplified process models like QUIC-Fire are too compute-intensive to be used for real-time decision-making, especially when weather conditions change rapidly. Traditional machine learning (ML) methods used for fire modeling offer computational speedup but struggle with physically inconsistent predictions, biased predictions due to class imbalance, biased estimates of fire spread metrics (e.g., burned area, rate of spread), and poor generalization to out-of-distribution wind conditions. This paper introduces a novel ML framework that enables rapid emulation of prescribed fires while addressing these concerns. By incorporating domain knowledge, the proposed method helps reduce physical inconsistencies in fuel density estimates in data-scarce scenarios. To overcome the majority class bias in predictions, we leverage pre-existing source domain data to augment the training data and learn the spread of fire more effectively. Finally, we overcome the problem of biased estimation of fire spread metrics by incorporating a hierarchical modeling structure that captures the interdependence between fuel density and burned area. Notably, the improved fire metric estimates (e.g., burned area) offered by our framework make it useful for fire managers, who often rely on these estimates to make decisions about prescribed burn management. Furthermore, our framework generalizes better than other ML-based fire modeling methods across diverse wind conditions and ignition patterns.
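One common way to inject domain knowledge into an emulator like the one described above is a penalty on physically implausible outputs. The Python sketch below assumes, hypothetically, that fuel density in a grid cell can only stay the same or decrease during a burn; it penalizes predicted increases and derives burned area from the fuel-density grids so that the two quantities stay consistent. It illustrates the general idea only and is not the loss or hierarchical structure used in the paper; `threshold`, `cell_area`, and the weight `lam` are made-up parameters.

```python
def consistency_penalty(fuel_prev, fuel_pred):
    """Total predicted increase in fuel density (assumed implausible during a burn)."""
    return sum(
        max(0.0, b - a)
        for row_a, row_b in zip(fuel_prev, fuel_pred)
        for a, b in zip(row_a, row_b)
    )


def mse(pred, target):
    cells = [(p, t) for row_p, row_t in zip(pred, target) for p, t in zip(row_p, row_t)]
    return sum((p - t) ** 2 for p, t in cells) / len(cells)


def knowledge_guided_loss(fuel_prev, fuel_pred, fuel_true, lam=0.1):
    """Data-fit term plus a weighted physical-consistency penalty (hypothetical)."""
    return mse(fuel_pred, fuel_true) + lam * consistency_penalty(fuel_prev, fuel_pred)


def burned_area(fuel_initial, fuel_final, cell_area=1.0, threshold=0.05):
    """Area of cells whose fuel density dropped by more than `threshold`."""
    burned = sum(
        1
        for row_i, row_f in zip(fuel_initial, fuel_final)
        for a, b in zip(row_i, row_f)
        if a - b > threshold
    )
    return burned * cell_area


fuel_t0 = [[1.0, 1.0], [1.0, 0.5]]
fuel_t1 = [[0.2, 1.0], [1.1, 0.5]]  # the 1.1 is a physically inconsistent increase
print(consistency_penalty(fuel_t0, fuel_t1))         # approximately 0.1
print(burned_area(fuel_t0, fuel_t1, cell_area=4.0))  # 4.0
```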
Submitted 2 October, 2023;
originally announced October 2023.
-
FL Games: A Federated Learning Framework for Distribution Shifts
Authors:
Sharut Gupta,
Kartik Ahuja,
Mohammad Havaei,
Niladri Chatterjee,
Yoshua Bengio
Abstract:
Federated learning aims to train predictive models for data that is distributed across clients, under the orchestration of a server. However, participating clients typically each hold data from a different distribution, which can lead to catastrophically poor generalization on data from a different client, which represents a new domain. In this work, we argue that in order to generalize better across non-i.i.d. clients, it is imperative to learn only correlations that are stable and invariant across domains. We propose FL GAMES, a game-theoretic framework for federated learning that learns causal features that are invariant across clients. While training to achieve the Nash equilibrium, the traditional best response strategy suffers from high-frequency oscillations. We demonstrate that FL GAMES effectively resolves this challenge and exhibits smooth performance curves. Further, FL GAMES scales well with the number of clients, requires significantly fewer communication rounds, and is agnostic to device heterogeneity. Through empirical evaluation, we demonstrate that FL GAMES achieves high out-of-distribution performance on various benchmarks.
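The best-response oscillation mentioned above can be reproduced in a two-player toy game. In this Python sketch, a hypothetical game in which each player's best reply is the negative of the action it responds to, plain simultaneous best responses flip between +1 and -1 indefinitely, whereas responding to the running average of the opponent's past actions (fictitious-play-style smoothing) settles at the equilibrium (0, 0). The smoothing rule is a generic stand-in chosen for illustration, not a claim about the exact FL GAMES update.

```python
def best_response(action):
    # Toy game: the best reply to an action a is -a, so (0, 0) is the equilibrium.
    return -action


def play(rounds, smoothed):
    """Return player 1's action history under simultaneous updates."""
    x1, x2 = 1.0, 1.0
    hist1, hist2 = [x1], [x2]
    for _ in range(rounds):
        if smoothed:
            # Respond to the empirical average of the opponent's past actions.
            target1 = sum(hist2) / len(hist2)
            target2 = sum(hist1) / len(hist1)
        else:
            # Respond only to the opponent's latest action.
            target1, target2 = x2, x1
        x1, x2 = best_response(target1), best_response(target2)
        hist1.append(x1)
        hist2.append(x2)
    return hist1


print(play(6, smoothed=False))  # [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
print(play(6, smoothed=True))
```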
Submitted 31 October, 2022;
originally announced November 2022.
-
FL Games: A federated learning framework for distribution shifts
Authors:
Sharut Gupta,
Kartik Ahuja,
Mohammad Havaei,
Niladri Chatterjee,
Yoshua Bengio
Abstract:
Federated learning aims to train predictive models for data that is distributed across clients, under the orchestration of a server. However, participating clients typically each hold data from a different distribution, whereby predictive models with strong in-distribution generalization can fail catastrophically on unseen domains. In this work, we argue that in order to generalize better across non-i.i.d. clients, it is imperative to learn only correlations that are stable and invariant across domains. We propose FL Games, a game-theoretic framework for federated learning that learns causal features that are invariant across clients. While training to achieve the Nash equilibrium, the traditional best response strategy suffers from high-frequency oscillations. We demonstrate that FL Games effectively resolves this challenge and exhibits smooth performance curves. Further, FL Games scales well with the number of clients, requires significantly fewer communication rounds, and is agnostic to device heterogeneity. Through empirical evaluation, we demonstrate that FL Games achieves high out-of-distribution performance on various benchmarks.
Submitted 23 May, 2022;
originally announced May 2022.
-
Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations
Authors:
Qingyu Chen,
Alexis Allot,
Robert Leaman,
Rezarta Islamaj Doğan,
Jingcheng Du,
Li Fang,
Kai Wang,
Shuo Xu,
Yuefu Zhang,
Parsa Bagherzadeh,
Sabine Bergler,
Aakash Bhatnagar,
Nidhir Bhavsar,
Yung-Chun Chang,
Sheng-Jie Lin,
Wentai Tang,
Hongtong Zhang,
Ilija Tavchioski,
Senja Pollak,
Shubo Tian,
Jinfeng Zhang,
Yulia Otmakhova,
Antonio Jimeno Yepes,
Hang Dong,
Honghan Wu
, et al. (14 additional authors not shown)
Abstract:
The COVID-19 pandemic has been severely impacting global society since December 2019. Extensive research has been undertaken to understand the characteristics of the virus and to design vaccines and drugs. The related findings have been reported in the biomedical literature at a rate of about 10,000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200,000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g., Diagnosis and Treatment) to the articles in LitCovid. Despite continuing advances in biomedical text mining methods, few have been dedicated to topic annotation of COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30,000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multi-label classification datasets in biomedical scientific literature. Nineteen teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest-performing submissions achieved 0.8875, 0.9181, and 0.9394 for macro F1-score, micro F1-score, and instance-based F1-score, respectively. The level of participation and the results demonstrate a successful track and help close the gap between dataset curation and method development. The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development.
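The three scores reported above aggregate errors differently: macro F1 averages per-topic F1, micro F1 pools counts across topics, and instance-based F1 averages per-article F1. A minimal Python sketch follows, using the identity F1 = 2TP / (2TP + FP + FN). The topic names are LitCovid-style labels with made-up assignments, and mapping a zero denominator to 0.0 is an assumption (evaluation scripts differ on this edge case).

```python
def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # assumption: empty case scores 0


def multilabel_f1(gold, pred, labels):
    """gold, pred: lists of label sets, one set per article."""
    per_label = []
    tp_all = fp_all = fn_all = 0
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if lab in g and lab in p)
        fp = sum(1 for g, p in zip(gold, pred) if lab not in g and lab in p)
        fn = sum(1 for g, p in zip(gold, pred) if lab in g and lab not in p)
        per_label.append(f1(tp, fp, fn))
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
    macro = sum(per_label) / len(per_label)
    micro = f1(tp_all, fp_all, fn_all)
    # Per article: TP = |g & p|, FP = |p - g|, FN = |g - p|.
    instance = sum(f1(len(g & p), len(p - g), len(g - p)) for g, p in zip(gold, pred)) / len(gold)
    return macro, micro, instance


topics = ["Treatment", "Diagnosis", "Prevention"]
gold = [{"Treatment", "Diagnosis"}, {"Prevention"}, {"Treatment"}]
pred = [{"Treatment"}, {"Prevention", "Diagnosis"}, {"Treatment"}]
macro, micro, instance = multilabel_f1(gold, pred, topics)
print(macro, micro, instance)
```

Here a single wrong topic (Diagnosis) zeroes out one of three per-topic scores, which pulls macro F1 (2/3) below micro F1 (0.75).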
Submitted 3 June, 2022; v1 submitted 20 April, 2022;
originally announced April 2022.
-
Interpretation of Black Box NLP Models: A Survey
Authors:
Shivani Choudhary,
Niladri Chatterjee,
Subir Kumar Saha
Abstract:
An increasing number of machine learning models have been deployed in high-stakes domains such as finance and healthcare. Despite their superior performance, many of these models are black boxes that are hard to explain. Researchers are making growing efforts to develop methods to interpret these black-box models. Post hoc explanations based on perturbations, such as LIME, are widely used approaches to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing serious challenges to the effectiveness of the method itself and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem to determine the number of perturbation points needed to guarantee stability of the resulting explanation. Experiments on both simulated and real-world datasets are provided to demonstrate the effectiveness of our method.
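The central-limit-theorem reasoning mentioned above can be sketched in a few lines of Python: a z-test on a sample mean, and the sample size needed for a confidence interval of a given half-width. The exact test statistic used by S-LIME is not reproduced here; treating the input as differences between two competing features' scores across perturbation batches is a hypothetical framing for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def z_critical(alpha=0.05):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def difference_is_significant(diffs, alpha=0.05):
    """CLT-based z-test of H0: E[diff] = 0 (e.g. a score gap between two features)."""
    se = stdev(diffs) / math.sqrt(len(diffs))
    return abs(mean(diffs)) > z_critical(alpha) * se


def samples_needed(sigma, margin, alpha=0.05):
    """Sample size so that the CLT confidence half-width is at most `margin`."""
    return math.ceil((z_critical(alpha) * sigma / margin) ** 2)


print(samples_needed(sigma=1.0, margin=0.1))  # 385
```

If the test is not significant, one natural response in this framing is to draw more perturbation points, using `samples_needed` with a pilot estimate of `sigma` as a rough target.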
Submitted 31 March, 2022;
originally announced March 2022.
-
Approximate Bayesian Computation for Physical Inverse Modeling
Authors:
Neel Chatterjee,
Somya Sharma,
Sarah Swisher,
Snigdhansu Chatterjee
Abstract:
Semiconductor device models are essential to understand charge transport in thin film transistors (TFTs). Using these TFT models to draw inferences involves estimating the parameters used to fit the experimental data. These experimental data can include extracted charge carrier mobility or measured current. Estimating these parameters helps us draw inferences about device performance. Fitting a TFT model to given experimental data relies on manual fine-tuning of multiple model parameters by human experts. Several of these parameters may have confounding effects on the experimental data, making the extraction of their individual effects a non-intuitive process during manual tuning. To avoid this convoluted process, we propose a new method that automates model parameter extraction and results in an accurate model fit. In this work, model-choice-based approximate Bayesian computation (aBc) is used to generate the posterior distribution of the estimated parameters from observed mobility at various gate voltage values. Furthermore, it is shown that the extracted parameters can be accurately predicted from the mobility curves using gradient boosted trees. This work also provides a comparative analysis of the proposed framework against fine-tuned neural networks, in which the proposed framework is shown to perform better.
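The core of approximate Bayesian computation is to keep prior draws whose simulated output lands close to the observed data. The Python sketch below shows plain rejection ABC, not the model-choice variant used in the paper. The linear `simulate_mobility` function is a hypothetical stand-in for a TFT mobility model, and the prior, noise level, and tolerance `epsilon` are illustrative choices.

```python
import random


def simulate_mobility(theta, gate_voltages, rng, noise_sd=0.1):
    """Hypothetical stand-in for a TFT mobility model: linear in gate voltage plus noise."""
    return [theta * v + rng.gauss(0.0, noise_sd) for v in gate_voltages]


def rmse(a, b):
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5


def abc_rejection(observed, gate_voltages, n_draws, epsilon, rng, prior=(0.0, 5.0)):
    """Plain rejection ABC: keep prior draws whose simulations land near the data."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(*prior)  # draw a candidate parameter from the prior
        simulated = simulate_mobility(theta, gate_voltages, rng)
        if rmse(simulated, observed) < epsilon:
            accepted.append(theta)
    return accepted


rng = random.Random(42)
gate_voltages = [1.0, 2.0, 3.0, 4.0, 5.0]
observed = simulate_mobility(2.0, gate_voltages, rng)  # synthetic data, true theta = 2
posterior = abc_rejection(observed, gate_voltages, n_draws=20000, epsilon=0.3, rng=rng)
posterior_mean = sum(posterior) / len(posterior) if posterior else float("nan")
print(len(posterior), posterior_mean)
```

The accepted draws approximate the posterior; shrinking `epsilon` tightens the approximation at the cost of fewer accepted samples.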
Submitted 25 November, 2021;
originally announced November 2021.
-
GPU Domain Specialization via Composable On-Package Architecture
Authors:
Yaosheng Fu,
Evgeny Bolotin,
Niladrish Chatterjee,
David Nellans,
Stephen W. Keckler
Abstract:
As GPUs scale their low-precision matrix math throughput to boost deep learning (DL) performance, they upset the balance between math throughput and memory system capabilities. We demonstrate that a converged GPU design that tries to address the diverging architectural requirements of FP32 (or larger) based HPC and FP16 (or smaller) based DL workloads results in a sub-optimal configuration for either application domain. We argue that a Composable On-PAckage GPU (COPA-GPU) architecture that provides domain-specialized GPU products is the most practical solution to these diverging requirements. A COPA-GPU leverages multi-chip-module disaggregation to support maximal design reuse, along with memory system specialization per application domain. We show how a COPA-GPU enables DL-specialized products by modular augmentation of the baseline GPU architecture with up to 4x higher off-die bandwidth, 32x larger on-package cache, and 2.3x higher DRAM bandwidth and capacity, while conveniently supporting scaled-down HPC-oriented designs. This work explores the microarchitectural design necessary to enable composable GPUs and evaluates the benefits composability can provide to HPC, DL training, and DL inference. We show that, compared with a converged GPU design, a DL-optimized COPA-GPU featuring a combination of 16x larger cache capacity and 1.6x higher DRAM bandwidth scales per-GPU training and inference performance by 31% and 35%, respectively, and reduces the number of GPU instances by 50% in scale-out training scenarios.
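The imbalance between math throughput and memory capability described above can be illustrated with the generic roofline model. In this Python sketch, the peak throughput, bandwidth, and arithmetic intensity are hypothetical numbers; the 1.6x factor is borrowed from the abstract purely as an input. The model ignores caches entirely, so its output is not comparable to the reported 31% and 35% gains.

```python
def attainable_tflops(peak_tflops, dram_tb_per_s, flops_per_byte):
    """Roofline bound: min(compute peak, memory bandwidth x arithmetic intensity).

    TB/s multiplied by FLOP/byte gives TFLOP/s.
    """
    return min(peak_tflops, dram_tb_per_s * flops_per_byte)


# Hypothetical GPU and kernel numbers, for illustration only.
intensity = 20.0  # FLOP per byte of DRAM traffic
baseline = attainable_tflops(100.0, 2.0, intensity)
more_math = attainable_tflops(200.0, 2.0, intensity)              # 2x math, same memory
more_bandwidth = attainable_tflops(100.0, 2.0 * 1.6, intensity)   # 1.6x DRAM bandwidth
print(baseline, more_math, more_bandwidth)
```

For this memory-bound kernel, doubling math throughput changes nothing, while extra bandwidth raises the bound, which is the kind of imbalance that motivates memory-system specialization.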
Submitted 5 April, 2021;
originally announced April 2021.
-
Automatic Extraction of Agriculture Terms from Domain Text: A Survey of Tools and Techniques
Authors:
Niladri Chatterjee,
Neha Kaushik
Abstract:
Agriculture is a key component of any country's development. Domain-specific knowledge resources help provide insight into the domain. Existing knowledge resources such as AGROVOC and NAL Thesaurus are developed and maintained by domain experts. Populating these knowledge resources with terms can be automated by using automatic term extraction tools to process unstructured agricultural text. Automatic term extraction is also a key component of many semantic web applications, such as ontology creation, recommendation systems, sentiment classification, and query expansion. The primary goal of an automatic term extraction system is to maximize the number of valid terms and minimize the number of invalid terms extracted from the input set of documents. Despite its importance in various applications, few online tools are available for this purpose. Moreover, the performance of the most popular among them varies significantly. Consequently, selecting the right term extraction tool is perceived as a serious problem for various knowledge-based applications. This paper analyzes three commonly used term extraction tools, namely RAKE, TerMine, and TermRaider, and compares their precision and recall with those of RENT, a more recent term extractor developed by the authors for the agriculture domain.
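To make the comparison above concrete, the Python sketch below implements a minimal textbook version of RAKE's scoring (candidate phrases split at stopwords and punctuation, word score = degree / frequency, phrase score = sum of word scores) and the precision and recall of an extracted term set against a gold list. It is not the RAKE tool evaluated in the paper; the tiny stopword list, the sample text, the score cut-off, and the gold terms are all hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; real RAKE setups use a much larger one.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_phrases(text):
    """Split text at punctuation and stopwords into candidate keyword phrases."""
    phrases = []
    for fragment in re.split(r"[.,;:!?()\n]", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9-]+", fragment):
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """RAKE phrase score: sum over member words of degree(word) / frequency(word)."""
    phrases = candidate_phrases(text)
    freq, degree = defaultdict(int), defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences within the phrase, incl. itself
    word_score = {w: degree[w] / freq[w] for w in freq}
    return {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}


def precision_recall(extracted, gold):
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


text = "Soil moisture and crop yield are linked to irrigation scheduling. Crop yield is measured."
scores = rake_scores(text)
extracted = [phrase for phrase, score in scores.items() if score >= 2.0]
gold = ["soil moisture", "crop yield", "irrigation scheduling", "nitrogen fertilizer"]
print(precision_recall(extracted, gold))  # (1.0, 0.75)
```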
Submitted 24 September, 2020;
originally announced September 2020.
-
Probabilistic Random Indexing for Continuous Event Detection
Authors:
Yashank Singh,
Niladri Chatterjee
Abstract:
The present paper explores a novel variant of Random Indexing (RI) based representations for encoding language data, with a view to using them in a dynamic scenario where events happen continuously. Since the size of one-hot encoded representations grows linearly with the size of the vocabulary, they do not scale for online use with high volumes of dynamic data. On the other hand, existing pre-trained embedding models are not suitable for detecting new events because of the dynamic nature of the text data. The present work addresses this issue with a novel RI representation that imposes a probability distribution on the number of randomized entries, which leads to a class of RI representations. It also provides a rigorous analysis of how well the representation methods encode semantic information, in terms of the probability of orthogonality. Building on these ideas, we propose an algorithm that is log-linear in the size of the vocabulary to track the semantic relationship of a query word to other words and suggest the events relevant to that word. We ran simulations of the proposed algorithm on tweet data specific to three different events and present our findings. The proposed probabilistic RI representations are found to be much faster and more scalable than Bag of Words (BoW) embeddings while maintaining accuracy in depicting semantic relationships.
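The mechanism described above can be sketched in Python: each new word receives a sparse random index vector of fixed dimension, with a random number of nonzero entries, and a word's context vector is the sum of its neighbours' index vectors. The dimension stays fixed as the vocabulary grows, unlike one-hot encoding. The uniform distribution over the nonzero count, the dimension, the window size, and the toy sentences are assumptions for illustration; the paper's own distribution is not reproduced.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, rng, min_nonzeros=2, max_nonzeros=10):
    """Sparse ternary index vector {position: +1 or -1}.

    Assumption: the number of nonzero entries is drawn uniformly from
    [min_nonzeros, max_nonzeros]; this is a placeholder distribution.
    """
    k = rng.randint(min_nonzeros, max_nonzeros)
    return {pos: rng.choice((-1, 1)) for pos in rng.sample(range(dim), k)}


def build_context_vectors(sentences, dim=1000, window=2, seed=0):
    """Each word's context vector is the sum of its neighbours' index vectors."""
    rng = random.Random(seed)
    index = {}
    context = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        for i, word in enumerate(sentence):
            for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                if j == i:
                    continue
                neighbour = sentence[j]
                if neighbour not in index:  # new words get a vector on first sight
                    index[neighbour] = index_vector(dim, rng)
                for pos, val in index[neighbour].items():
                    context[word][pos] += val
    return {w: dict(v) for w, v in context.items()}


def cosine(u, v):
    dot = sum(val * v.get(pos, 0) for pos, val in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


sentences = [["flood", "hits", "city"], ["flood", "hits", "town"], ["stock", "prices", "fall"]]
vectors = build_context_vectors(sentences)
print(cosine(vectors["city"], vectors["town"]), cosine(vectors["city"], vectors["fall"]))
```

Words that share contexts ("city", "town") end up with similar vectors, while sparse random index vectors are nearly orthogonal, so unrelated words score close to zero.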
Submitted 9 December, 2021; v1 submitted 28 August, 2020;
originally announced August 2020.
-
An improved Bayesian TRIE based model for SMS text normalization
Authors:
Abhinava Sikdar,
Niladri Chatterjee
Abstract:
Normalization of SMS text, commonly known as texting language, has been pursued for more than a decade. A probabilistic approach based on the Trie data structure was proposed in the literature and found to perform better than earlier HMM-based approaches at predicting the correct alternative for an out-of-lexicon word. However, the success of the Trie-based approach depends largely on how correctly the underlying probabilities of word occurrences are estimated. In this work, we propose a structural modification to the existing Trie-based model along with a novel training algorithm and probability generation scheme. We prove two theorems on statistical properties of the proposed Trie and use them to claim that it is an unbiased and consistent estimator of the occurrence probabilities of the words. We further fuse our model into the paradigm of noisy-channel-based error correction and provide a heuristic to go beyond a Damerau-Levenshtein distance of one. We also run simulations to support our claims and show the superiority of the proposed scheme over previous works.
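The ingredients above can be sketched in Python: a Trie that stores word counts, whose relative-frequency estimate of a word's probability is unbiased and consistent under i.i.d. sampling (a standard fact), and the restricted Damerau-Levenshtein (optimal string alignment) distance. This is not the authors' modified Trie or probability scheme. The `normalize` helper is hypothetical and assumes a uniform channel model, so it simply picks the most frequent lexicon word within the distance bound; the training words are made up.

```python
class TrieNode:
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0  # training occurrences of the word ending at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()
        self.total = 0

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.count += 1
        self.total += 1

    def probability(self, word):
        """Relative-frequency estimate of P(word)."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return 0.0
        return node.count / self.total if self.total else 0.0

    def words(self):
        """Yield (word, count) for every stored word."""
        stack = [(self.root, "")]
        while stack:
            node, prefix = stack.pop()
            if node.count:
                yield prefix, node.count
            for ch, child in node.children.items():
                stack.append((child, prefix + ch))


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[len(a)][len(b)]


def normalize(token, trie, max_distance=2):
    """Hypothetical helper: most frequent lexicon word within `max_distance` edits."""
    candidates = [(w, c) for w, c in trie.words() if osa_distance(token, w) <= max_distance]
    if not candidates:
        return token
    return max(candidates, key=lambda wc: wc[1])[0]


trie = Trie()
for word in ["tomorrow"] * 3 + ["tomato", "great", "great"]:
    trie.insert(word)
print(normalize("tomorow", trie), normalize("grate", trie))  # tomorrow great
```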
Submitted 18 November, 2020; v1 submitted 3 August, 2020;
originally announced August 2020.
-
Examining Lead-Lag Relationships In-Depth, With Focus On FX Market As Covid-19 Crises Unfolds
Authors:
Kartikay Gupta,
Niladri Chatterjee
Abstract:
The lead-lag relationship plays a vital role in financial markets. It is the phenomenon in which one price series lags behind and partially replicates the movement of a leading time series. The present research proposes a new technique that better identifies the lead-lag relationship empirically. Apart from better identifying the lead-lag path, the technique also gives a measure for judging the closeness between financial time series. The proposed measure is closely related to correlation, and it uses dynamic programming to find the optimal lead-lag path. Further, it retains most of the properties of a metric, so much so that it is termed a loose metric. Tests are performed on Synthetic Time Series (STS) with known lead-lag relationships, and comparisons are made with other state-of-the-art models on the basis of significance and forecastability. The proposed technique gives the best results in both tests: it finds paths that are all statistically significant, and its forecasts are closest to the target values. Then, we use the measure to study the topology evolution of the Foreign Exchange (FX) market as the COVID-19 pandemic unfolds. Here, we study the FX currency prices of 29 prominent countries of the world. It is observed that as the crisis unfolds, all the currencies become strongly interlinked with each other. Also, the US dollar starts playing an even more central role in the FX market. Finally, we mention several other application areas of the proposed technique for designing intelligent systems.
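A dynamic-programming search for a lead-lag path can be illustrated with the classic dynamic time warping (DTW) recursion, used here as a stand-in rather than the paper's loose metric. The cost is a squared difference; for z-normalized series (population standard deviation) aligned on the diagonal, the summed cost equals 2n(1 - r), where r is the Pearson correlation, which is one way such a measure connects to correlation. The two toy series are hypothetical.

```python
def dtw_path(x, y):
    """Dynamic time warping with squared-difference cost.

    Returns (total_cost, path), where path lists 0-based index pairs (i, j);
    j - i is the local lag of y behind x along the path.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = []
        if i > 1 and j > 1:
            steps.append((D[i - 1][j - 1], i - 1, j - 1))
        if i > 1:
            steps.append((D[i - 1][j], i - 1, j))
        if j > 1:
            steps.append((D[i][j - 1], i, j - 1))
        _, i, j = min(steps)
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path


leader = [0, 0, 0, 1, 5, 2, 0, 0, 0, 0]
follower = [0, 0, 0, 0, 0, 1, 5, 2, 0, 0]  # same shape, two steps later
cost, path = dtw_path(leader, follower)
lags = {i: j - i for i, j in path}
print(cost, lags[4])  # 0.0 2
```

Because the path may bend, the local lag can change along the series, which is the property that distinguishes a path-based approach from a single fixed-lag correlation.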
Submitted 9 May, 2020; v1 submitted 22 April, 2020;
originally announced April 2020.
-
Rough Set based Aggregate Rank Measure & its Application to Supervised Multi Document Summarization
Authors:
Nidhika Yadav,
Niladri Chatterjee
Abstract:
Most problems in machine learning concern classification, in which objects of the universe are assigned to a relevant class. Ranking the classified objects of the universe within each decision class is a challenging problem. In this paper, we propose a novel Rough Set based membership, called Rank Measure, to solve this problem. It is used to rank the elements within a particular class. It differs from the Pawlak Rough Set based membership function, which gives an equivalent characterization of the Rough Set based approximations. It becomes paramount to look beyond the traditional approach of computing memberships when handling the inconsistent, erroneous and missing data typically present in real-world problems. This led us to propose the aggregate Rank Measure. The contribution of the paper is threefold. Firstly, it proposes a Rough Set based measure for the numerical characterization of within-class ranking of objects. Secondly, it proposes and establishes the properties of Rank Measure and aggregate Rank Measure based membership. Thirdly, we apply the concept of membership and aggregate ranking to the problem of supervised Multi Document Summarization, wherein the important class of sentences is first determined using various supervised learning techniques and then post-processed using the proposed ranking measure. The results show a significant improvement in accuracy.
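For reference, the Pawlak rough membership function that the abstract contrasts with, together with lower and upper approximations, can be sketched in Python as below. The proposed Rank Measure itself is not reproduced. The sentence objects, their attributes, and the "summary-worthy" target set are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group objects that are indiscernible on the chosen attributes."""
    blocks = defaultdict(set)
    for obj, values in objects.items():
        blocks[tuple(values[a] for a in attributes)].add(obj)
    return list(blocks.values())


def rough_membership(obj, target, blocks):
    """Pawlak rough membership: |[obj] ∩ X| / |[obj]|."""
    block = next(b for b in blocks if obj in b)
    return len(block & target) / len(block)


def approximations(target, blocks):
    """Lower and upper approximations of the target set X."""
    lower = set().union(*(b for b in blocks if b <= target))
    upper = set().union(*(b for b in blocks if b & target))
    return lower, upper


# Hypothetical sentence features for a summarization setting.
sentences = {
    "s1": {"length": "long", "keyword": True},
    "s2": {"length": "long", "keyword": True},
    "s3": {"length": "short", "keyword": False},
    "s4": {"length": "long", "keyword": False},
}
summary_worthy = {"s1", "s3"}
blocks = equivalence_classes(sentences, ["length", "keyword"])
print({s: rough_membership(s, summary_worthy, blocks) for s in sentences})
print(approximations(summary_worthy, blocks))
```

Sentences s1 and s2 are indiscernible, so both receive membership 0.5 even though only s1 is in the target set; this tie is the kind of within-class ordering that the plain membership function cannot resolve.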
Submitted 8 February, 2020;
originally announced February 2020.
-
Runtime Mitigation of Packet Drop Attacks in Fault-tolerant Networks-on-Chip
Authors:
N Prasad,
Navonil Chatterjee,
Santanu Chattopadhyay,
Indrajit Chakrabarti
Abstract:
Fault-tolerant routing (FTR) in Networks-on-Chip (NoCs) has become common practice to sustain the performance of multi-core systems with an increasing number of faults on a chip. On the other hand, the use of third-party intellectual property blocks has made security a primary concern in modern-day designs. This article presents a mechanism to mitigate a denial-of-service attack, namely the packet drop attack, which may arise due to hardware Trojans (HTs) in NoCs that adopt FTR algorithms. HTs, associated with external kill switches, are conditionally triggered to enable the attack scenario. Security modules, such as an authentication unit, a buffer shuffler, and a control unit, are proposed to thwart the attack at runtime and restore secure packet flow in the NoC. These units work together as a shield that prevents packets from proceeding towards output ports with faulty links. Synthesis results show that the proposed secure FT router (SeFaR), when compared with a baseline FT router, has area and power overheads of at most 4.04% and 0.90%, respectively. Performance evaluation shows that SeFaR has acceptable overheads in execution time, energy consumption, average packet latency, and power-latency product when compared with a baseline FT router while running real benchmarks as well as synthetic traffic. Further, a possible design of a comprehensive secure router is presented with a view to addressing and mitigating multiple attacks that can arise in NoC routers.
Submitted 1 August, 2019;
originally announced August 2019.
-
Selecting stock pairs for pairs trading while incorporating lead-lag relationship
Authors:
Kartikay Gupta,
Niladri Chatterjee
Abstract:
Pairs trading is carried out in financial markets to earn profits from a known equilibrium relationship between pairs of stocks. In financial markets, stock pairs are seldom seen to be correlated at a particular lead or lag. This lead-lag relationship has been empirically studied in various financial markets. Earlier research works have suggested various measures for identifying the best pairs for pairs trading, but they do not consider this lead-lag effect. The present study proposes a new distance measure that incorporates the lead-lag relationship between stocks while selecting the best pairs for pairs trading. Further, the lead-lag value between the stocks is allowed to vary continuously over time. The importance of the proposed measure has been showcased through experiments on two different datasets, one of Indian companies and another of American companies. When the proposed measure is combined with the sum of squared differences (SSD) measure, i.e., when pairs are identified by optimising both measures, the selected pairs consistently generate the best profit compared with all other measures. Finally, possible generalisations and extensions of the proposed distance measure are discussed.
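The contrast above between the lag-blind SSD measure and a lag-aware distance can be sketched in Python. Here `lagged_msd` and `best_lag` are hypothetical helpers that search a single fixed lag over the whole window; the paper instead lets the lag vary continuously over time. Prices are normalized by their first value, as in SSD-style pair selection, and the two price series are made up.

```python
def normalize(prices):
    """Scale a price series by its first value, as in SSD-style pair selection."""
    return [p / prices[0] for p in prices]


def ssd(x, y):
    """Sum of squared differences between two aligned, normalized series."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def lagged_msd(x, y, lag):
    """Mean squared gap between x[t] and y[t + lag] over their overlap."""
    pairs = [(x[t], y[t + lag]) for t in range(len(x)) if 0 <= t + lag < len(y)]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] that minimizes the lagged mean squared gap."""
    return min(range(-max_lag, max_lag + 1), key=lambda k: lagged_msd(x, y, k))


leader = normalize([10, 11, 12, 11, 13, 14, 13, 15])
follower = normalize([10, 10, 10, 11, 12, 11, 13, 14])  # repeats the leader two steps later
print(ssd(leader, follower), best_lag(leader, follower, max_lag=3))
```

Plain SSD rates this pair as merely moderately close, while the lag-aware gap is exactly zero at a lag of two steps, which is the information a lead-lag-aware selection measure is designed to exploit.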
Submitted 31 December, 2019; v1 submitted 12 June, 2019;
originally announced June 2019.
-
DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis
Authors:
Sangkug Lym,
Donghyuk Lee,
Mike O'Connor,
Niladrish Chatterjee,
Mattan Erez
Abstract:
Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth. In particular, convolution layers account for the majority of the execution time of CNN training, and GPUs are commonly used to accelerate these layer workloads. Optimizing GPU design for efficient CNN training requires accurate modeling of how performance improves when compute and memory resources are increased. We present DeLTA, the first analytical model that accurately estimates the traffic at each GPU memory hierarchy level while accounting for the complex reuse patterns of a parallel convolution algorithm. We demonstrate that our model is both accurate and robust for different CNNs and GPU architectures. We then show how this model can be used to carefully balance the scaling of different GPU resources for efficient CNN performance improvement.
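A useful baseline for the traffic modeling described above is the minimum DRAM traffic of a convolution layer, assuming perfect on-chip reuse so that inputs, filters, and outputs each cross the DRAM interface exactly once. The Python sketch below computes that bound and the resulting arithmetic intensity; it is a deliberate simplification, not DeLTA, which models the actual traffic at each memory level and will generally report more. The layer shape is hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    return (size + 2 * padding - kernel) // stride + 1


def conv_traffic_lower_bound(n, c, h, w, k, r, s, stride=1, padding=0, bytes_per_elem=4):
    """MACs and minimum DRAM bytes for one convolution layer.

    Assumes perfect on-chip reuse: input, filters and output each cross the
    DRAM interface exactly once. Real kernels move more data than this.
    """
    p = conv_output_size(h, r, stride, padding)
    q = conv_output_size(w, s, stride, padding)
    macs = n * k * p * q * c * r * s
    elements = n * c * h * w + k * c * r * s + n * k * p * q  # input + filters + output
    traffic = elements * bytes_per_elem
    intensity = 2 * macs / traffic  # FLOP per byte, counting one MAC as two FLOPs
    return macs, traffic, intensity


macs, traffic, intensity = conv_traffic_lower_bound(n=1, c=1, h=4, w=4, k=1, r=3, s=3)
print(macs, traffic, round(intensity, 3))  # 36 116 0.621
```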
Submitted 2 April, 2019;
originally announced April 2019.
-
Top performing stocks recommendation strategy for portfolio
Authors:
Kartikay Gupta,
Niladri Chatterjee
Abstract:
Stock return forecasting is of utmost importance in the business world, and it has been a favourite research topic among academics for decades. Recently, regularization techniques have been reported to substantially increase the forecast accuracy of the simple regression model. Still, such a model cannot incorporate the effect of events like a major natural disaster or large foreign influence in its predictions. These events affect the whole stock market and are very unpredictable. Thus, it is more important to recommend top stocks than to predict exact stock returns. The present paper modifies the regression task to output a value for each stock that is better suited to ranking the stocks by expected return. Two large datasets, together covering 1205 companies listed on Indian exchanges, were used for experimentation. Five different metrics were used to evaluate the models, and the results were also analysed subjectively through plots. The results showed the superiority of the proposed techniques.
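Because the paper evaluates ranking quality rather than exact return prediction, here is a small sketch of two standard ranking metrics: precision@k (overlap between recommended and actual top-k stocks) and Spearman rank correlation. The abstract does not name the paper's five metrics, so these two are assumptions chosen to illustrate ranking-based evaluation; the tickers and returns in the test are hypothetical.

```python
from typing import Dict, List


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Tickers with the k largest scores, highest first."""
    return sorted(scores, key=lambda t: scores[t], reverse=True)[:k]


def precision_at_k(predicted: Dict[str, float], actual: Dict[str, float],
                   k: int) -> float:
    """Fraction of the k recommended stocks that are among the k actual top performers."""
    return len(set(top_k(predicted, k)) & set(top_k(actual, k))) / k


def spearman(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """Spearman rank correlation between predicted and realised returns.
    Assumes no tied values (ties would need average ranks)."""
    tickers = list(predicted)

    def ranks(d: Dict[str, float]) -> Dict[str, int]:
        ordered = sorted(tickers, key=lambda t: d[t])
        return {t: i for i, t in enumerate(ordered)}

    rp, ra = ranks(predicted), ranks(actual)
    n = len(tickers)
    d2 = sum((rp[t] - ra[t]) ** 2 for t in tickers)
    return 1 - 6 * d2 / (n * (n * n - 1))
```

A model can score well on these metrics even when its predicted magnitudes are off, as long as it orders the stocks correctly, which matches the abstract's argument for ranking over exact forecasting.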
Submitted 10 August, 2019; v1 submitted 30 January, 2019;
originally announced January 2019.
-
What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study
Authors:
Saugata Ghose,
Abdullah Giray Yağlıkçı,
Raghav Gupta,
Donghyuk Lee,
Kais Kudrolli,
William X. Liu,
Hasan Hassan,
Kevin K. Chang,
Niladrish Chatterjee,
Aditya Agrawal,
Mike O'Connor,
Onur Mutlu
Abstract:
Main memory (DRAM) consumes as much as half of the total system power in a computer today, resulting in a growing need to develop new DRAM architectures and systems that consume less power. Researchers have long relied on DRAM power models that are based on standardized current measurements provided by vendors, called IDD values. Unfortunately, we find that these models are highly inaccurate, and do not reflect the actual power consumed by real DRAM devices.
We perform the first comprehensive experimental characterization of the power consumed by modern real-world DRAM modules. Our extensive characterization of 50 DDR3L DRAM modules from three major vendors yields four key new observations about DRAM power consumption: (1) across all IDD values that we measure, the current consumed by real DRAM modules varies significantly from the current specified by the vendors; (2) DRAM power consumption strongly depends on the data value that is read or written; (3) there is significant structural variation, where the same banks and rows across multiple DRAM modules from the same model consume more power than other banks or rows; and (4) over successive process technology generations, DRAM power consumption has not decreased by as much as vendor specifications have indicated.
Based on our detailed analysis and characterization data, we develop the Variation-Aware model of Memory Power Informed by Real Experiments (VAMPIRE). We show that VAMPIRE has a mean absolute percentage error of only 6.8% compared to actual measured DRAM power. VAMPIRE enables a wide range of studies that were not possible using prior DRAM power models. As an example, we use VAMPIRE to evaluate a new power-aware data encoding mechanism, which can reduce DRAM energy consumption by an average of 12.2%. We plan to open-source both VAMPIRE and our extensive raw data collected during our experimental characterization.
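The accuracy figure quoted for VAMPIRE is a mean absolute percentage error (MAPE), $\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{y_i} \right|$, where $y_i$ is measured power and $\hat{y}_i$ is modeled power. A minimal sketch of that computation follows; the power values in the test are hypothetical, not data from the study.

```python
from typing import Sequence


def mape(measured: Sequence[float], modeled: Sequence[float]) -> float:
    """Mean absolute percentage error of `modeled` relative to `measured`, in percent."""
    if len(measured) != len(modeled) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("MAPE is undefined when a measured value is zero")
    total = sum(abs((p - m) / m) for m, p in zip(measured, modeled))
    return 100.0 * total / len(measured)
```

For instance, modeled values of 110 and 190 against measurements of 100 and 200 give per-sample errors of 10% and 5%, hence a MAPE of 7.5%. Because each error is relative to its own measurement, MAPE weights a 1 mW miss on a low-power state more heavily than the same miss on a high-power state.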
Submitted 13 July, 2018;
originally announced July 2018.
-
Voltron: Understanding and Exploiting the Voltage-Latency-Reliability Trade-Offs in Modern DRAM Chips to Improve Energy Efficiency
Authors:
Kevin K. Chang,
Abdullah Giray Yağlıkçı,
Saugata Ghose,
Aditya Agrawal,
Niladrish Chatterjee,
Abhijith Kashyap,
Donghyuk Lee,
Mike O'Connor,
Hasan Hassan,
Onur Mutlu
Abstract:
This paper summarizes our work on experimental characterization and analysis of reduced-voltage operation in modern DRAM chips, which was published in SIGMETRICS 2017, and examines the work's significance and future potential.
We take a comprehensive approach to understanding and exploiting the latency and reliability characteristics of modern DRAM when the DRAM supply voltage is lowered below the nominal voltage level specified by DRAM standards. We perform an experimental study of 124 real DDR3L (low-voltage) DRAM chips manufactured recently by three major DRAM vendors. We find that reducing the supply voltage below a certain point introduces bit errors in the data, and we comprehensively characterize the behavior of these errors. We discover that these errors can be avoided by increasing the latency of three major DRAM operations (activation, restoration, and precharge). We perform detailed DRAM circuit simulations to validate and explain our experimental findings. We also characterize the various relationships between reduced supply voltage and error locations, stored data patterns, DRAM temperature, and data retention.
Based on our observations, we propose a new DRAM energy reduction mechanism, called Voltron. The key idea of Voltron is to use a performance model to determine by how much we can reduce the supply voltage without introducing errors and without exceeding a user-specified threshold for performance loss. Our evaluations show that Voltron reduces the average DRAM and system energy consumption by 10.5% and 7.3%, respectively, while limiting the average system performance loss to only 1.8%, for a variety of memory-intensive quad-core workloads. We also show that Voltron significantly outperforms prior dynamic voltage and frequency scaling mechanisms for DRAM.
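The key idea of Voltron described above can be sketched as a constrained selection: among candidate supply voltages, keep only those that are error-free (given increased latencies) and whose predicted performance loss stays within the user's threshold, then pick the lowest. This is a simplified illustration, not Voltron's implementation; the `VoltageLevel` record, the voltage steps, and the predicted-loss numbers are hypothetical stand-ins for the output of the performance model the abstract mentions (1.35 V is the nominal DDR3L supply).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VoltageLevel:
    volts: float                 # candidate DRAM supply voltage
    error_free: bool             # True if raised latencies avoid bit errors here
    predicted_perf_loss: float   # fractional slowdown predicted by a performance model


def choose_voltage(levels: Sequence[VoltageLevel],
                   max_perf_loss: float) -> Optional[VoltageLevel]:
    """Return the lowest voltage that is error-free and within the
    performance-loss budget, or None if no level qualifies."""
    candidates = [lv for lv in levels
                  if lv.error_free and lv.predicted_perf_loss <= max_perf_loss]
    if not candidates:
        return None
    return min(candidates, key=lambda lv: lv.volts)
```

With a 5% budget, a level whose predicted loss is 4% is acceptable, but a lower level that cannot be made error-free is skipped regardless of its performance. Tightening the budget pushes the choice back toward the nominal voltage, trading energy savings for performance.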
Submitted 8 May, 2018;
originally announced May 2018.
-
Understanding Reduced-Voltage Operation in Modern DRAM Chips: Characterization, Analysis, and Mechanisms
Authors:
Kevin K. Chang,
Abdullah Giray Yağlıkçı,
Saugata Ghose,
Aditya Agrawal,
Niladrish Chatterjee,
Abhijith Kashyap,
Donghyuk Lee,
Mike O'Connor,
Hasan Hassan,
Onur Mutlu
Abstract:
The energy consumption of DRAM is a critical concern in modern computing systems. Improvements in manufacturing process technology have allowed DRAM vendors to lower the DRAM supply voltage conservatively, which reduces some of the DRAM energy consumption. We would like to reduce the DRAM supply voltage more aggressively, to further reduce energy. Aggressive supply voltage reduction requires a thorough understanding of the effect voltage scaling has on DRAM access latency and DRAM reliability.
In this paper, we take a comprehensive approach to understanding and exploiting the latency and reliability characteristics of modern DRAM when the supply voltage is lowered below the nominal voltage level specified by DRAM standards. Using an FPGA-based testing platform, we perform an experimental study of 124 real DDR3L (low-voltage) DRAM chips manufactured recently by three major DRAM vendors. We find that reducing the supply voltage below a certain point introduces bit errors in the data, and we comprehensively characterize the behavior of these errors. We discover that these errors can be avoided by increasing the latency of three major DRAM operations (activation, restoration, and precharge). We perform detailed DRAM circuit simulations to validate and explain our experimental findings. We also characterize the various relationships between reduced supply voltage and error locations, stored data patterns, DRAM temperature, and data retention.
Based on our observations, we propose a new DRAM energy reduction mechanism, called Voltron. The key idea of Voltron is to use a performance model to determine by how much we can reduce the supply voltage without introducing errors and without exceeding a user-specified threshold for performance loss. Voltron reduces the average system energy by 7.3% while limiting the average system performance loss to only 1.8%, for a variety of workloads.
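Avoiding errors at reduced voltage by lengthening activation, restoration, and precharge requires translating longer analog latencies into whole memory-controller clock cycles. The sketch below assumes these operations correspond to the tRCD, tRAS, and tRP timing parameters, applies a single hypothetical latency multiplier for a lower voltage, and rounds up so that no timing is ever shortened. The nominal nanosecond values and the 1.25 ns clock period are illustrative, not measurements from the paper.

```python
import math
from typing import Dict


def ns_to_cycles(t_ns: float, tck_ns: float) -> int:
    """Smallest whole number of clock cycles covering t_ns.
    The tiny epsilon absorbs floating-point noise on exact multiples."""
    return math.ceil(t_ns / tck_ns - 1e-9)


def scaled_timings(nominal_ns: Dict[str, float], scale: float,
                   tck_ns: float) -> Dict[str, int]:
    """Scale each timing parameter by a hypothetical latency multiplier
    for the reduced voltage and convert it to controller cycles."""
    return {name: ns_to_cycles(t * scale, tck_ns) for name, t in nominal_ns.items()}


# Illustrative nominal timings (ns) and clock period (ns); not vendor data.
NOMINAL = {"tRCD": 13.75, "tRAS": 35.0, "tRP": 13.75}
TCK_NS = 1.25
```

Rounding up matters: a 20% longer tRCD of 16.5 ns needs 14 cycles rather than 13.2, so the effective latency penalty in cycles can exceed the analog slowdown, which is one reason a performance model is needed to decide how far to lower the voltage.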
Submitted 29 May, 2017;
originally announced May 2017.
-
Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks
Authors:
Minsoo Rhu,
Mike O'Connor,
Niladrish Chatterjee,
Jeff Pool,
Stephen W. Keckler
Abstract:
Popular deep learning frameworks require users to fine-tune their memory usage so that the training data of a deep neural network (DNN) fits within the GPU physical memory. Prior work tries to address this restriction by virtualizing the memory usage of DNNs, enabling both CPU and GPU memory to be utilized for memory allocations. Despite its merits, virtualizing memory can incur significant performance overheads when the time needed to copy data to and from CPU memory is higher than the latency to perform the computations required for DNN forward and backward propagation. We introduce a high-performance virtualization strategy based on a "compressing DMA engine" (cDMA) that drastically reduces the size of the data structures that are targeted for CPU-side allocations. The cDMA engine offers an average 2.6x (maximum 13.8x) compression ratio by exploiting the sparsity inherent in offloaded data, improving the performance of virtualized DNNs by an average 32% (maximum 61%).
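To show how activation sparsity translates into a compression ratio, here is a minimal sketch of zero-value compression: a bitmask records which elements are nonzero, and only the nonzero float32 values are stored. The abstract does not specify cDMA's encoding or hardware layout, so treat this scheme, the function names, and the little-endian float32 packing as illustrative assumptions.

```python
import struct
from typing import List, Sequence, Tuple


def zvc_compress(values: Sequence[float]) -> Tuple[bytes, bytes]:
    """Return (bitmask, payload): one mask bit per element, set for nonzeros,
    and the nonzero values packed as little-endian float32."""
    mask = bytearray((len(values) + 7) // 8)
    nonzeros = []
    for i, v in enumerate(values):
        if v != 0.0:
            mask[i // 8] |= 1 << (i % 8)
            nonzeros.append(v)
    payload = struct.pack(f"<{len(nonzeros)}f", *nonzeros)
    return bytes(mask), payload


def zvc_decompress(mask: bytes, payload: bytes, n: int) -> List[float]:
    """Rebuild n values, inserting 0.0 wherever the mask bit is clear."""
    nonzeros = iter(struct.unpack(f"<{len(payload) // 4}f", payload))
    return [next(nonzeros) if (mask[i // 8] >> (i % 8)) & 1 else 0.0
            for i in range(n)]


def compression_ratio(values: Sequence[float]) -> float:
    """Uncompressed float32 size divided by compressed size."""
    mask, payload = zvc_compress(values)
    return (4 * len(values)) / (len(mask) + len(payload))
```

ReLU outputs are often mostly zero, so an activation map with 2 nonzeros out of 8 shrinks from 32 bytes to 9 (a ratio of about 3.6x). Dense data gains nothing and pays the 1-bit-per-element mask overhead, which is consistent with compression ratios varying widely across layers.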
Submitted 3 May, 2017;
originally announced May 2017.