-
SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile
Authors:
Wei Niu,
Md Musfiqur Rahman Sanim,
Zhihao Shu,
Jiexiong Guan,
Xipeng Shen,
Miao Yin,
Gagan Agrawal,
Bin Ren
Abstract:
This work is motivated by recent developments in Deep Neural Networks, particularly the Transformer architectures underlying applications such as ChatGPT, and the need for performing inference on mobile devices. Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, we observe that layout transformations between the computational operators cause a significant slowdown in these applications. This paper presents SmartMem, a comprehensive framework for eliminating most layout transformations, with the idea that multiple operators can use the same tensor layout through careful choice of layout and implementation of operations. Our approach is based on classifying the operators into four groups and considering combinations of producer-consumer edges between the operators. We develop a set of methods for searching such layouts. Another component of our work is developing efficient memory layouts for the 2.5-dimensional memory commonly seen in mobile devices. Our experimental results show that SmartMem outperforms 5 state-of-the-art DNN execution frameworks on mobile devices across 18 varied neural networks, including CNNs, Transformers with both local and global attention, as well as LLMs. In particular, compared to DNNFusion, SmartMem achieves an average speedup of 2.8$\times$, and it outperforms TVM and MNN with average speedups of 6.9$\times$ and 7.9$\times$, respectively.
Submitted 21 April, 2024;
originally announced April 2024.
-
A Survey of AI-generated Text Forensic Systems: Detection, Attribution, and Characterization
Authors:
Tharindu Kumarage,
Garima Agrawal,
Paras Sheth,
Raha Moraffah,
Aman Chadha,
Joshua Garland,
Huan Liu
Abstract:
We have lately witnessed a rapid proliferation of advanced Large Language Models (LLMs) capable of generating high-quality text. While these LLMs have revolutionized text generation across various domains, they also pose significant risks to the information ecosystem, such as the potential for generating convincing propaganda, misinformation, and disinformation at scale. This paper offers a review of AI-generated text forensic systems, an emerging field addressing the challenges of LLM misuse. We present an overview of the existing efforts in AI-generated text forensics by introducing a detailed taxonomy, focusing on three primary pillars: detection, attribution, and characterization. These pillars enable a practical understanding of AI-generated text, from identifying AI-generated content (detection), to determining the specific AI model involved (attribution), to grouping the underlying intents of the text (characterization). Furthermore, we explore available resources for AI-generated text forensics research and discuss the evolving challenges and future directions of forensic systems in an AI era.
Submitted 2 March, 2024;
originally announced March 2024.
-
SoD$^2$: Statically Optimizing Dynamic Deep Neural Network
Authors:
Wei Niu,
Gagan Agrawal,
Bin Ren
Abstract:
Though many compilation and runtime systems have been developed for DNNs in recent years, the focus has largely been on static DNNs. Dynamic DNNs, where tensor shapes and sizes and even the set of operators used depend upon the input and/or execution, are becoming common. This paper presents SoD$^2$, a comprehensive framework for optimizing Dynamic DNNs. The basis of our approach is a classification of common operators that form DNNs, and the use of this classification towards a Rank and Dimension Propagation (RDP) method. This framework statically determines the shapes of operators as known constants, symbolic constants, or operations on these. Next, using RDP we enable a series of optimizations, like fused code generation, execution (order) planning, and even runtime memory allocation plan generation. By evaluating the framework on 10 emerging Dynamic DNNs and comparing it against several existing systems, we demonstrate reductions in both execution latency and memory requirements, with RDP-enabled key optimizations responsible for much of the gains. Our evaluation results show that SoD$^2$ runs up to $3.9\times$ faster than these systems while reducing peak memory consumption by up to $88\%$.
Submitted 29 February, 2024;
originally announced March 2024.
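A minimal sketch of statically propagating shapes whose dimensions are known constants, symbolic constants, or operations on these. It assumes a toy operator set (element-wise with broadcasting, 2-D matmul, concat) and a hypothetical `Dim` encoding (an int for a known constant, a string for a symbol); it illustrates shape propagation in general and is not SoD$^2$'s RDP implementation or its operator classification.

```python
from typing import Tuple, Union

# Hypothetical encoding: a dimension is a known constant (int) or a symbolic constant (str).
Dim = Union[int, str]
Shape = Tuple[Dim, ...]


def broadcast_dim(a: Dim, b: Dim) -> Dim:
    """Combine two dimensions under NumPy-style broadcasting rules."""
    if a == b:
        return a
    if a == 1:
        return b
    if b == 1:
        return a
    # Two different symbols (or a symbol vs. a constant other than 1) cannot be resolved statically here.
    raise ValueError(f"cannot statically broadcast {a!r} with {b!r}")


def elementwise(x: Shape, y: Shape) -> Shape:
    """Output shape of an element-wise binary op; ranks are right-aligned and padded with 1."""
    rank = max(len(x), len(y))
    xp = (1,) * (rank - len(x)) + x
    yp = (1,) * (rank - len(y)) + y
    return tuple(broadcast_dim(a, b) for a, b in zip(xp, yp))


def matmul(x: Shape, y: Shape) -> Shape:
    """Output shape of a 2-D matrix multiply: (M, K) @ (K, N) -> (M, N)."""
    (m, k1), (k2, n) = x, y
    if k1 != k2:
        raise ValueError(f"inner dimensions differ: {k1!r} vs {k2!r}")
    return (m, n)


def concat(x: Shape, y: Shape, axis: int) -> Shape:
    """Output shape of concatenation; a symbolic concat axis becomes an expression label."""
    out = list(x)
    a, b = x[axis], y[axis]
    out[axis] = a + b if isinstance(a, int) and isinstance(b, int) else f"({a}+{b})"
    return tuple(out)


# Usage: a sequence length "N" that is unknown until runtime still propagates statically.
h = matmul(("N", 768), (768, 3072))    # ("N", 3072)
h = elementwise(h, (3072,))            # bias add keeps ("N", 3072)
cat = concat(h, ("T", 3072), axis=0)   # ("(N+T)", 3072)
```

Symbols are kept as plain strings here, so an expression such as "(N+T)" is only a label; a real system would need a proper symbolic-expression representation to simplify and compare such dimensions.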
-
Can Knowledge Graphs Reduce Hallucinations in LLMs? : A Survey
Authors:
Garima Agrawal,
Tharindu Kumarage,
Zeyad Alghamdi,
Huan Liu
Abstract:
Contemporary LLMs are prone to producing hallucinations, stemming mainly from knowledge gaps within the models. To address this critical limitation, researchers employ diverse strategies to augment LLMs by incorporating external knowledge, aiming to reduce hallucinations and enhance reasoning accuracy. Among these strategies, leveraging knowledge graphs as a source of external information has demonstrated promising results. In this survey, we comprehensively review these knowledge-graph-based augmentation techniques in LLMs, focusing on their efficacy in mitigating hallucinations. We systematically categorize these methods into three overarching groups, offering methodological comparisons and performance evaluations. Lastly, this survey explores the current trends and challenges associated with these techniques and outlines potential avenues for future research in this emerging field.
Submitted 15 March, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Beam propagation in an active nonlinear graded-index fiber
Authors:
Anuj P. Lara,
Samudra Roy,
Govind P. Agrawal
Abstract:
A theoretical model is developed by exploiting the variational technique to investigate the evolution of an optical beam inside an optically pumped graded-index fiber amplifier. The variational analysis is a semi-analytical method that provides us with a set of coupled ordinary differential equations for the beam's four parameters. Numerically solving these equations is much faster than solving the underlying multidimensional nonlinear wave equation. We compare the results of the variational and full numerical simulations for the two pumping schemes commonly used for high-power fiber amplifiers. In the clad-pumping scheme, the use of a relatively wide pump beam results in a nearly uniform gain all along the fiber. In the case of edge pumping, a narrower pump beam provides gain that varies both radially and axially along the fiber's length. In both cases, the variational results are found to be in good agreement with time-consuming full numerical simulations. We also derive a single equation for the beam's width that can predict amplification-induced narrowing of the signal beam in most cases of practical interest.
Submitted 21 October, 2023;
originally announced October 2023.
-
Trade-off between Noise and Banding in a Quantum Adder with Qudits
Authors:
Gaurang Agrawal,
Tanoy Kanti Konar,
Leela Ganesh Chandra Lakkaraju,
Aditi Sen De
Abstract:
Quantum addition based on the quantum Fourier transform can be an integral part of a quantum circuit and has proved to be more efficient than the existing classical ripple-carry adder. Our study identifies the quantum resource required in a quantum adder in an arbitrary dimension and its relationship with the performance indicator in the presence of local noise acting on the circuit, and when only a limited number of controlled rotation operations is permitted, a procedure known as banding. We analytically prove an upper bound on the number of controlled rotation gates required to accomplish quantum addition up to an arbitrary defect in the fidelity between the desired and imperfect output. When the environment interacts with individual qudits, we establish a connection between quantum coherence and the fidelity of the output. Interestingly, we demonstrate that when banding is employed in the presence of noise, approximate circuits of constant depth outperform circuits with a higher number of controlled rotations, establishing a complementary relationship between the approximate quantum adder and the strength of the noise. We show that using magnetic fields to prepare an initial state that evolves according to a one-dimensional spin chain for a specific amount of time can be a potential technique to implement quantum addition circuits in many-body systems.
Submitted 17 October, 2023;
originally announced October 2023.
-
ForensiBlock: A Provenance-Driven Blockchain Framework for Data Forensics and Auditability
Authors:
Asma Jodeiri Akbarfam,
Mahdieh Heidaripour,
Hoda Maleki,
Gokila Dorai,
Gagan Agrawal
Abstract:
Maintaining accurate provenance records is paramount in digital forensics, as they underpin evidence credibility and integrity, addressing essential aspects like accountability and reproducibility. Blockchains have several properties that can address these requirements. Previous systems utilized public blockchains, treating the blockchain as a black box and benefiting from its immutability property. However, such a blockchain was accessible to everyone, giving rise to security concerns; moreover, efficient extraction of provenance faces challenges due to the enormous scale and complexity of digital data. This necessitates a tailored blockchain design for digital forensics. Our solution, ForensiBlock, has a novel design that automates investigation steps, ensures secure data access, traces data origins, preserves records, and expedites provenance extraction. ForensiBlock incorporates Role-Based Access Control with Staged Authorization (RBAC-SA) and a distributed Merkle root for case tracking. These features support authorized resource access with efficient retrieval of provenance records. In particular, comparing two methods for extracting provenance records, off-chain storage retrieval with Merkle root verification and a brute-force search, we find that the off-chain method is significantly better, especially as the blockchain size and number of cases increase. We also found that our distributed Merkle root creation slightly increases smart contract processing time but significantly improves history access. Overall, we show that ForensiBlock offers secure, efficient, and reliable handling of digital forensic data.
Submitted 7 August, 2023;
originally announced August 2023.
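Merkle root verification of the kind compared above can be illustrated with a generic binary Merkle tree and inclusion proof. This is a minimal sketch, assuming SHA-256 hashing and duplication of the last node on odd-sized levels; it does not reproduce ForensiBlock's distributed Merkle root construction, its RBAC-SA access control, or its off-chain storage, and the record strings in the usage line are hypothetical.

```python
import hashlib
from typing import List, Tuple


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All levels of a binary Merkle tree, from hashed leaves (index 0) up to the root."""
    level = [sha256(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # odd level: pair the last node with itself
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def merkle_proof(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf `index` up to the root; the flag is True if the sibling is on the right."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # node was duplicated and paired with itself
        proof.append((level[sibling], sibling >= index))
        index //= 2
    return proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the root from a leaf and its proof, touching only O(log n) hashes."""
    node = sha256(leaf)
    for sibling, on_right in proof:
        node = sha256(node + sibling) if on_right else sha256(sibling + node)
    return node == root


records = [f"case-1/evidence-{i}".encode() for i in range(5)]  # hypothetical records
levels = merkle_levels(records)
root = levels[-1][0]
assert verify(records[4], merkle_proof(levels, 4), root)
```

Only the root needs to be anchored; a record fetched from off-chain storage is checked against it with a logarithmic-size proof instead of a scan over all stored records.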
-
Spatial beam dynamics in graded-index multimode fibers under Raman amplification: a variational approach
Authors:
Ashis Paul,
Anuj P. Lara,
Samudra Roy,
Govind P. Agrawal
Abstract:
We investigate the spatial beam dynamics inside a multimode graded-index fiber under Raman amplification by adopting a semi-analytical variational approach. The variational analysis provides us with four coupled ordinary differential equations that govern the beam's dynamics under Raman gain and are much faster to solve numerically than the full nonlinear wave equation. Their solution also provides considerable physical insight and allows us to study the impact of important nonlinear phenomena such as self-focusing and cross-phase modulation. We first show that the variational results agree well with full numerical simulations and then use them to investigate the signal's dynamics under different initial conditions, such as the initial widths of the pump and signal beams. This allows us to quantify the conditions under which the quality of a signal beam can improve without collapse of the beam owing to self-focusing. While time-consuming full simulations may be needed when gain saturation and pump depletion must be included, the variational method is useful for gaining valuable physical insight and for studying, more quickly, the dependence of the amplified beam's width and amplitude on various physical parameters.
Submitted 28 June, 2023; v1 submitted 24 June, 2023;
originally announced June 2023.
-
A semi-Markovian approach to model the tick-by-tick dynamics of stock price
Authors:
Garima Agrawal,
Anindya Goswami
Abstract:
We model the stock price dynamics through a semi-Markov process obtained using a Poisson random measure. We establish the existence and uniqueness of the classical solution of a non-homogeneous terminal value problem, and we show that the expected value of the stock price at the horizon can be obtained as a classical solution of a linear partial differential equation that is a special case of the terminal value problem studied in this paper. We further analyze the market making problem from the point of view of an agent who posts limit orders at the best price available. We use the dynamic programming principle to obtain an HJB equation. In the case of no risk aversion, we obtain the value function as a classical solution of a linear PDE and derive expressions for the optimal controls by solving the HJB equation.
Submitted 10 September, 2022;
originally announced September 2022.
-
Scalable Deep Graph Clustering with Random-walk based Self-supervised Learning
Authors:
Xiang Li,
Dong Li,
Ruoming Jin,
Gagan Agrawal,
Rajiv Ramnath
Abstract:
Web-based interactions can frequently be represented by an attributed graph, and node clustering in such graphs has received much attention lately. Multiple efforts have successfully applied Graph Convolutional Networks (GCN), though with some limits on accuracy, as GCNs have been shown to suffer from over-smoothing issues. Though other methods (particularly those based on Laplacian Smoothing) have reported better accuracy, a fundamental limitation of all this work is a lack of scalability. This paper addresses this open problem by relating Laplacian smoothing to Generalized PageRank and applying a random-walk based algorithm as a scalable graph filter. This forms the basis for our scalable deep clustering algorithm, RwSL, where, through a self-supervised mini-batch training mechanism, we simultaneously optimize a deep neural network for sample-cluster assignment distribution and an autoencoder for a clustering-oriented embedding. Using 6 real-world datasets and 6 clustering metrics, we show that RwSL achieves improved results over several recent baselines. Most notably, we show that RwSL, unlike all other deep clustering frameworks, can continue to scale beyond graphs with more than one million nodes, i.e., handle web scale. We also demonstrate how RwSL can perform node clustering on a graph with 1.8 billion edges using only a single GPU.
Submitted 17 January, 2023; v1 submitted 31 December, 2021;
originally announced December 2021.
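As a rough illustration of a PageRank-style graph filter for Laplacian smoothing, the sketch below propagates node features with personalized PageRank (a special case of Generalized PageRank) by power iteration: Z <- (1 - alpha) P Z + alpha X, where P is the row-normalized adjacency with self-loops added. The teleport probability `alpha`, the step count, and the self-loops are assumptions; RwSL's random-walk-based approximation, mini-batch training, and autoencoder are not shown.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]          # undirected adjacency lists
Features = Dict[int, List[float]]


def ppr_smooth(graph: Graph, features: Features,
               alpha: float = 0.15, steps: int = 10) -> Features:
    """Personalized-PageRank feature smoothing by power iteration.

    Each step replaces a node's features with the mean over its neighbors and itself
    (a random-walk step), then mixes the original features back in with weight alpha.
    """
    z = {v: list(x) for v, x in features.items()}
    for _ in range(steps):
        new_z: Features = {}
        for v, x0 in features.items():
            nbrs = graph[v] + [v]  # neighbors plus a self-loop
            agg = [sum(z[u][j] for u in nbrs) / len(nbrs) for j in range(len(x0))]
            new_z[v] = [(1 - alpha) * a + alpha * b for a, b in zip(agg, x0)]
        z = new_z
    return z


# Usage: a path 0-1-2 and a separate edge 3-4; smoothing pulls node 1 toward its neighbors.
g = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
x = {0: [1.0], 1: [0.0], 2: [1.0], 3: [0.0], 4: [0.0]}
z = ppr_smooth(g, x)
```

This dense iteration touches every edge at each step, which is the cost a random-walk-based filter aims to avoid on graphs with billions of edges.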
-
(M)SLAe-Net: Multi-Scale Multi-Level Attention embedded Network for Retinal Vessel Segmentation
Authors:
Shreshth Saini,
Geetika Agrawal
Abstract:
Segmentation plays a crucial role in diagnosis. Studying the retinal vasculature from fundus images helps identify early signs of many serious illnesses such as diabetic retinopathy. Due to the varying shape, size, and patterns of retinal vessels, along with artefacts and noise in fundus images, no one-stage method can accurately segment retinal vessels. In this work, we propose a multi-scale, multi-level attention embedded CNN architecture ((M)SLAe-Net) to address the issue of multi-stage processing for robust and precise segmentation of retinal vessels. We do this by extracting features at multiple scales and multiple levels of the network, enabling our model to holistically extract local and global features. Multi-scale features are extracted using our novel dynamic dilated pyramid pooling (D-DPP) module. We also aggregate the features from all the network levels. These effectively resolve the issues of varying shapes and artefacts, and hence the need for multiple stages. To assist in better pixel-level classification, we use the Squeeze and Attention (SA) module, a smartly adapted version of the Squeeze and Excitation (SE) module for segmentation tasks, in our network to facilitate pixel-group attention. Our unique network design and novel D-DPP module, together with an efficient task-specific loss function for thin vessels, enabled better cross-dataset performance. Exhaustive experimental results on DRIVE, STARE, HRF, and CHASE-DB1 show the superiority of our method.
Submitted 5 September, 2021;
originally announced September 2021.
-
DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion
Authors:
Wei Niu,
Jiexiong Guan,
Yanzhi Wang,
Gagan Agrawal,
Bin Ren
Abstract:
Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep, with hundreds or even thousands of operator layers, leading to high memory and computational requirements for inference. Operator fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art DNN execution frameworks, such as TensorFlow, TVM, and MNN. However, these frameworks usually adopt fusion approaches based on certain patterns that are too restrictive to cover the diversity of operators and layer connections. Polyhedral-based loop fusion techniques, on the other hand, work on a low-level view of the computation without operator-level information, and can also miss potential fusion opportunities. To address this challenge, this paper proposes a novel and extensive loop fusion framework called DNNFusion. The basic idea of this work is to work at an operator view of DNNs, but expand fusion opportunities by developing a classification of both individual operators and their combinations. In addition, DNNFusion includes 1) a novel mathematical-property-based graph rewriting framework to reduce evaluation costs and facilitate subsequent operator fusion, 2) an integrated fusion plan generation that leverages high-level analysis and accurate lightweight profiling, and 3) additional optimizations during fusion code generation. DNNFusion is extensively evaluated on 15 DNN models with varied types of tasks, model sizes, and layer counts. The evaluation results demonstrate that DNNFusion finds up to 8.8x more fusion opportunities and outperforms four state-of-the-art DNN execution frameworks with 9.3x speedup. The memory requirement reduction and speedups can enable the execution of many of the target models on mobile devices and even make them part of a real-time application.
Submitted 30 November, 2021; v1 submitted 30 August, 2021;
originally announced August 2021.
-
FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Authors:
Sheng-Chun Kao,
Suvinay Subramanian,
Gaurav Agrawal,
Amir Yazdanbakhsh,
Tushar Krishna
Abstract:
Attention mechanisms, primarily designed to capture pairwise correlations between words, have become the backbone of machine learning, expanding beyond natural language processing into other domains. This growth in adoption comes at the cost of prohibitively large memory requirements and computational complexity, especially for larger numbers of input elements. This limitation is due to inherently limited data reuse opportunities and quadratic growth in memory footprints, leading to severe memory-boundedness and limited scalability in the number of input elements. This work addresses these challenges by devising a tailored dataflow optimization, called FLAT, for attention mechanisms without altering their functionality. This dataflow processes costly attention operations through a unique fusion mechanism, reducing the quadratic growth of the memory footprint to merely linear growth. To realize the full potential of this bespoke mechanism, we propose a tiling approach to enhance data reuse across attention operations. Our method both mitigates the off-chip bandwidth bottleneck and reduces the on-chip memory requirement. FLAT delivers 1.94x (1.76x) speedup and 49% (42%) energy savings compared to state-of-the-art Edge (Cloud) accelerators with no customized dataflow optimization. When on-chip resources are scarce (20 KB-200 KB), FLAT yields, on average, 1.5x end-to-end latency reduction across a diverse range of conventional attention-based models with input sequence lengths ranging from 512 to 64K tokens. Our evaluations demonstrate that state-of-the-art DNN dataflows applied to attention operations reach their efficiency limit for inputs above 512 elements. In contrast, FLAT unblocks transformer models for inputs with up to 64K elements.
Submitted 23 September, 2022; v1 submitted 13 July, 2021;
originally announced July 2021.
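The memory argument, that fusing the attention operations avoids materializing the full N x N score matrix, can be sketched in software by tiling over query rows: each tile computes its scores, softmax, and weighted sum of V before moving on, so only a tile x N block of scores is live at once. This is a minimal pure-Python sketch assuming standard scaled dot-product attention; it does not model FLAT's dataflow, on-chip buffering, or accelerator mapping, and `tile` is a hypothetical parameter.

```python
import math
from typing import List

Matrix = List[List[float]]


def attention_tile(q_tile: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Fused softmax(q_tile K^T / sqrt(d)) V for one tile of query rows.

    Only a len(q_tile) x N block of scores exists, never the full N x N matrix.
    """
    scale = 1.0 / math.sqrt(len(k[0]))
    scores = [[scale * sum(a * b for a, b in zip(q, key)) for key in k] for q in q_tile]
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        w = [math.exp(s - m) for s in row]
        total = sum(w)
        out.append([sum(wi * vi[j] for wi, vi in zip(w, v)) / total
                    for j in range(len(v[0]))])
    return out


def tiled_attention(q: Matrix, k: Matrix, v: Matrix, tile: int) -> Matrix:
    """Attention over all query rows, processed `tile` rows at a time."""
    out: Matrix = []
    for start in range(0, len(q), tile):
        out.extend(attention_tile(q[start:start + tile], k, v))
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
full = tiled_attention(x, x, x, tile=len(x))   # one tile: the untiled baseline
tiled = tiled_attention(x, x, x, tile=2)       # same result, half the live scores
```

Tiling changes only the order of work and the size of the live intermediate, not the result, which is why a fused, tiled dataflow can preserve the attention computation exactly.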
-
Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Authors:
Peng Jiang,
Gagan Agrawal
Abstract:
Stochastic Gradient Descent (SGD) is the key learning algorithm for many machine learning tasks. Because of its computational costs, there is a growing interest in accelerating SGD on HPC resources like GPU clusters. However, the performance of parallel SGD is still bottlenecked by high communication costs, even with a fast connection among the machines. A simple approach to alleviating this problem, used in many existing efforts, is to perform communication every few iterations, using a constant averaging period. In this paper, we show that the optimal averaging period in terms of convergence and communication cost is not a constant, but instead varies over the course of the execution. Specifically, we observe that reducing the variance of model parameters among the computing nodes is critical to the convergence of periodic parameter averaging SGD. Given a fixed communication budget, we show that it is more beneficial to synchronize more frequently in early iterations to reduce the initial large variance, and to synchronize less frequently in the later phase of the training process. We propose a practical algorithm, named ADaptive Periodic parameter averaging SGD (ADPSGD), to achieve a smaller overall variance of model parameters, and thus better convergence compared with Constant Periodic parameter averaging SGD (CPSGD). We evaluate our method with several image classification benchmarks and show that ADPSGD indeed achieves smaller training losses and higher test accuracies with less communication than CPSGD. Compared with gradient-quantization SGD, we show that our algorithm achieves faster convergence with only half of the communication. Compared with full-communication SGD, ADPSGD achieves 1.14x to 1.27x speedups with a 100Gbps connection among computing nodes, and the speedups increase to 1.46x to 1.95x with a 10Gbps connection.
Submitted 19 January, 2021; v1 submitted 12 July, 2020;
originally announced July 2020.
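The abstract does not give ADPSGD's rule for choosing the averaging period, so the sketch below only contrasts a constant schedule (CPSGD-style) with a hypothetical front-loaded schedule that spends the same communication budget, with the gaps between synchronizations growing linearly so that averaging is frequent early and sparse late. Both function names and the linear-growth rule are assumptions for illustration, not the paper's algorithm.

```python
from typing import List


def constant_schedule(total_iters: int, budget: int) -> List[int]:
    """Iterations after which workers average parameters, evenly spaced (constant period)."""
    period = total_iters // budget
    return [period * (i + 1) for i in range(budget)]


def increasing_schedule(total_iters: int, budget: int) -> List[int]:
    """Hypothetical front-loaded schedule with the same number of synchronizations.

    The i-th gap is proportional to i (1, 2, ..., budget), so early gaps are short
    and later gaps are long; the last synchronization lands on total_iters.
    """
    weights = list(range(1, budget + 1))
    scale = total_iters / sum(weights)
    points, t = [], 0.0
    for w in weights:
        t += w * scale
        points.append(round(t))
    return points


const_pts = constant_schedule(1000, 10)     # every 100 iterations
adapt_pts = increasing_schedule(1000, 10)   # first gap about 18 iterations, last about 182
```

Both schedules perform ten averaging rounds over 1000 iterations; the second simply moves communication toward the start of training, where the abstract reports parameter variance across nodes is largest.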
-
Towards Successful Social Media Advertising: Predicting the Influence of Commercial Tweets
Authors:
Renhao Cui,
Gagan Agrawal,
Rajiv Ramnath
Abstract:
Businesses communicate using Twitter for a variety of reasons -- to raise awareness of their brands, to market new products, to respond to community comments, and to connect with their customers and potential customers in a targeted manner. For businesses to do this effectively, they need to understand which content and structural elements of a tweet make it influential, that is, widely liked, followed, and retweeted. This paper presents a systematic methodology for analyzing commercial tweets and predicting their influence on readers. Our model, which uses a combination of decoration and meta features, outperforms both the baseline model and the tweet embedding model in prediction ability. Further, to demonstrate a practical use of this work, we show how an unsuccessful tweet may be engineered (for example, reworded) to increase its potential for success.
Submitted 28 October, 2019;
originally announced October 2019.
-
Tweets Can Tell: Activity Recognition using Hybrid Long Short-Term Memory Model
Authors:
Renhao Cui,
Gagan Agrawal,
Rajiv Ramnath
Abstract:
This paper presents techniques to detect the "offline" activity a person is engaged in when she is tweeting (such as dining, shopping, or entertainment), in order to create a dynamic profile of the user for uses such as better targeting of advertisements. To this end, we propose a hybrid LSTM model for rich contextual learning, along with studies on the effects of applying and combining multiple LSTM-based methods with different contextual features. The hybrid model is shown to outperform a set of baselines and state-of-the-art methods. Finally, this paper presents an orthogonal validation with a real-world application: our model generates an offline activity analysis for the followers of several well-known accounts, and this analysis is quite representative of the expected characteristics of these accounts.
Submitted 9 July, 2019;
originally announced August 2019.
-
Pruned Landmark Labeling Meets Vertex Centric Computation: A Surprisingly Happy Marriage!
Authors:
Ruoming Jin,
Zhen Peng,
Wendell Wu,
Feodor Dragan,
Gagan Agrawal,
Bin Ren
Abstract:
In this paper, we study how the Pruned Landmark Labeling (PLL) algorithm can be parallelized in a scalable fashion, producing the same results as the sequential algorithm. More specifically, we parallelize using a Vertex-Centric (VC) computational model on a modern SIMD-powered multicore architecture. We design a new VC-PLL algorithm that resolves the apparent mismatch between the inherent sequential dependence of the PLL algorithm and the Vertex-Centric (VC) computing model. Furthermore, we introduce a novel batch execution model for VC computation and the BVC-PLL algorithm to reduce the computational inefficiency in VC-PLL. Quite surprisingly, the theoretical analysis reveals that under a reasonable assumption, BVC-PLL has lower computational and memory access costs than PLL and indicates it may run faster than PLL as a sequential algorithm. We also demonstrate how the BVC-PLL algorithm can be extended to handle directed graphs and weighted graphs and how it can utilize the hierarchical parallelism on a modern parallel computing architecture. Extensive experiments on real-world graphs not only show that the sequential BVC-PLL can run more than two times faster than the original PLL, but also demonstrate its parallel efficiency and scalability.
Submitted 27 June, 2019;
originally announced June 2019.
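The abstract builds on the sequential Pruned Landmark Labeling algorithm without restating it. As a point of reference, here is a minimal Python sketch of the standard sequential PLL for an unweighted, undirected graph (not the paper's VC-PLL or BVC-PLL): each vertex in turn serves as a hub and runs a BFS that records (hub, distance) labels, pruning any vertex whose distance is already certified by earlier labels. The function names, the toy graph, and the degree-based hub order are illustrative assumptions.

```python
import math
from collections import deque


def query(labels, s, t):
    """Shortest-path distance between s and t computed from 2-hop labels."""
    ls, lt = labels[s], labels[t]
    if len(ls) > len(lt):  # iterate over the smaller label set
        ls, lt = lt, ls
    best = math.inf
    for hub, d in ls.items():
        if hub in lt:
            best = min(best, d + lt[hub])
    return best


def build_pll(adj, order):
    """Sequential pruned landmark labeling (unweighted, undirected graph).

    adj: dict mapping each vertex to a list of neighbours.
    order: vertices in the order they are used as hubs (landmarks).
    Returns labels: vertex -> {hub: distance}.
    """
    labels = {v: {} for v in adj}
    for hub in order:
        dist = {hub: 0}
        frontier = deque([hub])
        while frontier:
            u = frontier.popleft()
            d = dist[u]
            # Prune: earlier labels already certify a path of length <= d,
            # so u gets no label for this hub and the BFS does not expand u.
            if query(labels, hub, u) <= d:
                continue
            labels[u][hub] = d
            for w in adj[u]:
                if w not in dist:
                    dist[w] = d + 1
                    frontier.append(w)
    return labels


# Hypothetical toy graph; hubs ordered by descending degree (a common heuristic).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
order = sorted(adj, key=lambda v: -len(adj[v]))
labels = build_pll(adj, order)
print(query(labels, 0, 4))  # 3
```

Processing high-degree hubs first tends to let later BFS runs terminate early through pruning, which keeps the labels small; the sequential dependence on earlier labels inside the prune test is exactly what makes a vertex-centric parallelization non-trivial.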
-
A network theoretic study of potential movement and spread of Lantana camara in Rajaji Tiger Reserve, India
Authors:
Shashankaditya Upadhyay,
Tamali Mondal,
Prasad A. Pathak,
Arijit Roy,
Girish Agrawal,
Sudeepto Bhattacharya
Abstract:
Ecosystems are often under threat from invasive species which, through their invasion dynamics, create ecological networks to spread. We present preliminary results using a technique of GIS coupled with complex network analysis to model the movement and spread of Lantana camara in Rajaji Tiger Reserve (RTR), India, where prey species are being affected because of habitat degradation due to Lantana invasion. Understanding the spatio-temporal aspects of the spread mechanism is essential for better management in the region. The objective of the present study is to develop insight into some key characteristics of the regulatory mechanism for lantana spread inside RTR. Lantana mapping was carried out by field observations along multiple transects and plots, and the data generated were used as input for MaxEnt modelling to identify land patches in the study area that are favourable for lantana growth. The patch information so obtained is integrated with a raster map generated by identifying different topographical features in the study area which are favourable for lantana growth. The integrated data is analysed from a complex network perspective, where relatively dense potential lantana distribution patches are considered as vertices, connected by relatively sparse lantana continuities, identified as edges. The network centrality analysis reveals key patches in the study area that play specialized roles in the spread of lantana in a large region. Hubs in the lantana network are primarily identified as dry seasonal river beds, and their management is proposed as a vital strategy to contain lantana invasion. The lantana network is found to exhibit small-world architecture with a well-formed community structure. We infer that the above properties of the lantana network contribute substantially to regulating the rapid infestation and even spread of the plant through the entire region of study.
Submitted 8 August, 2018;
originally announced August 2018.
-
Femtosecond Pulse Trains through Dual-Pumping of Optical Fibers: Role of Third-Order Dispersion
Authors:
Aku Antikainen,
Govind P. Agrawal
Abstract:
Generation of high-repetition-rate, femtosecond, soliton pulse trains through dual-wavelength pumping of a dispersion-decreasing fiber is studied numerically. The achievable shortest pulse width is found to be limited by third-order dispersion, which has a significant effect on the pulse-compression dynamics. The output wavelength is red-shifted because of intrapulse Raman scattering and depends heavily on third-order dispersion, whose positive values lead to the most red-shifted solitons (>25% of the input pump center wavelength). The proposed scheme allows the generation of ultrashort pulse trains at tunable high repetition rates with a wide range of output wavelengths and pulse durations through dispersion engineering. The resulting frequency combs extend over a wide bandwidth with a tunable spacing between the comb lines.
Submitted 10 April, 2018;
originally announced April 2018.
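In dual-wavelength pumping, the two continuous-wave pumps beat at their frequency difference, and compressing that beat signal yields a pulse train whose repetition rate equals this spacing, which is also the comb-line spacing mentioned above. The snippet below is a small sketch of that conversion from wavelengths to repetition rate; the 1550/1555 nm pump pair is a hypothetical example, not a value taken from the paper.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def beat_frequency_hz(lambda1_nm: float, lambda2_nm: float) -> float:
    """Frequency spacing between two pump lines given their vacuum wavelengths in nm."""
    return C * abs(1.0 / (lambda1_nm * 1e-9) - 1.0 / (lambda2_nm * 1e-9))


# Hypothetical pump pair near 1550 nm, separated by 5 nm.
f_rep = beat_frequency_hz(1550.0, 1555.0)
print(f"beat / repetition rate ~ {f_rep / 1e9:.0f} GHz")
```

Tuning either pump wavelength changes the spacing directly, which is why this scheme gives tunable, high repetition rates (hundreds of GHz for a few nm of separation).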
-
Lossless Suppression and Enhancement of Soliton Self-Frequency Shifts
Authors:
Francisco R. Arteaga-Sierra,
Aku Antikainen,
Govind P. Agrawal
Abstract:
Soliton self-frequency shifts (SSFS) can be suppressed in optical fibers through spectral recoil, but this process leads to losses through continuous transfer of energy to a dispersive wave. We demonstrate a novel way to alter the strength of SSFS in photonic crystal fibers through a frequency-dependent nonlinear parameter $γ(ω)$. Our numerical simulations show both suppression and enhancement of SSFS depending on the sign of the nonlinear slope. A large enough positive value of this slope can lead to total suppression of SSFS, without spectral recoil and without energy transfer to a resonant dispersive wave. Numerical simulations are supported by mathematical predictions based on the moment method.
Submitted 19 July, 2017;
originally announced July 2017.
-
Perturbed Dissipative Solitons: A Variational Approach
Authors:
Ambaresh Sahoo,
Samudra Roy,
Govind P. Agrawal
Abstract:
We adopt a variational technique to study the dynamics of perturbed dissipative solitons, whose evolution is governed by a Ginzburg--Landau equation (GLE). As a specific example of such solitons, we consider a silicon-based active waveguide in which free carriers are generated through two-photon absorption. In this case, dissipative solitons are perturbed by physical processes such as third-order dispersion, intrapulse Raman scattering, self-steepening, and free-carrier generation. To solve the variational problem, we adopt the Pereira--Stenflo soliton as an ansatz since this soliton is the exact solution of the unperturbed GLE. With this ansatz, we derive a set of six coupled differential equations governing the dynamics of various pulse parameters. This set of equations provides considerable physical insight into the complex behavior of perturbed dissipative solitons. Its predictions are found to be in good agreement with direct numerical simulations of the GLE. More specifically, the spectral and temporal shifts of the chirped soliton induced by free carriers and intrapulse Raman scattering are predicted quite accurately. We also provide simple analytic expressions for these shifts by making suitable approximations. Our semi-analytic treatment is useful for gaining physical insight into complex soliton-evolution processes.
Submitted 6 June, 2017;
originally announced June 2017.
-
In-Datacenter Performance Analysis of a Tensor Processing Unit
Authors:
Norman P. Jouppi,
Cliff Young,
Nishant Patil,
David Patterson,
Gaurav Agrawal,
Raminder Bajwa,
Sarah Bates,
Suresh Bhatia,
Nan Boden,
Al Borchers,
Rick Boyle,
Pierre-luc Cantin,
Clifford Chao,
Chris Clark,
Jeremy Coriell,
Mike Daley,
Matt Dau,
Jeffrey Dean,
Ben Gelb,
Tara Vazir Ghaemmaghami,
Rajendra Gottipati,
William Gulland,
Robert Hagmann,
C. Richard Ho,
Doug Hogberg,
et al. (50 additional authors not shown)
Abstract:
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
Submitted 16 April, 2017;
originally announced April 2017.
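The 92 TOPS peak figure follows from simple arithmetic on the matrix unit. The sketch below reproduces it under assumptions that come from the full TPU description rather than from this abstract: a 256 x 256 MAC array (which gives the 65,536 MACs quoted above), a 700 MHz clock, and counting each multiply-accumulate as two operations.

```python
# Assumptions (not stated in this abstract):
ARRAY_DIM = 256    # MAC array assumed to be 256 x 256
CLOCK_HZ = 700e6   # assumed 700 MHz clock
OPS_PER_MAC = 2    # one multiply + one add per MAC per cycle

macs = ARRAY_DIM * ARRAY_DIM
peak_ops_per_s = macs * OPS_PER_MAC * CLOCK_HZ
peak_tops = peak_ops_per_s / 1e12
print(macs, round(peak_tops, 2))  # 65536 91.75
```

The result, about 91.75 TOPS, rounds to the quoted 92 TOPS; achieved throughput is lower whenever memory bandwidth, not the MAC array, is the bottleneck, which is the point behind the GDDR5 what-if in the abstract.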
-
Fault Tolerant Frequent Pattern Mining
Authors:
Sameh Shohdy,
Abhinav Vishnu,
Gagan Agrawal
Abstract:
The FP-Growth algorithm is a Frequent Pattern Mining (FPM) algorithm that has been extensively used to study correlations and patterns in large-scale datasets. While several researchers have designed distributed-memory FP-Growth algorithms, it is pivotal to consider fault-tolerant FP-Growth, which can address the increasing fault rates in large-scale systems. In this work, we propose a novel parallel, algorithm-level fault-tolerant FP-Growth algorithm. We leverage algorithmic properties and advanced MPI features to guarantee an O(1) space complexity, achieved by using the dataset memory space itself for checkpointing. We also propose a recovery algorithm that can use in-memory and disk-based checkpointing, though in many cases the recovery can be completed without any disk access and without incurring memory overhead for checkpointing. We evaluate our FT algorithm on a large-scale InfiniBand cluster with several large datasets using up to 2K cores. Our evaluation demonstrates excellent efficiency for checkpointing and recovery in comparison to the disk-based approach. We have also observed a 20x average speed-up in comparison to Spark, establishing that a well-designed algorithm can easily outperform a solution based on a general fault-tolerant programming model.
Submitted 17 October, 2016;
originally announced October 2016.
-
Dynamics of Soliton Cascades in Fiber Amplifiers
Authors:
F. R. Arteaga-Sierra,
A. Antikainen,
Govind P. Agrawal
Abstract:
We study numerically the formation of cascading solitons when femtosecond optical pulses are launched into a fiber amplifier with less energy than required to form a soliton of equal duration. As the pulse is amplified, cascaded fundamental solitons are created at different distances, without soliton fission, as each fundamental soliton moves outside the gain bandwidth through the Raman-induced spectral shifts. As a result, each input pulse creates multiple, temporally separated, ultrashort pulses of different wavelengths at the amplifier output. The number of pulses depends not only on the total gain of the amplifier but also on the width of input pulses.
Submitted 30 June, 2016;
originally announced June 2016.
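"Less energy than required to form a soliton of equal duration" is usually quantified by the soliton order $N$, with $N^2 = \gamma P_0 T_0^2 / |\beta_2|$ from the standard nonlinear Schrödinger equation; $N < 1$ means the input cannot form a fundamental soliton, and amplification raises $P_0$ and hence $N$. The sketch below only evaluates that textbook relation for hypothetical fiber and pulse parameters; it does not model gain, Raman-induced shifts, or the cascade itself.

```python
import math


def soliton_order(gamma: float, peak_power: float, t0: float, beta2: float) -> float:
    """Soliton order N = sqrt(gamma * P0 * T0^2 / |beta2|), all in SI units.

    gamma: nonlinear parameter, 1/(W*m); peak_power: P0 in W;
    t0: pulse width parameter T0 in s; beta2: GVD in s^2/m (negative = anomalous).
    """
    return math.sqrt(gamma * peak_power * t0**2 / abs(beta2))


# Hypothetical fiber and pulse parameters.
gamma = 2e-3     # 2 /(W km)
beta2 = -2e-26   # -20 ps^2/km
t0 = 100e-15     # 100 fs
for p0 in (250.0, 1000.0, 4000.0):
    print(p0, soliton_order(gamma, p0, t0, beta2))
```

With these numbers a 250 W pulse has $N = 0.5$, 1 kW reaches $N = 1$, and 4 kW gives $N = 2$, so a pulse that starts below the fundamental-soliton threshold crosses it as the amplifier increases its peak power.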
-
Implications of a zero-nonlinearity wavelength in optical fibers doped with silver nanoparticles
Authors:
S. Bose,
S. Roy,
R. Chattopadhyay,
S. K. Bhadra,
G. P. Agrawal
Abstract:
Photonic crystal fibers doped with silver nanoparticles exhibit a Kerr nonlinearity that can be positive or negative depending on wavelength and vanishes at a specific wavelength. We study numerically how the simultaneous presence of a zero-nonlinearity wavelength (ZNW) and a zero-dispersion wavelength affects soliton evolution and supercontinuum generation inside such fibers and find a number of unique features. The existence of negative nonlinearity allows soliton formation even in the normal-dispersion region of the fiber, and the ZNW acts as a barrier for the Raman-induced red shift of solitons.
Submitted 2 November, 2016; v1 submitted 23 June, 2016;
originally announced June 2016.
-
Wheeled Robots playing Chain Catch: Strategies and Evaluation
Authors:
Garima Agrawal,
Kamalakar Karlapalem
Abstract:
Robots playing games that humans are adept in is a challenge. We studied robotic agents playing the Chain Catch game as a Multi-Agent System (MAS). Our game starts with a traditional Catch game similar to Pursuit-Evasion, and further extends it to form a growing chain of predator agents to chase the remaining prey. Hence Chain Catch is a combination of two challenges: the pursuit domain and robotic chain formation. These are games that require a team of robotic agents to cooperate among themselves and to compete with other groups of agents through quick decision making. In this paper, we present a Chain Catch simulator that allows us to incorporate game rules, design strategies and simulate the game play. We developed cost-model-driven strategies for each of Escapee, Catcher and Chain. Our results show that the Sliding slope strategy is the best strategy for Escapees, whereas the Tagging method is the best method for the chain's movement in Chain Catch. We also use production-quality robots to implement the game play in a physical environment and analyze game strategies on real robots. Our implementation on real robots in different scenarios shows that the game strategies work as expected and a complete chain formation takes place successfully in each game.
Submitted 20 February, 2016;
originally announced February 2016.
-
Resonance vector mode locking
Authors:
Stanislav A. Kolpakov,
Sergey V. Sergeyev,
Yuri Loika,
Nikita Tarasov,
Vladimir Kalashnikov,
Govind P. Agrawal
Abstract:
A mode locked fibre laser as a source of ultra-stable pulse train has revolutionised a wide range of fundamental and applied research areas by offering high peak powers, high repetition rates, femtosecond range pulse widths and a narrow linewidth. However, further progress in linewidth narrowing seems to be limited by the complexity of the carrier-envelope phase control. Here for the first time we demonstrate experimentally and theoretically a new mechanism of resonance vector self-mode locking where tuning in-cavity birefringence leads to excitation of the longitudinal modes sidebands accompanied by the resonance phase locking of sidebands with the adjacent longitudinal modes. An additional resonance with acoustic phonons provides the repetition rate tunability and linewidth narrowing down to Hz range that drastically reduces the complexity of the carrier-envelope phase control and so will open the way to advance lasers in the context of applications in metrology, spectroscopy, microwave photonics, astronomy, and telecommunications.
Submitted 23 August, 2015;
originally announced August 2015.
-
Specialty Fibers for Terahertz Generation and Transmission: A Review
Authors:
Ajanta Barh,
B. P. Pal,
G. P. Agrawal,
R. K. Varshney,
B. M. A. Rahman
Abstract:
The terahertz (THz) frequency range, lying between the optical and microwave ranges, covers a significant portion of the electromagnetic spectrum. Though its initial usage started in the 1960s, active research in the THz field started only in the 1990s, by researchers from both the optics and microwave disciplines. The use of optical fibers for THz applications has attracted considerable attention in recent years. In this article, we review the progress and current status of optical fiber-based techniques for THz generation and transmission. The first part of this review focuses on THz sources. After reviewing various types of THz sources, we discuss how specialty optical fibers can be used for THz generation. The second part of this review focuses on the guided-wave propagation of THz waves for their transmission. After discussing various wave-guiding schemes, we consider new fiber designs for THz transmission.
Submitted 10 June, 2015;
originally announced June 2015.
-
Stimulated Raman scattering cascade spanning the wavelength range of 523 to 1750 nm using a graded-index multimode optical fiber
Authors:
Hamed Pourbeyram,
Govind P. Agrawal,
Arash Mafi
Abstract:
We report on the generation of a Raman cascade spanning the 523 to 1750 nm wavelength range in a standard telecommunication graded-index multimode optical fiber. Despite the highly multimode nature of the pump, the Raman peaks are generated in specific modes of the fiber, confirming substantial beam cleanup during the stimulated Raman scattering process.
Submitted 16 July, 2013; v1 submitted 25 January, 2013;
originally announced January 2013.
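To see roughly how many Stokes orders fit between 523 and 1750 nm, one can step the frequency down by the Raman shift of fused silica once per order. The sketch below assumes a single fixed shift of about 13.2 THz (the approximate Raman gain peak of silica); the actual peak positions in the experiment are set by the broad gain spectrum and the fiber modes involved, so these wavelengths are estimates, not the measured lines.

```python
C = 299_792_458.0         # speed of light in vacuum, m/s
RAMAN_SHIFT_HZ = 13.2e12  # assumed Raman gain peak of fused silica (~13.2 THz)


def raman_cascade_nm(pump_nm: float, max_nm: float,
                     shift_hz: float = RAMAN_SHIFT_HZ) -> list[float]:
    """Wavelengths (nm) of successive Stokes orders, each shifted down by shift_hz."""
    freq = C / (pump_nm * 1e-9)
    orders = []
    while True:
        freq -= shift_hz
        if freq <= 0:
            break
        wavelength_nm = C / freq * 1e9
        if wavelength_nm > max_nm:
            break
        orders.append(wavelength_nm)
    return orders


lines = raman_cascade_nm(523.0, 1750.0)
print(len(lines), round(lines[0], 1), round(lines[-1], 1))
```

Because the shift is constant in frequency rather than wavelength, the spacing between successive Stokes lines grows from about 12 nm near 535 nm to roughly 125 nm near 1.7 μm.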
-
Nonlinear Propagation in Multimode and Multicore Fibers: Generalization of the Manakov Equations
Authors:
Sami Mumtaz,
René-Jean Essiambre,
Govind P. Agrawal
Abstract:
This paper starts with an investigation of nonlinear transmission in space-division multiplexed (SDM) systems using multimode fibers exhibiting a rapidly varying birefringence. A primary objective is to generalize the Manakov equations, well known in the case of single-mode fibers. We first investigate a reference case where linear coupling among the spatial modes of the fiber is weak and, after averaging over birefringence fluctuations, we obtain new Manakov equations for multimode fibers. Such an averaging reduces the number of intermodal nonlinear terms drastically since all four-wave-mixing terms average out. Cross-phase modulation terms still affect multimode transmission but their effectiveness is reduced. We then verify the accuracy of our new Manakov equations by transmitting multiple PDM-QPSK signals over different modes of a multimode fiber and comparing the numerical results with those obtained by solving the full stochastic equation. The agreement is excellent in all cases studied. A great benefit of the new equations is that they reduce the computation time by a factor of 10 or more. Another important feature observed is that birefringence fluctuations improve system performance by reducing the impact of fiber nonlinearities. Finally, multimode fibers with strong random coupling among all spatial modes are considered. Linear coupling is modeled using the random matrix theory approach. We derive new Manakov equations for multimode fibers in that regime and show that such fibers can perform better than single-mode fibers for a large number of propagating spatial modes.
Submitted 27 July, 2012;
originally announced July 2012.
-
Design of an efficient Mid-IR light source using As2S3 based highly nonlinear microstructured optical fibers
Authors:
A. Barh,
S. Ghosh,
G. P. Agrawal,
R. K. Varshney,
I. D. Aggarwal,
B. P. Pal
Abstract:
We report on the design of a highly nonlinear specialty fiber as a mid-infrared light source at 4.3 μm. A one-meter length of the designed solid-core chalcogenide-based index-guided microstructured optical fiber (MOF) with circular air holes has been exploited to translate wavelength via four-wave mixing using a thulium-doped fiber laser as the pump with a relatively low peak power of 5 W. A peak gain of around 37 dB with a full width at half maximum (FWHM) of less than 3 nm is achieved.
Submitted 15 May, 2012;
originally announced May 2012.
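Wavelength translation by degenerate four-wave mixing obeys photon-energy conservation, $2/\lambda_p = 1/\lambda_s + 1/\lambda_i$, so fixing the pump and the 4.3 μm idler determines the partner (signal) wavelength. The sketch below applies that relation; the 2.0 μm pump is a hypothetical value chosen because thulium-doped fiber lasers emit near 2 μm, and phase matching (which sets the gain and bandwidth reported above) is not modeled.

```python
def fwm_partner_um(pump_um: float, known_um: float) -> float:
    """Degenerate FWM energy conservation: 2/lambda_p = 1/lambda_s + 1/lambda_i.

    Given the pump and one sideband wavelength (um), return the other sideband (um).
    """
    inv = 2.0 / pump_um - 1.0 / known_um
    if inv <= 0:
        raise ValueError("no physical partner wavelength for these inputs")
    return 1.0 / inv


# Hypothetical thulium pump at 2.0 um generating the 4.3 um idler.
signal_um = fwm_partner_um(2.0, 4.3)
print(round(signal_um, 3))  # 1.303
```

The relation is symmetric, so feeding the computed signal back in returns the 4.3 μm idler.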
-
Transverse localization of light and its dependence on the phase-front curvature of the input beam in a disordered optical waveguide lattice
Authors:
S Ghosh,
B P Pal,
R K Varshney,
G P Agrawal
Abstract:
We investigate the influence of the phase-front curvature of an input light beam on the transverse localization of light by choosing an evanescently coupled disordered one-dimensional semi-infinite waveguide lattice as an example. Our numerical study reveals that a finite phase-front curvature of the input beam indeed plays an important role and can degrade the quality of light localization in a disordered dielectric structure. More specifically, the transition from ballistic, diffraction-driven beam propagation to a characteristic localized state is faster for a continuous-wave (CW) beam with a plane phase front than for one with a curved phase front.
Submitted 27 April, 2012;
originally announced April 2012.