Showing 1–22 of 22 results for author: Tripathy, A

  1. arXiv:2311.02909  [pdf, other]

    cs.LG cs.DC cs.PF

    Distributed Matrix-Based Sampling for Graph Neural Network Training

    Authors: Alok Tripathy, Katherine Yelick, Aydin Buluc

    Abstract: Graph Neural Networks (GNNs) offer a compact and computationally efficient way to learn embeddings and classifications on graph data. GNN models are frequently large, making distributed minibatch training necessary. The primary contribution of this paper is new methods for reducing communication in the sampling step for distributed GNN training. Here, we propose a matrix-based bulk sampling appr…

    Submitted 19 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Proceedings of Machine Learning and Systems

  2. arXiv:2310.10294  [pdf, other]

    cs.CL cs.AI

    Key-phrase boosted unsupervised summary generation for FinTech organization

    Authors: Aadit Deshpande, Shreya Goyal, Prateek Nagwanshi, Avinash Tripathy

    Abstract: With the recent advances in social media, the use of NLP techniques in social media data analysis has become an emerging research direction. Business organizations can particularly benefit from such an analysis of social media discourse, providing an external perspective on consumer behavior. Some of the NLP applications such as intent detection, sentiment classification, text summarization can he…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 8 pages, 4 figures

  3. arXiv:2309.01662   

    cs.DC cs.PF

    Towards Persistent Memory based Stateful Serverless Computing for Big Data Applications

    Authors: Yuze Li, Kevin Assogba, Abhijit Tripathy, Moiz Arif, M. Mustafa Rafique, Ali R. Butt, Dimitrios Nikolopoulos

    Abstract: The Function-as-a-service (FaaS) computing model has recently seen significant growth especially for highly scalable, event-driven applications. The easy-to-deploy and cost-efficient fine-grained billing of FaaS is highly attractive to big data applications. However, the stateless nature of serverless platforms poses major challenges when supporting stateful I/O intensive workloads such as a lack…

    Submitted 8 September, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

    Comments: Not yet ready to be publicly available

  4. Using Geographic Location-based Public Health Features in Survival Analysis

    Authors: Navid Seidi, Ardhendu Tripathy, Sajal K. Das

    Abstract: Time elapsed till an event of interest is often modeled using the survival analysis methodology, which estimates a survival score based on the input features. There is a resurgence of interest in developing more accurate prediction models for time-to-event prediction in personalized healthcare using modern tools such as neural networks. Higher quality features and more frequent observations improv…

    Submitted 15 April, 2023; originally announced April 2023.

    Journal ref: 2023 IEEE/ACM Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), 2023, 80-91

  5. Workflows Community Summit 2022: A Roadmap Revolution

    Authors: Rafael Ferreira da Silva, Rosa M. Badia, Venkat Bala, Debbie Bard, Peer-Timo Bremer, Ian Buckley, Silvina Caino-Lores, Kyle Chard, Carole Goble, Shantenu Jha, Daniel S. Katz, Daniel Laney, Manish Parashar, Frederic Suter, Nick Tyler, Thomas Uram, Ilkay Altintas, Stefan Andersson, William Arndt, Juan Aznar, Jonathan Bader, Bartosz Balis, Chris Blanton, Kelly Rosa Braghetto, Aharon Brodutch, et al. (80 additional authors not shown)

    Abstract: Scientific workflows have become integral tools in broad scientific computing use cases. Science discovery is increasingly dependent on workflows to orchestrate large and complex scientific experiments that range from execution of a cloud-based data preprocessing pipeline to multi-facility instrument-to-edge-to-HPC computational workflows. Given the changing landscape of scientific computing and t…

    Submitted 31 March, 2023; originally announced April 2023.

    Report number: ORNL/TM-2023/2885

  6. arXiv:2212.02346  [pdf, other]

    cs.LG

    Accu-Help: A Machine Learning based Smart Healthcare Framework for Accurate Detection of Obsessive Compulsive Disorder

    Authors: Kabita Patel, Ajaya Kumar Tripathy, Laxmi Narayan Padhy, Sujita Kumar Kar, Susanta Kumar Padhy, Saraju Prasad Mohanty

    Abstract: In recent years the importance of Smart Healthcare cannot be overstated. The current work proposed to expand the state-of-art of smart healthcare in integrating solutions for Obsessive Compulsive Disorder (OCD). Identification of OCD from oxidative stress biomarkers (OSBs) using machine learning is an important development in the study of OCD. However, this process involves the collection of OCD c…

    Submitted 5 December, 2022; originally announced December 2022.

  7. arXiv:2105.03280  [pdf, other]

    cs.CL cs.LG

    Potential Idiomatic Expression (PIE)-English: Corpus for Classes of Idioms

    Authors: Tosin P. Adewumi, Roshanak Vadoodi, Aparajita Tripathy, Konstantina Nikolaidou, Foteini Liwicki, Marcus Liwicki

    Abstract: We present a fairly large, Potential Idiomatic Expression (PIE) dataset for Natural Language Processing (NLP) in English. The challenges with NLP systems with regards to tasks such as Machine Translation (MT), word sense disambiguation (WSD) and information retrieval make it imperative to have a labelled idioms dataset with classes such as it is in this work. To the best of the authors' knowledge,…

    Submitted 23 April, 2022; v1 submitted 25 April, 2021; originally announced May 2021.

    Comments: Accepted at the International Conference on Language Resources and Evaluation (LREC) 2022

  8. arXiv:2104.00792  [pdf, other]

    cs.DC cs.DS

    Scalable Hash Table for NUMA Systems

    Authors: Alok Tripathy, Oded Green

    Abstract: Hash tables are used in a plethora of applications, including database operations, DNA sequencing, string searching, and many more. As such, there are many parallelized hash tables targeting multicore, distributed, and accelerator-based systems. We present in this work a multi-GPU hash table implementation that can process keys at a throughput comparable to that of distributed hash tables. Distrib…

    Submitted 1 April, 2021; originally announced April 2021.

  9. arXiv:2103.05057  [pdf, ps, other]

    stat.ML cs.LG

    Nearest Neighbor Search Under Uncertainty

    Authors: Blake Mason, Ardhendu Tripathy, Robert Nowak

    Abstract: Nearest Neighbor Search (NNS) is a central task in knowledge representation, learning, and reasoning. There is vast literature on efficient algorithms for constructing data structures and performing exact and approximate NNS. This paper studies NNS under Uncertainty (NNSU). Specifically, consider the setting in which an NNS algorithm has access only to a stochastic distance oracle that provides a…

    Submitted 8 March, 2021; originally announced March 2021.

    Comments: 22 pages

  10. arXiv:2012.08073  [pdf, other]

    stat.ML cs.LG

    Chernoff Sampling for Active Testing and Extension to Active Regression

    Authors: Subhojyoti Mukherjee, Ardhendu Tripathy, Robert Nowak

    Abstract: Active learning can reduce the number of samples needed to perform a hypothesis test and to estimate the parameters of a model. In this paper, we revisit the work of Chernoff that described an asymptotically optimal algorithm for performing a hypothesis test. We obtain a novel sample complexity bound for Chernoff's algorithm, with a non-asymptotic term that characterizes its performance at a fixed…

    Submitted 10 March, 2022; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: 47 pages, 9 figures

  11. arXiv:2006.08850  [pdf, other]

    stat.ML cs.LG

    Finding All ε-Good Arms in Stochastic Bandits

    Authors: Blake Mason, Lalit Jain, Ardhendu Tripathy, Robert Nowak

    Abstract: The pure-exploration problem in stochastic multi-armed bandits aims to find one or more arms with the largest (or near largest) means. Examples include finding an ε-good arm, best-arm identification, top-k arm identification, and finding all arms with means above a specified threshold. However, the problem of finding all ε-good arms has been overlooked in past work, although arguably this may be t…

    Submitted 11 September, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: 93 total pages (8 main pages + appendices), 12 figures, submitted to NeurIPS 2020

  12. arXiv:2005.03300  [pdf, other]

    cs.LG cs.DC stat.ML

    Reducing Communication in Graph Neural Network Training

    Authors: Alok Tripathy, Katherine Yelick, Aydin Buluc

    Abstract: Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this connectivity as sparse matrices, which have lower arithmetic intensity and thus higher communication costs compared to dense matrices, making GNNs harder to scale to high concurrencies than convolutional or fully-connected neural networks.…

    Submitted 2 September, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: To appear in International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'20)

  13. arXiv:2002.01044  [pdf, other]

    stat.ML cs.IT cs.LG math.ST

    Optimal Confidence Regions for the Multinomial Parameter

    Authors: Matthew L. Malloy, Ardhendu Tripathy, Robert D. Nowak

    Abstract: Construction of tight confidence regions and intervals is central to statistical inference and decision making. This paper develops new theory showing minimum average volume confidence regions for categorical data. More precisely, consider an empirical distribution $\widehat{\boldsymbol{p}}$ generated from $n$ iid realizations of a random variable that takes one of $k$ possible values according to…

    Submitted 29 January, 2021; v1 submitted 3 February, 2020; originally announced February 2020.

  14. arXiv:1906.00547  [pdf, other]

    stat.ML cs.LG

    MaxGap Bandit: Adaptive Algorithms for Approximate Ranking

    Authors: Sumeet Katariya, Ardhendu Tripathy, Robert Nowak

    Abstract: This paper studies the problem of adaptively sampling from K distributions (arms) in order to identify the largest gap between any two adjacent means. We call this the MaxGap-bandit problem. This problem arises naturally in approximate ranking, noisy sorting, outlier detection, and top-arm identification in bandits. The key novelty of the MaxGap-bandit problem is that it aims to adaptively determi…

    Submitted 2 June, 2019; originally announced June 2019.

  15. arXiv:1905.13267  [pdf, other]

    stat.ML cs.LG

    Learning Nearest Neighbor Graphs from Noisy Distance Samples

    Authors: Blake Mason, Ardhendu Tripathy, Robert Nowak

    Abstract: We consider the problem of learning the nearest neighbor graph of a dataset of n items. The metric is unknown, but we can query an oracle to obtain a noisy estimate of the distance between any pair of items. This framework applies to problem domains where one wants to learn people's preferences from responses commonly modeled as noisy distance judgments. In this paper, we propose an active algorit…

    Submitted 30 May, 2019; originally announced May 2019.

    Comments: 21 total pages (8 main pages + appendices), 7 figures, submitted to NeurIPS 2019

  16. arXiv:1805.03730  [pdf, other]

    cs.IT

    Zero-error Function Computation on a Directed Acyclic Network

    Authors: Ardhendu Tripathy, Aditya Ramamoorthy

    Abstract: We study the rate region of variable-length source-network codes that are used to compute a function of messages observed over a network. The particular network considered here is the simplest instance of a directed acyclic graph (DAG) that is not a tree. Existing work on zero-error function computation in DAG networks provides bounds on the \textit{computation capacity}, which is a measure of the…

    Submitted 9 May, 2018; originally announced May 2018.

    Comments: 18 pages, 2 figures, submitted to IEEE Transactions on Information Theory

  17. arXiv:1712.07008  [pdf, other]

    cs.IT cs.CR cs.GT cs.LG stat.ML

    Privacy-Preserving Adversarial Networks

    Authors: Ardhendu Tripathy, Ye Wang, Prakash Ishwar

    Abstract: We propose a data-driven framework for optimizing privacy-preserving data release mechanisms to attain the information-theoretically optimal tradeoff between minimizing distortion of useful data and concealing specific sensitive information. Our approach employs adversarially-trained neural networks to implement randomized mechanisms and to perform a variational approximation of mutual information…

    Submitted 12 June, 2019; v1 submitted 19 December, 2017; originally announced December 2017.

    Comments: 16 pages

    MSC Class: 94A15; 68T05; 62B10

  18. arXiv:1612.07773  [pdf, other]

    cs.IT

    Sum-networks from undirected graphs: construction and capacity analysis

    Authors: Ardhendu Tripathy, Aditya Ramamoorthy

    Abstract: We consider a directed acyclic network with multiple sources and multiple terminals where each terminal is interested in decoding the sum of independent sources generated at the source nodes. We describe a procedure whereby a simple undirected graph can be used to construct such a sum-network and demonstrate an upper bound on its computation rate. Furthermore, we show sufficient conditions for the…

    Submitted 22 December, 2016; originally announced December 2016.

    Comments: This has an extra Appendix section giving more information about Remark 2 of the original paper published in the 52nd Annual Allerton Conference on Communication, Control and Computing, 2014

  19. arXiv:1611.01887  [pdf, other]

    cs.IT

    Sum-networks from incidence structures: construction and capacity analysis

    Authors: Ardhendu Tripathy, Aditya Ramamoorthy

    Abstract: A sum-network is an instance of a network coding problem over a directed acyclic network in which each terminal node wants to compute the sum over a finite field of the information observed at all the source nodes. Many characteristics of the well-studied multiple unicast network communication problem also hold for sum-networks due to a known reduction between instances of these two problems. In t…

    Submitted 28 January, 2018; v1 submitted 6 November, 2016; originally announced November 2016.

    Comments: Version accepted for publication in IEEE Transactions on Information Theory

  20. arXiv:1601.07228  [pdf, ps, other]

    cs.IT

    On Computation Rates for Arithmetic Sum

    Authors: Ardhendu Tripathy, Aditya Ramamoorthy

    Abstract: For zero-error function computation over directed acyclic networks, existing upper and lower bounds on the computation capacity are known to be loose. In this work we consider the problem of computing the arithmetic sum over a specific directed acyclic network that is not a tree. We assume the sources to be i.i.d. Bernoulli with parameter $1/2$. Even in this simple setting, we demonstrate that upp…

    Submitted 26 January, 2016; originally announced January 2016.

    Comments: Full paper for ISIT 2016 submission

  21. arXiv:1510.07439  [pdf, ps, other]

    cs.SE cs.CL

    Object Oriented Analysis using Natural Language Processing concepts: A Review

    Authors: Abinash Tripathy, Santanu Kumar Rath

    Abstract: The Software Development Life Cycle (SDLC) starts with eliciting requirements of the customers in the form of Software Requirement Specification (SRS). SRS document needed for software development is mostly written in Natural Language(NL) convenient for the client. From the SRS document only, the class name, its attributes and the functions incorporated in the body of the class are traced based on…

    Submitted 26 October, 2015; originally announced October 2015.

    Comments: 12 pages, International Journal of Information Processing, 9(3), 38-50, 2015, ISSN : 0973-8215, IK International Publishing House Pvt. Ltd., New Delhi, India

    Journal ref: International Journal of Information Processing, vol. 9, no. 3, pp. 38-50, 2015

  22. arXiv:1504.05618  [pdf, other]

    cs.IT

    Capacity of Sum-networks for Different Message Alphabets

    Authors: Ardhendu Tripathy, Aditya Ramamoorthy

    Abstract: A sum-network is a directed acyclic network in which all terminal nodes demand the `sum' of the independent information observed at the source nodes. Many characteristics of the well-studied multiple-unicast network communication problem also hold for sum-networks due to a known reduction between instances of these two problems. Our main result is that unlike a multiple unicast network, the coding…

    Submitted 21 April, 2015; originally announced April 2015.