Search | arXiv e-print repository

arXiv:2406.19415 [pdf, other]

An Analysis of Multilingual FActScore

Authors: Kim Trong Vu, Michael Krumdick, Varshini Reddy, Franck Dernoncourt, Viet Dac Lai

Abstract: FActScore has gained popularity as a metric to estimate the factuality of long-form texts generated by Large Language Models (LLMs) in English. However, there has not been any work in studying the behavior of FActScore in other languages. This paper studies the limitations of each component in the four-component pipeline of FActScore in the multilingual setting. We introduce a new dataset for FActScore on texts generated by strong multilingual LLMs. Our evaluation shows that LLMs exhibit distinct behaviors in both fact extraction and fact scoring tasks. No LLM produces consistent and reliable FActScore across languages with varying levels of resources. We also find that the knowledge source plays an important role in the quality of the estimated FActScore. Using Wikipedia as the knowledge source may hinder the true FActScore of long-form text due to its limited coverage in medium- and low-resource languages. We also incorporate three mitigations to our knowledge source that ultimately improve FActScore estimation across all languages.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14394 [pdf, other]

SEC-QA: A Systematic Evaluation Corpus for Financial QA

Authors: Viet Dac Lai, Michael Krumdick, Charles Lovering, Varshini Reddy, Craig Schmidt, Chris Tanner

Abstract: The financial domain frequently deals with large numbers of long documents that are essential for daily operations. Significant effort is put towards automating financial data analysis. However, a persistent challenge, not limited to the finance domain, is the scarcity of datasets that accurately reflect real-world tasks for model evaluation. Existing datasets are often constrained by size, context, or relevance to practical applications. Moreover, LLMs are currently trained on trillions of tokens of text, limiting access to novel data or documents that models have not encountered during training for unbiased evaluation. We propose SEC-QA, a continuous dataset generation framework with two key features: 1) the semi-automatic generation of Question-Answer (QA) pairs spanning multiple long context financial documents, which better represent real-world financial scenarios; 2) the ability to continually refresh the dataset using the most recent public document collections, not yet ingested by LLMs. Our experiments show that current retrieval augmented generation methods systematically fail to answer these challenging multi-document questions. In response, we introduce a QA system based on program-of-thought that improves the ability to perform complex information retrieval and quantitative reasoning pipelines, thereby increasing QA accuracy.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.02608 [pdf, other]

PPINtonus: Early Detection of Parkinson's Disease Using Deep-Learning Tonal Analysis

Authors: Varun Reddy

Abstract: PPINtonus is a system for the early detection of Parkinson's Disease (PD) utilizing deep-learning tonal analysis, providing a cost-effective and accessible alternative to traditional neurological examinations. Partnering with the Parkinson's Voice Project (PVP), PPINtonus employs a semi-supervised conditional generative adversarial network to generate synthetic data points, enhancing the training dataset for a multi-layered deep neural network. Combined with PRAAT phonetics software, this network accurately assesses biomedical voice measurement values from a simple 120-second vocal test performed with a standard microphone in typical household noise conditions. The model's performance was validated using a confusion matrix, achieving an impressive 92.5% accuracy with a low false negative rate. PPINtonus demonstrated a precision of 92.7%, making it a reliable tool for early PD detection. The non-intrusive and efficient methodology of PPINtonus can significantly benefit developing countries by enabling early diagnosis and improving the quality of life for millions of PD patients through timely intervention and management.

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2402.18376 [pdf, other]

Tokenization Is More Than Compression

Authors: Craig W. Schmidt, Varshini Reddy, Haoran Zhang, Alec Alameddine, Omri Uzan, Yuval Pinter, Chris Tanner

Abstract: Tokenization is a foundational step in Natural Language Processing (NLP) tasks, bridging raw text and language models. Existing tokenization approaches like Byte-Pair Encoding (BPE) originate from the field of data compression, and it has been suggested that the effectiveness of BPE stems from its ability to condense text into a relatively small number of tokens. We test the hypothesis that fewer tokens lead to better downstream performance by introducing PathPiece, a new tokenizer that segments a document's text into the minimum number of tokens for a given vocabulary. Through extensive experimentation we find this hypothesis not to be the case, casting doubt on the understanding of the reasons for effective tokenization. To examine which other factors play a role, we evaluate design decisions across all three phases of tokenization: pre-tokenization, vocabulary construction, and segmentation, offering new insights into the design of effective tokenizers. Specifically, we illustrate the importance of pre-tokenization and the benefits of using BPE to initialize vocabulary construction. We train 64 language models with varying tokenization, ranging in size from 350M to 2.4B parameters, all of which are made publicly available.

Submitted 28 February, 2024; originally announced February 2024.

MSC Class: 68T50 ACM Class: I.2.7
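
The abstract above describes segmenting text into the minimum number of tokens for a given vocabulary. As a rough illustration of that idea only, and not the authors' PathPiece implementation, the following sketch uses dynamic programming over character positions. The function name `min_token_segmentation`, the character-level granularity, and the `max_len` parameter are assumptions made for this example.

```python
from typing import List, Optional, Set


def min_token_segmentation(
    text: str, vocab: Set[str], max_len: int
) -> Optional[List[str]]:
    """Split `text` into the fewest tokens drawn from `vocab`.

    Hypothetical helper for illustration. Returns None when no
    segmentation exists with the given vocabulary.
    """
    n = len(text)
    inf = float("inf")
    # best[i] = fewest tokens needed to cover text[:i]
    best = [0] + [inf] * n
    # back[i] = start index of the last token in an optimal split of text[:i]
    back = [-1] * (n + 1)

    for end in range(1, n + 1):
        # Only consider candidate tokens of length <= max_len.
        for start in range(max(0, end - max_len), end):
            if best[start] + 1 < best[end] and text[start:end] in vocab:
                best[end] = best[start] + 1
                back[end] = start

    if best[n] == inf:
        return None

    # Walk the back-pointers from the end of the text to recover the tokens.
    tokens: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


if __name__ == "__main__":
    toy_vocab = {"a", "b", "c", "ab", "bc", "abc"}
    print(min_token_segmentation("abcab", toy_vocab, max_len=3))
```

Several minimum-length segmentations may exist for a given input; this sketch returns whichever one the scan order finds first and makes no attempt to break ties in any principled way.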

arXiv:2401.11932 [pdf, other]

Accelerating Causal Algorithms for Industrial-scale Data: A Distributed Computing Approach with Ray Framework

Authors: Vishal Verma, Vinod Reddy, Jaiprakash Ravi

Abstract: The increasing need for causal analysis in large-scale industrial datasets necessitates the development of efficient and scalable causal algorithms for real-world applications. This paper addresses the challenge of scaling causal algorithms in the context of conducting causal analysis on extensive datasets commonly encountered in industrial settings. Our proposed solution involves enhancing the scalability of causal algorithm libraries, such as EconML, by leveraging the parallelism capabilities offered by the distributed computing framework Ray. We explore the potential of parallelizing key iterative steps within causal algorithms to significantly reduce overall runtime, supported by a case study that examines the impact on estimation times and costs. Through this approach, we aim to provide a more effective solution for implementing causal analysis in large-scale industrial applications.

Submitted 22 January, 2024; originally announced January 2024.

ACM Class: C.4; E.2; I.2.1

arXiv:2401.06915 [pdf, other]

DocFinQA: A Long-Context Financial Reasoning Dataset

Authors: Varshini Reddy, Rik Koncel-Kedziorski, Viet Dac Lai, Michael Krumdick, Charles Lovering, Chris Tanner

Abstract: For large language models (LLMs) to be effective in the financial domain -- where each decision can have a significant impact -- it is necessary to investigate realistic tasks and data. Financial professionals often interact with documents that are hundreds of pages long, but most financial research datasets only deal with short excerpts from these documents. To address this, we introduce a long-document financial QA task. We augment 7,437 questions from the existing FinQA dataset with the full-document context, extending the average context length from under 700 words in FinQA to 123k words in DocFinQA. We conduct extensive experiments over retrieval-based QA pipelines and long-context language models. DocFinQA proves a significant challenge for even state-of-the-art systems. We also provide a case-study on the longest documents in DocFinQA and find that models particularly struggle on these documents. Addressing these challenges may have a wide reaching impact across applications where specificity and long-range contexts are critical, like gene sequences and legal document contract analysis.

Submitted 29 February, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

Comments: 13 pages

arXiv:2312.02299 [pdf]

Cotton Yield Prediction Using Random Forest

Authors: Alakananda Mitra, Sahila Beegum, David Fleisher, Vangimalla R. Reddy, Wenguang Sun, Chittaranjan Ray, Dennis Timlin, Arindam Malakar

Abstract: The cotton industry in the United States is committed to sustainable production practices that minimize water, land, and energy use while improving soil health and cotton output. Climate-smart agricultural technologies are being developed to boost yields while decreasing operating expenses. Crop yield prediction, on the other hand, is difficult because of the complex and nonlinear impacts of cultivar, soil type, management, pest and disease, climate, and weather patterns on crops. To solve this issue, we employ machine learning (ML) to forecast production while considering climate change, soil diversity, cultivar, and inorganic nitrogen levels. From the 1980s to the 1990s, field data were gathered across the southern cotton belt of the United States. To capture the most current effects of climate change over the previous six years, a second data source was produced using the process-based crop model, GOSSYM. We concentrated our efforts on three distinct areas inside each of the three southern states: Texas, Mississippi, and Georgia. To simplify the amount of computations, accumulated heat units (AHU) for each set of experimental data were employed as an analogy to use time-series weather data. The Random Forest Regressor yielded a 97.75% accuracy rate, with a root mean square error of 55.05 kg/ha and an R2 of around 0.98. These findings demonstrate how an ML technique may be developed and applied as a reliable and easy-to-use model to support the cotton climate-smart initiative.

Submitted 4 December, 2023; originally announced December 2023.

Comments: 6 pages, 2 figures, 3 tables

arXiv:2311.06602 [pdf, other]

BizBench: A Quantitative Reasoning Benchmark for Business and Finance

Authors: Rik Koncel-Kedziorski, Michael Krumdick, Viet Lai, Varshini Reddy, Charles Lovering, Chris Tanner

Abstract: Answering questions within business and finance requires reasoning, precision, and a wide-breadth of technical knowledge. Together, these requirements make this domain difficult for large language models (LLMs). We introduce BizBench, a benchmark for evaluating models' ability to reason about realistic financial problems. BizBench comprises eight quantitative reasoning tasks, focusing on question-answering (QA) over financial data via program synthesis. We include three financially-themed code-generation tasks from newly collected and augmented QA data. Additionally, we isolate the reasoning capabilities required for financial QA: reading comprehension of financial text and tables for extracting intermediate values, and understanding financial concepts and formulas needed to calculate complex solutions. Collectively, these tasks evaluate a model's financial background knowledge, ability to parse financial documents, and capacity to solve problems with code. We conduct an in-depth evaluation of open-source and commercial LLMs, comparing and contrasting the behavior of code-focused and language-focused models. We demonstrate that the current bottleneck in performance is due to LLMs' limited business and financial understanding, highlighting the value of a challenging benchmark for quantitative reasoning within this domain.

Submitted 12 March, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

Comments: Work in progress

arXiv:2309.11157 [pdf, other]

Learning Deformable 3D Graph Similarity to Track Plant Cells in Unregistered Time Lapse Images

Authors: Md Shazid Islam, Arindam Dutta, Calvin-Khang Ta, Kevin Rodriguez, Christian Michael, Mark Alber, G. Venugopala Reddy, Amit K. Roy-Chowdhury

Abstract: Tracking of plant cells in images obtained by microscope is a challenging problem due to biological phenomena such as the large number of cells, non-uniform growth of different layers of the tightly packed plant cells, and cell division. Moreover, noisy images in deeper layers of the tissue and unavoidable systemic errors inherent in the imaging process further complicate the problem. In this paper, we propose a novel learning-based method that exploits the tightly packed three-dimensional cell structure of plant cells to create a three-dimensional graph in order to perform accurate cell tracking. We further propose novel algorithms for cell division detection and effective three-dimensional registration, which improve upon the state-of-the-art algorithms. We demonstrate the efficacy of our algorithm in terms of tracking accuracy and inference-time on a benchmark dataset.

Submitted 21 September, 2023; v1 submitted 20 September, 2023; originally announced September 2023.

arXiv:2308.01203 [pdf, other]

Analysing the Resourcefulness of the Paragraph for Precedence Retrieval

Authors: Bhoomeendra Singh Sisodiya, Narendra Babu Unnam, P. Krishna Reddy, Apala Das, K. V. K. Santhy, V. Balakista Reddy

Abstract: Developing methods for extracting relevant legal information to aid legal practitioners is an active research area. In this regard, research efforts are being made by leveraging different kinds of information, such as meta-data, citations, keywords, sentences, paragraphs, etc. Similar to any text document, legal documents are composed of paragraphs. In this paper, we have analyzed the resourcefulness of paragraph-level information in capturing similarity among judgments for improving the performance of precedence retrieval. We found that the paragraph-level methods could capture the similarity among the judgments with only a few paragraph interactions and exhibit more discriminating power over the baseline document-level method. Moreover, the comparison results on two benchmark datasets for the precedence retrieval on the Indian Supreme Court judgments task show that the paragraph-level methods exhibit comparable performance with the state-of-the-art methods.

Submitted 29 July, 2023; originally announced August 2023.

Comments: 5 pages, 3 figures, ICAIL 2023

arXiv:2305.17956 [pdf, other]

On Color Critical Graphs of Star Coloring

Authors: Harshit Kumar Choudhary, I. Vinod Reddy

Abstract: A star coloring of a graph $G$ is a proper vertex-coloring such that no path on four vertices is $2$-colored. The minimum number of colors required to obtain a star coloring of a graph $G$ is called the star chromatic number and is denoted by $χ_s(G)$. A graph $G$ is called $k$-critical if $χ_s(G)=k$ and $χ_s(G -e) < χ_s(G)$ for every edge $e \in E(G)$. In this paper, we give a characterization of 3-critical, $(n-1)$-critical and $(n-2)$-critical graphs with respect to star coloring, where $n$ denotes the number of vertices of $G$. We also give upper and lower bounds on the minimum number of edges in $(n-1)$-critical and $(n-2)$-critical graphs.

Submitted 29 May, 2023; originally announced May 2023.
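
The star coloring definition in the abstract above (a proper vertex coloring in which no path on four vertices is 2-colored) can be checked directly by brute force on small graphs. The sketch below is an illustration of the definition only and is not taken from the paper. The function name `is_star_coloring` and the adjacency-dictionary representation are assumptions, and the graph is assumed to be simple and undirected, with every edge listed in both directions.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def is_star_coloring(adj: Graph, color: Dict[Hashable, int]) -> bool:
    """Return True if `color` is a star coloring of the graph `adj`.

    Hypothetical helper for illustration; assumes a simple undirected
    graph given as a symmetric adjacency dictionary.
    """
    # Condition 1: the coloring is proper (adjacent vertices differ).
    for u, neighbours in adj.items():
        for v in neighbours:
            if color[u] == color[v]:
                return False

    # Condition 2: every path a-b-c-d on four distinct vertices
    # uses at least three distinct colors.
    for b in adj:
        for c in adj[b]:
            for a in adj[b]:
                if a == c:
                    continue
                for d in adj[c]:
                    if d == b or d == a:
                        continue
                    if len({color[a], color[b], color[c], color[d]}) < 3:
                        return False
    return True


if __name__ == "__main__":
    # Path on four vertices: 0 - 1 - 2 - 3
    p4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    print(is_star_coloring(p4, {0: 0, 1: 1, 2: 0, 3: 1}))
    print(is_star_coloring(p4, {0: 0, 1: 1, 2: 2, 3: 0}))
```

The check enumerates each four-vertex path from its middle edge, so it is practical only for small graphs. It verifies a given coloring and does not compute the star chromatic number.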

arXiv:2305.17536 [pdf, other]

On Locally Identifying Coloring of Cartesian Product and Tensor Product of Graphs

Authors: Sriram Bhyravarapu, Swati Kumari, I. Vinod Reddy

Abstract: For a positive integer $k$, a proper $k$-coloring of a graph $G$ is a mapping $f: V(G) \rightarrow \{1,2, \ldots, k\}$ such that $f(u) \neq f(v)$ for each edge $uv$ of $G$. The smallest integer $k$ for which there is a proper $k$-coloring of $G$ is called the chromatic number of $G$, denoted by $χ(G)$. A locally identifying coloring (for short, lid-coloring) of a graph $G$ is a proper $k$-coloring of $G$ such that every pair of adjacent vertices with distinct closed neighborhoods has distinct sets of colors in their closed neighborhoods. The smallest integer $k$ such that $G$ has a lid-coloring with $k$ colors is called the locally identifying chromatic number (for short, lid-chromatic number) of $G$, denoted by $χ_{lid}(G)$. This paper studies the lid-coloring of the Cartesian product and tensor product of two graphs. We prove that if $G$ and $H$ are two connected graphs having at least two vertices then (a) $χ_{lid}(G \square H) \leq χ(G) χ(H)-1$ and (b) $χ_{lid}(G \times H) \leq χ(G) χ(H)$. Here $G \square H$ and $G \times H$ denote the Cartesian and tensor products of $G$ and $H$ respectively. We determine the lid-chromatic number of $C_m \square P_n$, $C_m \square C_n$, $P_m \times P_n$, $C_m \times P_n$ and $C_m \times C_n$, where $C_m$ and $P_n$ denote a cycle and a path on $m$ and $n$ vertices respectively.

Submitted 12 October, 2023; v1 submitted 27 May, 2023; originally announced May 2023.

arXiv:2305.17300 [pdf, other]

Exploiting Large Neuroimaging Datasets to Create Connectome-Constrained Approaches for more Robust, Efficient, and Adaptable Artificial Intelligence

Authors: Erik C. Johnson, Brian S. Robinson, Gautam K. Vallabha, Justin Joyce, Jordan K. Matelsky, Raphael Norman-Tenazas, Isaac Western, Marisel Villafañe-Delgado, Martha Cervantes, Michael S. Robinette, Arun V. Reddy, Lindsey Kitchell, Patricia K. Rivlin, Elizabeth P. Reilly, Nathan Drenkow, Matthew J. Roos, I-Jeng Wang, Brock A. Wester, William R. Gray-Roncal, Joan A. Hoffmann

Abstract: Despite the progress in deep learning networks, efficient learning at the edge (enabling adaptable, low-complexity machine learning solutions) remains a critical need for defense and commercial applications. We envision a pipeline to utilize large neuroimaging datasets, including maps of the brain which capture neuron and synapse connectivity, to improve machine learning approaches. We have pursued different approaches within this pipeline structure. First, as a demonstration of data-driven discovery, the team has developed a technique for discovery of repeated subcircuits, or motifs. These were incorporated into a neural architecture search approach to evolve network architectures. Second, we have conducted analysis of the heading direction circuit in the fruit fly, which performs fusion of visual and angular velocity features, to explore augmenting existing computational models with new insight. Our team discovered a novel pattern of connectivity, implemented a new model, and demonstrated sensor fusion on a robotic platform. Third, the team analyzed circuitry for memory formation in the fruit fly connectome, enabling the design of a novel generative replay approach. Finally, the team has begun analysis of connectivity in mammalian cortex to explore potential improvements to transformer networks. These constraints increased network robustness on the most challenging examples in the CIFAR-10-C computer vision robustness benchmark task, while reducing learnable attention parameters by over an order of magnitude. Taken together, these results demonstrate multiple potential approaches to utilize insight from neural systems for developing robust and efficient machine learning techniques.

Submitted 26 May, 2023; originally announced May 2023.

Comments: 11 pages, 4 figures

arXiv:2305.00730 [pdf, ps, other]

Integer Linear Programming Formulations for Triple and Quadruple Roman Domination Problems

Authors: Sanath Kumar Vengaldas, Adarsh Reddy Muthyala, Bharath Chaitanya Konkati, P. Venkata Subba Reddy

Abstract: Roman domination is a well researched topic in graph theory. Recently two new variants of Roman domination, namely triple Roman domination and quadruple Roman domination problems, have been introduced to provide better defense strategies. However, triple Roman domination and quadruple Roman domination problems are NP-hard. In this paper, we have provided a genetic algorithm for solving triple and quadruple Roman domination problems. Integer Linear Programming (ILP) formulations for triple Roman domination and quadruple Roman domination problems have been proposed. The proposed models are implemented using IBM CPLEX 22.1 optimization solvers, and results are obtained for random graphs generated using the NetworkX Erdos-Renyi model.

Submitted 1 May, 2023; originally announced May 2023.

arXiv:2303.10280 [pdf, other]

Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances

Authors: Arun V. Reddy, Ketul Shah, William Paul, Rohita Mocharla, Judy Hoffman, Kapil D. Katyal, Dinesh Manocha, Celso M. de Melo, Rama Chellappa

Abstract: Human action recognition is a challenging problem, particularly when there is high variability in factors such as subject appearance, backgrounds and viewpoint. While deep neural networks (DNNs) have been shown to perform well on action recognition tasks, they typically require large amounts of high-quality labeled data to achieve robust performance across a variety of conditions. Synthetic data has shown promise as a way to avoid the substantial costs and potential ethical concerns associated with collecting and labeling enormous amounts of data in the real-world. However, synthetic data may differ from real data in important ways. This phenomenon, known as domain shift, can limit the utility of synthetic data in robotics applications. To mitigate the effects of domain shift, substantial effort is being dedicated to the development of domain adaptation (DA) techniques. Yet, much remains to be understood about how best to develop these techniques. In this paper, we introduce a new dataset called Robot Control Gestures (RoCoG-v2). The dataset is composed of both real and synthetic videos from seven gesture classes, and is intended to support the study of synthetic-to-real domain shift for video-based action recognition. Our work expands upon existing datasets by focusing the action classes on gestures for human-robot teaming, as well as by enabling investigation of domain shift in both ground and aerial views. We present baseline results using state-of-the-art action recognition and domain adaptation algorithms and offer initial insight on tackling the synthetic-to-real and ground-to-air domain shifts.

Submitted 17 March, 2023; originally announced March 2023.

Comments: ICRA 2023. The first two authors contributed equally. Dataset available at: https://github.com/reddyav1/RoCoG-v2

arXiv:2302.12959 [pdf]

Chaotic Variational Auto encoder-based Adversarial Machine Learning

Authors: Pavan Venkata Sainadh Reddy, Yelleti Vivek, Gopi Pranay, Vadlamani Ravi

Abstract: Machine Learning (ML) has become the new contrivance in almost every field. This makes ML models a target of fraudsters through various adversarial attacks, thereby hindering their performance. Evasion and Data-Poison-based attacks are well acclaimed, especially in finance, healthcare, etc. This motivated us to propose a novel, computationally less expensive attack mechanism based on adversarial sample generation by a Variational Auto Encoder (VAE). It is well known that the Wavelet Neural Network (WNN) is considered computationally efficient in solving image and audio processing, speech recognition, and time-series forecasting. This paper proposes the VAE-Deep-Wavelet Neural Network (VAE-Deep-WNN), where the Encoder and Decoder employ WNN networks. Further, we propose chaotic variants of both the VAE with Multi-layer perceptron (MLP) and Deep-WNN and name them C-VAE-MLP and C-VAE-Deep-WNN, respectively. Here, we employ a Logistic map to generate random noise in the latent space. We perform VAE-based adversarial sample generation and apply it to various finance and cybersecurity problems such as loan default, credit card fraud, and churn modelling. We perform both Evasion and Data-Poison attacks on Logistic Regression (LR) and Decision Tree (DT) models. The results indicate that VAE-Deep-WNN outperformed the rest in the majority of the datasets and models. However, its chaotic variant C-VAE-Deep-WNN performed almost similarly to VAE-Deep-WNN in the majority of the datasets.

Submitted 24 February, 2023; originally announced February 2023.

Comments: 24 pages, 6 figures and 5 tables

MSC Class: 68T01; 68M25 ACM Class: I.2.6; K.6.5

arXiv:2302.04305 [pdf, other]

Mask Conditional Synthetic Satellite Imagery

Authors: Van Anh Le, Varshini Reddy, Zixi Chen, Mengyuan Li, Xinran Tang, Anthony Ortiz, Simone Fobi Nsutezo, Caleb Robinson

Abstract: In this paper we propose a mask-conditional synthetic image generation model for creating synthetic satellite imagery datasets. Given a dataset of real high-resolution images and accompanying land cover masks, we show that it is possible to train an upstream conditional synthetic imagery generator, use that generator to create synthetic imagery with the land cover masks, then train a downstream model on the synthetic imagery and land cover masks that achieves similar test performance to a model that was trained with the real imagery. Further, we find that incorporating a mixture of real and synthetic imagery acts as a data augmentation method, producing better models than using only real imagery (0.5834 vs. 0.5235 mIoU). Finally, we find that encouraging diversity of outputs in the upstream model is a necessary component for improved downstream task performance. We have released code for reproducing our work on GitHub, see https://github.com/ms-synthetic-satellite-image/synthetic-satellite-imagery.

Submitted 8 February, 2023; originally announced February 2023.

arXiv:2211.12226 [pdf, other]

On Structural Parameterizations of Star Coloring

Authors: Sriram Bhyravarapu, I. Vinod Reddy

Abstract: A Star Coloring of a graph G is a proper vertex coloring such that every path on four vertices uses at least three distinct colors. The minimum number of colors required for such a star coloring of G is called star chromatic number, denoted by χ_s(G). Given a graph G and a positive integer k, the STAR COLORING PROBLEM asks whether G has a star coloring using at most k colors. This problem is NP-complete even on restricted graph classes such as bipartite graphs. In this paper, we initiate a study of STAR COLORING from the parameterized complexity perspective. We show that STAR COLORING is fixed-parameter tractable when parameterized by (a) neighborhood diversity, (b) twin-cover, and (c) the combined parameters clique-width and the number of colors.

Submitted 22 November, 2022; originally announced November 2022.
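
The star coloring definition quoted in the abstract above can be checked mechanically on small graphs. The following is a minimal sketch, not taken from the paper: the function name `is_star_coloring` and the adjacency-dictionary representation are assumptions made for illustration. It tests that the coloring is proper and that every path on four vertices uses at least three distinct colors.

```python
def is_star_coloring(adj, color):
    """Check the star coloring condition on an undirected graph.

    adj   : dict mapping each vertex to a set of its neighbours (hypothetical format)
    color : dict mapping each vertex to its color
    """
    # A star coloring must first be a proper coloring.
    for u, nbrs in adj.items():
        for v in nbrs:
            if color[u] == color[v]:
                return False
    # Enumerate every path a-b-c-d on four distinct vertices
    # (not necessarily induced) and require at least three colors on it.
    for b in adj:
        for c in adj[b]:
            for a in adj[b]:
                if a == c:
                    continue
                for d in adj[c]:
                    if d == a or d == b:
                        continue
                    if len({color[a], color[b], color[c], color[d]}) < 3:
                        return False
    return True


# A path on four vertices: 0 - 1 - 2 - 3.
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
two_colors = {0: 0, 1: 1, 2: 0, 3: 1}    # proper, but the path is bicolored
three_colors = {0: 0, 1: 1, 2: 2, 3: 0}  # proper and uses three colors
```

This brute-force check only verifies a given coloring; it says nothing about the parameterized algorithms the paper describes.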

arXiv:2208.01705 [pdf, other]

Success of Uncertainty-Aware Deep Models Depends on Data Manifold Geometry

Authors: Mark Penrod, Harrison Termotto, Varshini Reddy, Jiayu Yao, Finale Doshi-Velez, Weiwei Pan

Abstract: For responsible decision making in safety-critical settings, machine learning models must effectively detect and process edge-case data. Although existing works show that predictive uncertainty is useful for these tasks, it is not evident from literature which uncertainty-aware models are best suited for a given dataset. Thus, we compare six uncertainty-aware deep learning models on a set of edge-case tasks: robustness to adversarial attacks as well as out-of-distribution and adversarial detection. We find that the geometry of the data sub-manifold is an important factor in determining the success of various models. Our finding suggests an interesting direction in the study of uncertainty-aware deep learning models.

Submitted 5 August, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

ACM Class: I.2.6

Journal ref: International Conference on Machine Learning. PMLR 162 (2022)

arXiv:2205.13020 [pdf, other]

People counting system for retail analytics using edge AI

Authors: Karthik Reddy Kanjula, Vishnu Vardhan Reddy, Jnanesh K P, Jeffy S Abraham, Tanuja K

Abstract: Developments in IoT applications are playing an important role in our day-to-day life, from business predictions to self-driving cars. One of the areas most influenced by AI and IoT is retail analytics. In retail analytics, the conversion rate is a metric most often used by retail stores to measure how many people have visited the store and how many purchases have happened. This retail conversion rate is used to assess marketing operations, increasing stock, store outlets, running promotions, etc. Our project intends to build a cost-effective people counting system with AI at the edge, which calculates conversion rates using the total number of people counted by the system and the number of transactions for the day, providing analytical insights for retail store optimization with minimal hardware requirements.

Submitted 25 May, 2022; originally announced May 2022.

Comments: 5 pages, 3 figures. We proposed a novel framework design (highlighted in abstract) instead of enhancing a DL model or openVINO. To demonstrate the importance of our framework, we have chosen a retail computer vision problem, people counting system and attempted to construct an end-to-end solution with our suggested framework

arXiv:2201.01538 [pdf, ps, other]

Compliant Constant Output/ Input Force Mechanisms- Topology Optimization with Contact

Authors: B V S Nagendra Reddy, Vitthal Manohar Khatik, Burkhard Corves, Anupam Saxena

Abstract: We synthesize monolithic topologies of constant output (CoFM) and input (CiFM) force mechanisms. During synthesis, we capture all possible aspects of member deformation including finite displacements, buckling, interaction between members, their interaction with external surfaces, and importantly, interaction of the mechanism with flexible workpieces to capture force transfer in true sense. Features of constant force characteristics, e.g., magnitude(s) of the desired force(s), range of input displacement over which slope of the force displacement curve is near zero, and distance between workpiece and the mechanism are controlled individually via novel objectives proposed herein. Two of the constant output and constant input force mechanisms each, are synthesized using stochastic optimization ensuring ready manufacturability. We observe that presence of external surfaces may not be required for singlepiece mechanisms to attain constant force characteristics. However, interesting solutions are possible if mutual contact is permitted. We also note that desired force characteristics may not remain the same with alteration in the workpieces shape and (or) material properties. We finally fabricate and test the synthesized mechanisms and find that the desired constant force characteristics are by and large retained.

Submitted 11 January, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: 25 pages

arXiv:2111.06916 [pdf]

Offense Detection in Dravidian Languages using Code-Mixing Index based Focal Loss

Authors: Debapriya Tula, Shreyas MS, Viswanatha Reddy, Pranjal Sahu, Sumanth Doddapaneni, Prathyush Potluri, Rohan Sukumaran, Parth Patwa

Abstract: Over the past decade, we have seen exponential growth in online content fueled by social media platforms. Data generation of this scale comes with the caveat of insurmountable offensive content in it. The complexity of identifying offensive content is exacerbated by the usage of multiple modalities (image, language, etc.), code-mixed language and more. Moreover, even after careful sampling and annotation of offensive content, there will always exist a significant class imbalance between offensive and non-offensive content. In this paper, we introduce a novel Code-Mixing Index (CMI) based focal loss which circumvents two challenges (1) code-mixing in languages (2) class imbalance problem for Dravidian language offense detection. We also replace the conventional dot product-based classifier with the cosine-based classifier which results in a boost in performance. Further, we use multilingual models that help transfer characteristics learnt across languages to work effectively with low resourced languages. It is also important to note that our model handles instances of mixed script (say usage of Latin and Dravidian-Tamil script) as well. To summarize, our model can handle offensive language detection in a low-resource, class imbalanced, multilingual and code-mixed setting.

Submitted 6 May, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

Comments: Accepted for publication at SN Computer Science Journal

arXiv:2107.14368 [pdf, other]

Deep Quantized Representation for Enhanced Reconstruction

Authors: Akash Gupta, Abhishek Aich, Kevin Rodriguez, G. Venugopala Reddy, Amit K. Roy-Chowdhury

Abstract: While machine learning approaches have shown remarkable performance in biomedical image analysis, most of these methods rely on high-quality and accurate imaging data. However, collecting such data requires intensive and careful manual effort. One of the major challenges in imaging the Shoot Apical Meristem (SAM) of Arabidopsis thaliana, is that the deeper slices in the z-stack suffer from different perpetual quality-related problems like poor contrast and blurring. These quality-related issues often lead to the disposal of the painstakingly collected data with little to no control on quality while collecting the data. Therefore, it becomes necessary to employ and design techniques that can enhance the images to make them more suitable for further analysis. In this paper, we propose a data-driven Deep Quantized Latent Representation (DQLR) methodology for high-quality image reconstruction in the Shoot Apical Meristem (SAM) of Arabidopsis thaliana. Our proposed framework utilizes multiple consecutive slices in the z-stack to learn a low dimensional latent space, quantize it and subsequently perform reconstruction using the quantized representation to obtain sharper images. Experiments on a publicly available dataset validate our methodology showing promising results.

Submitted 29 July, 2021; originally announced July 2021.

Comments: Accepted to ISBI Workshop, 2020

arXiv:2105.12527 [pdf, other]

Dimensioning of V2X Services in 5G Networks through Forecast-based Scaling

Authors: Jorge Martín-Pérez, Koteswararao Kondepu, Danny De Vleeschauwer, Venkatarami Reddy, Carlos Guimarães, Andrea Sgambelluri, Luca Valcarenghi, Chrysa Papagianni, Carlos J. Bernardos

Abstract: With the increasing adoption of intelligent transportation systems and the upcoming era of autonomous vehicles, vehicular services (such as remote driving, cooperative awareness, and hazard warning) will face an ever-changing and dynamic environment. Traffic flow on the roads is a critical condition for these services and, therefore, it is of paramount importance to forecast how it will evolve over time. By knowing future events (such as traffic jams), vehicular services can be dimensioned in an on-demand fashion in order to minimize Service Level Agreement (SLA) violations, thus reducing the chances of car accidents. This research departs from an evaluation of traditional time-series techniques with recent Machine Learning (ML)-based solutions to forecast traffic flows in the roads of Torino (Italy). Given the accuracy of the selected forecasting techniques, a forecast-based scaling algorithm is proposed and evaluated over a set of dimensioning experiments of three distinct vehicular services with strict latency requirements. Results show that the proposed scaling algorithm enables resource savings of up to 5% at the cost of an increase of less than 0.4% in latency violations.

Submitted 26 May, 2021; originally announced May 2021.

Comments: 10 pages, 7 figures, pre-print, arXiv:1406.6768

arXiv:2105.08693 [pdf, other]

Conflict-Free Coloring: Graphs of Bounded Clique Width and Intersection Graphs

Authors: Sriram Bhyravarapu, Tim A. Hartmann, Hung P. Hoang, Subrahmanyam Kalyanasundaram, I. Vinod Reddy

Abstract: A conflict-free coloring of a graph $G$ is a (partial) coloring of its vertices such that every vertex $u$ has a neighbor whose assigned color is unique in the neighborhood of $u$. There are two variants of this coloring, one defined using the open neighborhood and one using the closed neighborhood. For both variants, we study the problem of deciding whether the conflict-free coloring of a given graph $G$ is at most a given number $k$. In this work, we investigate the relation of clique-width and minimum number of colors needed (for both variants) and show that these parameters do not bound one another. Moreover, we consider specific graph classes, particularly graphs of bounded clique-width and types of intersection graphs, such as distance hereditary graphs, interval graphs and unit square and disk graphs. We also consider Kneser graphs and split graphs. We give (often tight) upper and lower bounds and determine the complexity of the decision problem on these graph classes, which improve some of the results from the literature. Particularly, we settle the number of colors needed for an interval graph to be conflict-free colored under the open neighborhood model, which was posed as an open problem.

Submitted 11 March, 2024; v1 submitted 18 May, 2021; originally announced May 2021.

Comments: Accepted in Algorithmica
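
The two conflict-free coloring variants described in the abstract above differ only in which neighborhood is inspected. The sketch below is an illustration under stated assumptions, not the authors' code: `is_conflict_free`, the adjacency-dictionary format, and the use of a missing or `None` entry for an uncolored vertex are all hypothetical choices. It verifies a given partial coloring under either the open or the closed neighborhood model.

```python
from collections import Counter


def is_conflict_free(adj, color, closed=False):
    """Check a (partial) conflict-free coloring.

    adj    : dict mapping each vertex to a set of its neighbours (hypothetical format)
    color  : dict mapping colored vertices to a color; uncolored vertices
             are absent or mapped to None
    closed : use the closed neighbourhood N[u] instead of the open one N(u)
    """
    for u in adj:
        neighbourhood = set(adj[u])
        if closed:
            neighbourhood.add(u)
        # Count how often each color occurs among the colored vertices.
        counts = Counter(
            color.get(v) for v in neighbourhood if color.get(v) is not None
        )
        # u needs some color that occurs exactly once in its neighbourhood.
        if not any(n == 1 for n in counts.values()):
            return False
    return True


# Triangle with a single colored vertex.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
partial = {0: "red"}
```

On the triangle, coloring one vertex suffices under the closed model, but fails under the open model because the colored vertex sees no colored neighbor.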

arXiv:2105.08321 [pdf, other]

Can Self Reported Symptoms Predict Daily COVID-19 Cases?

Authors: Parth Patwa, Viswanatha Reddy, Rohan Sukumaran, Sethuraman TV, Eptehal Nashnoush, Sheshank Shankar, Rishemjit Kaur, Abhishek Singh, Ramesh Raskar

Abstract: The COVID-19 pandemic has impacted lives and economies across the globe, leading to many deaths. While vaccination is an important intervention, its roll-out is slow and unequal across the globe. Therefore, extensive testing still remains one of the key methods to monitor and contain the virus. Testing on a large scale is expensive and arduous. Hence, we need alternate methods to estimate the number of cases. Online surveys have been shown to be an effective method for data collection amidst the pandemic. In this work, we develop machine learning models to estimate the prevalence of COVID-19 using self-reported symptoms. Our best model predicts the daily cases with a mean absolute error (MAE) of 226.30 (normalized MAE of 27.09%) per state, which demonstrates the possibility of predicting the actual number of confirmed cases by utilizing self-reported symptoms. The models are developed at two levels of data granularity - local models, which are trained at the state level, and a single global model which is trained on the combined data aggregated across all states. Our results indicate a lower error on the local models as opposed to the global model. In addition, we also show that the most important symptoms (features) vary considerably from state to state. This work demonstrates that the models developed on crowd-sourced data, curated via online platforms, can complement the existing epidemiological surveillance infrastructure in a cost-effective manner.
The code is publicly available at https://github.com/parthpatwa/Can-Self-Reported-Symptoms-Predict-Daily-COVID-19-Cases.

Submitted 21 June, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

Comments: Accepted as a full-length oral presentation at the International Workshop on Artificial Intelligence for Social Good (AI4SG), IJCAI-21

arXiv:2011.09012 [pdf, other]

RustViz: Interactively Visualizing Ownership and Borrowing

Authors: Gongming, Luo, Vishnu Reddy, Marcelo Almeida, Yingying Zhu, Ke Du, Cyrus Omar

Abstract: Rust is a systems programming language that guarantees memory safety without the need for a garbage collector by statically tracking ownership and borrowing events. The associated rules are subtle and unique among industry programming languages, which can make learning Rust more challenging. Motivated by the challenges that Rust learners face, we are developing RustViz, a tool that allows teachers to generate an interactive timeline depicting ownership and borrowing events for each variable in a Rust code example. These visualizations are intended to help Rust learners develop an understanding of ownership and borrowing by example. This paper introduces RustViz by example, shows how teachers can use it to generate visualizations, describes learning goals, and proposes a study designed to evaluate RustViz based on these learning goals.

Submitted 17 November, 2020; originally announced November 2020.

Comments: 9 pages, 3 figures. Presented at HATRA 2020 (Human Aspects of Types and Reasoning Assistants)

arXiv:2010.05186 [pdf, ps, other]

On Structural Parameterizations of Load Coloring

Authors: I. Vinod Reddy

Abstract: Given a graph $G$ and a positive integer $k$, the 2-Load coloring problem is to check whether there is a $2$-coloring $f:V(G) \rightarrow \{r,b\}$ of $G$ such that for every $i \in \{r,b\}$, there are at least $k$ edges with both end vertices colored $i$. It is known that the problem is NP-complete even on special classes of graphs like regular graphs. Gutin and Jones (Inf Process Lett 114:446-449, 2014) showed that the problem is fixed-parameter tractable by giving a kernel with at most $7k$ vertices. Barbero et al. (Algorithmica 79:211-229, 2017) obtained a kernel with less than $4k$ vertices and $O(k)$ edges, improving the earlier result. In this paper, we study the parameterized complexity of the problem with respect to structural graph parameters. We show that 2-Load coloring cannot be solved in time $f(w)n^{o(w)}$, unless ETH fails and it can be solved in time $n^{O(w)}$, where $n$ is the size of the input graph, $w$ is the clique-width of the graph and $f$ is an arbitrary function of $w$. Next, we consider the parameters distance to cluster graphs, distance to co-cluster graphs and distance to threshold graphs, which are weaker than the parameter clique-width and show that the problem is fixed-parameter tractable (FPT) with respect to these parameters. Finally, we show that 2-Load coloring is NP-complete even on bipartite graphs and split graphs.

Submitted 11 October, 2020; originally announced October 2020.

Comments: 15 pages
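
The 2-Load coloring condition stated in the abstract above is easy to verify for a fixed coloring, even though deciding whether one exists is NP-complete. The following is a minimal sketch under assumptions: the function names, the edge-list representation, and the exhaustive search are illustrative only and are unrelated to the kernels or clique-width algorithms the paper discusses.

```python
from itertools import product


def is_2_load_coloring(edges, f, k):
    """Check that each colour class spans at least k edges.

    edges : iterable of (u, v) pairs (hypothetical format)
    f     : dict mapping each vertex to "r" or "b"
    k     : required number of monochromatic edges per colour
    """
    red = sum(1 for u, v in edges if f[u] == f[v] == "r")
    blue = sum(1 for u, v in edges if f[u] == f[v] == "b")
    return red >= k and blue >= k


def has_2_load_coloring(vertices, edges, k):
    """Exhaustive search over all 2^n colourings; only sensible for tiny graphs."""
    for assignment in product("rb", repeat=len(vertices)):
        f = dict(zip(vertices, assignment))
        if is_2_load_coloring(edges, f, k):
            return True
    return False


two_disjoint_edges = [(0, 1), (2, 3)]
path_on_three = [(0, 1), (1, 2)]
```

Two disjoint edges admit a 2-load coloring for k = 1 (one edge per color), whereas a path on three vertices does not, because both edges share the middle vertex.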

arXiv:2009.09223 [pdf, other]

BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition

Authors: Usman Naseem, Matloob Khushi, Vinay Reddy, Sakthivel Rajendran, Imran Razzak, Jinman Kim

Abstract: In recent years, with the growing amount of biomedical documents, coupled with advancement in natural language processing algorithms, the research on biomedical named entity recognition (BioNER) has increased exponentially. However, BioNER research is challenging as NER in the biomedical domain are: (i) often restricted due to limited amount of training data, (ii) an entity can refer to multiple types and concepts depending on its context and, (iii) heavy reliance on acronyms that are sub-domain specific. Existing BioNER approaches often neglect these issues and directly adopt the state-of-the-art (SOTA) models trained in general corpora which often yields unsatisfactory results. We propose biomedical ALBERT (A Lite Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) bioALBERT, an effective domain-specific language model trained on large-scale biomedical corpora designed to capture biomedical context-dependent NER. We adopted a self-supervised loss used in ALBERT that focuses on modelling inter-sentence coherence to better learn context-dependent representations and incorporated parameter reduction techniques to lower memory consumption and increase the training speed in BioNER. In our experiments, BioALBERT outperformed comparative SOTA BioNER models on eight biomedical NER benchmark datasets with four different entity types. We trained four different variants of BioALBERT models which are available for the research community to be used in future research.

Submitted 19 September, 2020; originally announced September 2020.

Comments: 7 pages

arXiv:2006.16427 [pdf, other]

Biologically Inspired Mechanisms for Adversarial Robustness

Authors: Manish V. Reddy, Andrzej Banburski, Nishka Pant, Tomaso Poggio

Abstract: A convolutional neural network strongly robust to adversarial perturbations at reasonable computational and performance cost has not yet been demonstrated. The primate visual ventral stream seems to be robust to small perturbations in visual stimuli but the underlying mechanisms that give rise to this robust perception are not understood. In this work, we investigate the role of two biologically plausible mechanisms in adversarial robustness. We demonstrate that the non-uniform sampling performed by the primate retina and the presence of multiple receptive fields with a range of receptive field sizes at each eccentricity improve the robustness of neural networks to small adversarial perturbations. We verify that these two mechanisms do not suffer from gradient obfuscation and study their contribution to adversarial robustness through ablation studies.

Submitted 29 June, 2020; originally announced June 2020.

Comments: 25 pages, 15 figures

arXiv:2006.10385 [pdf, ps, other]

Topology synthesis of a 3-kink contact-aided compliant switch

Authors: B V S Nagendra Reddy, Anupam Saxena

Abstract: A topology synthesis approach to design 2D Contact-aided Compliant Mechanisms (CCMs) to trace output paths with three or more kinks is presented. Synthesis process uses three different types of external, rigid contact surfaces: circular, elliptical and rectangular: which in combination, offer intricate local curvatures that CCMs can benefit from, to deliver desired, complex output characteristics. A network of line elements is employed to generate topologies. A set of circular subregions is laid over this network, and external contact surfaces are generated within each subregion. Both, discrete and continuous design variables are employed: the former set controls the CCM topology, appearance and type of external contact surfaces, whereas the latter set governs shapes and sizes of the CCM constituents, and sizes of contact surfaces. All contact types are permitted with contact modeling made significantly easier through identification of outer and inner loops. Line topologies are fleshed out via a user-defined number of quadrilateral elements along lateral and longitudinal directions. Candidate CCM designs are carefully preprocessed before analysis via a commercial software and evolution using a stochastic search. The process is exemplified via a contact-aided, 3-kink mechanical switch which is thoroughly analysed in presence of friction and wear.

Submitted 18 June, 2020; originally announced June 2020.

Comments: 32 pages

arXiv:2002.05538 [pdf, ps, other]

Algorithmic Complexity of Isolate Secure Domination in Graphs

Authors: Jakkepalli Pavan Kumar, P. Venkata Subba Reddy

Abstract: A dominating set $S$ is an Isolate Dominating Set (IDS) if the induced subgraph $G[S]$ has at least one isolated vertex. In this paper, we initiate the study of new domination parameter called, isolate secure domination. An isolate dominating set $S\subseteq V$ is an isolate secure dominating set (ISDS), if for each vertex $u \in V \setminus S$, there exists a neighboring vertex $v$ of $u$ in $S$ such that $(S \setminus \{v\}) \cup \{u\}$ is an IDS of $G$. The minimum cardinality of an ISDS of $G$ is called as an isolate secure domination number, and is denoted by $γ_{0s}(G)$. Given a graph $ G=(V,E)$ and a positive integer $ k,$ the ISDM problem is to check whether $ G $ has an isolate secure dominating set of size at most $ k.$ We prove that ISDM is NP-complete even when restricted to bipartite graphs and split graphs. We also show that ISDM can be solved in linear time for graphs of bounded tree-width.

Submitted 12 February, 2020; originally announced February 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2002.00002; text overlap with arXiv:2001.11250

MSC Class: 05C69; 68Q25
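
The nested definitions in the abstract above (dominating set, isolate dominating set, isolate secure dominating set) translate directly into a verifier for a candidate set. This is a minimal sketch written for illustration: the function names and the adjacency-dictionary format are assumptions, and it only checks a given set rather than solving the ISDM decision problem.

```python
def is_dominating(adj, S):
    """Every vertex is in S or has a neighbour in S."""
    return all(u in S or any(v in S for v in adj[u]) for u in adj)


def is_ids(adj, S):
    """Isolate dominating set: dominating, and G[S] has an isolated vertex."""
    has_isolated = any(all(w not in S for w in adj[v]) for v in S)
    return is_dominating(adj, S) and has_isolated


def is_isds(adj, S):
    """Isolate secure dominating set, following the definition in the abstract."""
    S = set(S)
    if not is_ids(adj, S):
        return False
    for u in adj:
        if u in S:
            continue
        # u needs a neighbour v in S such that swapping v for u keeps an IDS.
        if not any(v in S and is_ids(adj, (S - {v}) | {u}) for v in adj[u]):
            return False
    return True


single_edge = {0: {1}, 1: {0}}
path3 = {0: {1}, 1: {0, 2}, 2: {1}}
```

On a single edge, either endpoint alone is an ISDS. On the path 0 - 1 - 2, the set {1} is an IDS but not secure, since swapping 1 for an end vertex leaves the other end undominated.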

arXiv:2002.02408 [pdf, ps, other]

Algorithmic Aspects of 2-Secure Domination in Graphs

Authors: J. Pavan Kumar, P. Venkata Subba Reddy

Abstract: Let $G(V,E)$ be a simple, undirected and connected graph. A dominating set $S \subseteq V(G)$ is called a $2$-secure dominating set ($2$-SDS) in $G$, if for every pair of distinct vertices $u_1,u_2 \in V(G)$ there exists a pair of distinct vertices $v_1,v_2 \in S$ such that $v_1 \in N[u_1]$, $v_2 \in N[u_2]$ and $(S \setminus \{v_1,v_2\}) \cup \{u_1,u_2\}$ is a dominating set in $G$. The $2$-secure domination number denoted by $γ_{2s}(G)$, equals the minimum cardinality of a $2$-SDS in $G$. Given a graph $G$ and a positive integer $k$, the $2$-Secure Domination ($2$-SDM) problem is to check whether $G$ has a $2$-secure dominating set of size at most $k$. It is known that $2$-SDM is NP-complete for bipartite graphs. In this paper, we prove that the $2$-SDM problem is NP-complete for planar graphs and doubly chordal graphs, a subclass of chordal graphs. We strengthen the NP-complete result for bipartite graphs, by proving this problem is NP-complete for some subclasses of bipartite graphs namely, star convex bipartite, comb convex bipartite graphs. We prove that $2$-SDM is linear time solvable for bounded tree-width graphs. We also show that the $2$-SDM is W[2]-hard even for split graphs. The Minimum $2$-Secure Dominating Set (M2SDS) problem is to find a $2$-secure dominating set of minimum size in the input graph. We propose a $(Δ(G)+1)$-approximation algorithm for M2SDS, where $Δ(G)$ is the maximum degree of the input graph $G$ and prove that M2SDS cannot be approximated within $(1 - ε) \ln(|V|)$ for any $ε > 0$ unless $NP \subseteq DTIME(|V|^{O(\log \log |V|)})$. A secure dominating set of a graph defends one attack at any vertex of the graph. Finally, we show that the M2SDS is APX-complete for graphs with $Δ(G)=4$.

Submitted 5 February, 2020; originally announced February 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2001.11250, arXiv:2002.00002

MSC Class: 05C69; 68Q25

arXiv:2002.00713 [pdf, ps, other]

Algorithmic Complexity of Secure Connected Domination in Graphs

Authors: Jakkepalli Pavan Kumar, P. Venkata Subba Reddy, S. Arumugam

Abstract: Let $G = (V,E)$ be a simple, undirected and connected graph. A connected (total) dominating set $S \subseteq V$ is a secure connected (total) dominating set of $G$, if for each $u \in V \setminus S$, there exists $v \in S$ such that $uv \in E$ and $(S \setminus \{v\}) \cup \{u\}$ is a connected (total) dominating set of $G$. The minimum cardinality of a secure connected (total) dominating set of $G$ denoted by $γ_{sc}(G)$ ($γ_{st}(G)$), is called the secure connected (total) domination number of $G$. In this paper, we show that the decision problems corresponding to secure connected domination number and secure total domination number are NP-complete even when restricted to split graphs or bipartite graphs. The NP-complete reductions also show that these problems are W[2]-hard. We also prove that the secure connected domination problem is linear time solvable in block graphs and threshold graphs.

Submitted 3 February, 2020; originally announced February 2020.

MSC Class: 05C69; 68Q25

arXiv:2002.00002 [pdf, ps, other]

Algorithmic Aspects of Some Variants of Domination in Graphs

Authors: Jakkepalli Pavan Kumar, P. Venkata Subba Reddy

Abstract: A set $S \subseteq V$ is a dominating set in $G$ if for every $u \in V \setminus S$, there exists $v \in S$ such that $(u,v) \in E$, i.e., $N[S] = V$. A dominating set $S$ is an Isolate Dominating Set (IDS) if the induced subgraph $G[S]$ has at least one isolated vertex. It is known that the Isolate Domination Decision problem (IDOM) is NP-complete for bipartite graphs. In this paper, we extend this by showing that IDOM is NP-complete for split graphs and perfect elimination bipartite graphs, a subclass of bipartite graphs. A set $S \subseteq V$ is an independent set if $G[S]$ has no edge. A set $S \subseteq V$ is a secure dominating set of $G$ if, for each vertex $u \in V \setminus S$, there exists a vertex $v \in S$ such that $(u,v) \in E$ and $(S \setminus \{v\}) \cup \{u\}$ is a dominating set of $G$. In addition, we initiate the study of a new domination parameter called independent secure domination. A set $S \subseteq V$ is an Independent Secure Dominating Set (InSDS) if $S$ is an independent set and a secure dominating set of $G$. The minimum size of an InSDS in $G$ is called the independent secure domination number of $G$ and is denoted by $γ_{is}(G)$. Given a graph $G$ and a positive integer $k$, the InSDM problem is to check whether $G$ has an independent secure dominating set of size at most $k$. We prove that InSDM is NP-complete for bipartite graphs and linear time solvable for bounded tree-width graphs and threshold graphs, a subclass of split graphs. The MInSDS problem is to find an independent secure dominating set of minimum size in the input graph. Finally, we prove that the MInSDS problem is APX-hard for graphs with maximum degree $5$.

Submitted 12 February, 2020; v1 submitted 30 January, 2020; originally announced February 2020.

Comments: arXiv admin note: text overlap with arXiv:2001.11250

MSC Class: 05C69; 68Q25

arXiv:2001.11250 [pdf, ps, other]

doi 10.7151/dmgt.2260

Algorithmic Aspects of Secure Connected Domination in Graphs

Authors: Jakkepalli Pavan Kumar, P. Venkata Subba Reddy

Abstract: Let $G = (V,E)$ be a simple, undirected and connected graph. A connected dominating set $S \subseteq V$ is a secure connected dominating set of $G$ if, for each $u \in V \setminus S$, there exists $v \in S$ such that $(u,v) \in E$ and the set $(S \setminus \{v\}) \cup \{u\}$ is a connected dominating set of $G$. The minimum size of a secure connected dominating set of $G$, denoted by $γ_{sc}(G)$, is called the secure connected domination number of $G$. Given a graph $G$ and a positive integer $k$, the Secure Connected Domination (SCDM) problem is to check whether $G$ has a secure connected dominating set of size at most $k$. In this paper, we prove that the SCDM problem is NP-complete for doubly chordal graphs, a subclass of chordal graphs. We investigate the complexity of this problem for some subclasses of bipartite graphs, namely star convex bipartite, comb convex bipartite, chordal bipartite and chain graphs. The Minimum Secure Connected Dominating Set (MSCDS) problem is to find a secure connected dominating set of minimum size in the input graph. We propose a $(Δ(G)+1)$-approximation algorithm for MSCDS, where $Δ(G)$ is the maximum degree of the input graph $G$, and prove that MSCDS cannot be approximated within $(1 - ε) \ln(|V|)$ for any $ε > 0$ unless $NP \subseteq DTIME(|V|^{O(\log \log |V|)})$, even for bipartite graphs. Finally, we show that MSCDS is APX-complete for graphs with $Δ(G)=4$.

Submitted 30 January, 2020; originally announced January 2020.

MSC Class: 05C69; 68Q25

arXiv:1912.01130 [pdf, other]

Addict Free -- A Smart and Connected Relapse Intervention Mobile App

Authors: Zhou Yang, Vinay Jayachandra Reddy, Rashmi Kesidi, Fang **

Abstract: It is widely acknowledged that addiction relapse is highly associated with spatial-temporal factors such as specific places or time periods. Current studies suggest that those factors can be utilized for better relapse interventions; however, there is no relapse prevention application that makes use of them. In this paper, we introduce a mobile app called "Addict Free", which records user profiles, tracks relapse history and summarizes recovery statistics to help users better understand their recovery situations. The app also builds a relapse recovery community, which allows users to ask for advice and encouragement, and to share relapse prevention experience. Moreover, machine learning algorithms that ingest spatial and temporal factors are utilized to predict relapse, based on which helpful addiction diversion activities are recommended by a recovery recommendation algorithm. By interacting with users, this app aims to provide smart suggestions that help stop relapse, especially for users with alcohol and tobacco addiction.

Submitted 2 December, 2019; originally announced December 2019.

Comments: 4 pages

arXiv:1910.10364 [pdf, other]

Parameterized Coloring Problems on Threshold Graphs

Authors: I. Vinod Reddy

Abstract: In this paper, we study several coloring problems on graphs from the viewpoint of parameterized complexity. We show that Precoloring Extension is fixed-parameter tractable (FPT) parameterized by distance to clique and Equitable Coloring is FPT parameterized by the distance to threshold graphs. We also study the List k-Coloring and show that the problem is NP-complete on split graphs and it is FPT parameterized by solution size on split graphs.

Submitted 28 May, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

Comments: 12 pages, latest version

arXiv:1908.09003 [pdf]

High Accurate Unhealthy Leaf Detection

Authors: S. Mohan Sai, G. Gopichand, C. Vikas Reddy, K. Mona Teja

Abstract: India is an agriculture-dependent country. Since farming is the backbone of our country, it is our responsibility to preserve the crops. Although we cannot stop the destruction of crops by natural calamities, we should at least try to protect our crops from diseases. To detect a plant disease, we need a fast automatic method. This paper therefore presents a model to identify the particular disease of plant leaves at early stages, so that we can prevent it or take a remedy to stop the disease from spreading. The proposed model is divided into five sessions. Image preprocessing includes the enhancement of low-light images using inception modules in a CNN. Low-resolution image enhancement is done using an Adversarial Neural Network. Preprocessing also includes conversion of the RGB image to the YCrCb color space. Next, this paper presents a methodology for image segmentation, which is an important aspect of identifying the disease symptoms. This segmentation is done using a genetic algorithm. Segmenting the leaf image in this way helps in automatically detecting and classifying the leaf image. Texture extraction is done using the statistical model called GLCM and, finally, the classification of the diseases is done using an SVM with different kernels, with high accuracy.

Submitted 14 August, 2019; originally announced August 2019.

Comments: Page 4, 5 with 1 figure, and page 6 with 2 figures

arXiv:1904.12580 [pdf, other]

Twitter Sentiment Analysis using Distributed Word and Sentence Representation

Authors: Dwarampudi Mahidhar Reddy, Dr. N V Subba Reddy

Abstract: An important part of information gathering and data analysis is to find out what people think about a product or an entity. Twitter is an opinion-rich social networking site. The posts, or tweets, from this site can be used for mining people's opinions. The recent surge of activity in this area can be attributed to the computational treatment of data, which has made opinion extraction and sentiment analysis easier. This paper classifies tweets into positive and negative sentiments, but instead of using traditional methods or preprocessing text data, we use distributed representations of words and sentences to classify the tweets. We use Long Short Term Memory (LSTM) Networks, Convolutional Neural Networks (CNNs) and Artificial Neural Networks. The first two are used on distributed representations of words, while the latter is used on distributed representations of sentences. This paper achieves accuracies as high as 81%. It also suggests the best ways of creating distributed representations of words for sentiment analysis, out of the available methods.

Submitted 1 April, 2019; originally announced April 2019.

Comments: 8 pages, 5 figures, 6 tables

arXiv:1903.07288 [pdf, other]

Effects of padding on LSTMs and CNNs

Authors: Mahidhar Dwarampudi, N V Subba Reddy

Abstract: Long Short-Term Memory (LSTM) Networks and Convolutional Neural Networks (CNN) have become very common and are used in many fields, as they have been effective in solving many problems where general neural networks were inefficient. They have been applied to various problems, mostly related to images and sequences. Since LSTMs and CNNs take inputs of the same length and dimension, input images and sequences are padded to the maximum length during testing and training. This padding can affect the way the networks function and can make a great deal of difference when it comes to performance and accuracy. This paper studies this effect and suggests the best way to pad an input sequence. This paper uses a simple sentiment analysis task for this purpose. We use the same dataset on both networks with various padding schemes to show the difference. This paper also discusses some preprocessing techniques applied to the data to ensure effective analysis.

Submitted 18 March, 2019; originally announced March 2019.

Comments: 5 pages, 5 figures, 2 tables

arXiv:1901.08875 [pdf]

Laser Communication and Coordination Control of Spacecraft Swarms

Authors: Himangshu Kalita, Leonard Vance, Vishnu Reddy, Jekan Thangavelautham

Abstract: Swarms of small spacecraft offer whole new capabilities in Earth observation, global positioning and communications compared to a large monolithic spacecraft. These small spacecraft can provide bigger apertures that increase gain in communication antennas, increase area coverage or effective resolution of distributed cameras and enable persistent observation of ground or space targets. However, there remain important challenges in operating a large number of spacecraft at once. Current methods would require a large number of ground operators to monitor and actively control these spacecraft, which poses challenges in terms of coordination and control and prevents the technology from being scaled up in a cost-effective manner. Technologies are required to enable one ground operator to manage tens if not hundreds of spacecraft. We propose to utilize laser beams directed from the ground or from a command and control spacecraft to organize and manage a large swarm. Each satellite in the swarm will have a customized "smart skin" containing solar panels, power and control circuitry and an embedded secondary propulsion unit. A secondary propulsion unit may include electrospray propulsion, a solar radiation pressure-based system, photonic laser thrusters and Lorentz force thrusters. Solar panels typically occupy the largest surface area on an Earth-orbiting satellite. A laser beam from another spacecraft or from the ground would interact with the solar panels of the spacecraft swarm. The laser beam would be used to select a 'leader' amongst a group of spacecraft and to set parameters for formation flight, including separation distance, local if-then rules and coordinated changes in attitude and position.

Submitted 25 January, 2019; originally announced January 2019.

Comments: 11 pages, 10 figures, Space Traffic Management Conference 2019

arXiv:1809.02042 [pdf]

On-Orbit Smart Camera System to Observe Illuminated and Unilluminated Space Objects

Authors: Steve Morad, Ravi Teja Nallapu, Himangshu Kalita, Byon Kwon, Vishnu Reddy, Roberto Furfaro, Erik Asphaug, Jekan Thangavelautham

Abstract: The wide availability of Commercial Off-The-Shelf (COTS) electronics that can withstand Low Earth Orbit conditions has opened avenues for wide deployment of CubeSats and small satellites. CubeSats, thanks to their low development and launch costs, offer new opportunities for rapidly demonstrating on-orbit surveillance capabilities. In our earlier work, we proposed development of SWIMSat (Space based Wide-angle Imaging of Meteors), a 3U CubeSat demonstrator that is designed to observe illuminated objects entering the Earth's atmosphere. The spacecraft would operate autonomously using a smart camera with vision algorithms to detect, track and report objects. Several CubeSats can track an object in a coordinated fashion to pinpoint an object's trajectory. An extension of this smart camera capability is to track unilluminated objects utilizing capabilities we have been developing to track and navigate to Near Earth Objects (NEOs). This extension enables detecting and tracking objects that can't readily be detected by humans. The system maintains a dense star map of the night sky and performs round-the-clock observations. Standard optical flow algorithms are used to obtain trajectories of all moving objects in the camera field of view. Through a process of elimination, certain stars may be occluded by a transiting unilluminated object, which is then used to first detect and obtain a trajectory of the object. Using multiple cameras observing the event from different points of view, it may then be possible to triangulate the position of the object in space and obtain its orbital trajectory. In this work, the performance of our space object detection algorithm coupled with a spacecraft guidance, navigation, and control system is demonstrated.

Submitted 6 September, 2018; originally announced September 2018.

Comments: 12 pages, 11 figures, appears at Advanced Maui Optical and Space Surveillance Technologies Conference 2018

arXiv:1805.02173 [pdf, other]

An Interval Type-2 Fuzzy Approach to Automatic PDF Generation for Histogram Specification

Authors: Vishal Agarwal, Diwanshu Jain, A. Vamshi Krishna Reddy, Frank Chung-Hoon Rhee

Abstract: Image enhancement plays an important role in several applications in the field of computer vision and image processing. Histogram specification (HS) is one of the most widely used techniques for contrast enhancement of an image, and it requires an appropriate probability density function (PDF) for the transformation. In this paper, we propose a fuzzy method to find a suitable PDF automatically for histogram specification using an interval type-2 (IT2) fuzzy approach, based on the fuzzy membership values obtained from the histogram of the input image. The proposed algorithm works in 5 stages: symmetric Gaussian fitting on the histogram, extraction of IT2 fuzzy membership functions (MFs) and therefore the footprint of uncertainty (FOU), obtaining the membership value (MV), generating the PDF, and application of HS. We propose 4 different methods to find membership values: the point-wise method, the center of weight method, the area method, and the Karnik-Mendel (KM) method. The framework is sensitive to local variations in the histogram and chooses the best PDF so as to improve contrast enhancement. Experimental validity of the methods used is illustrated by qualitative and quantitative analysis on several images using the image quality index Average Information Content (AIC), or Entropy, and by comparison with commonly used algorithms such as Histogram Equalization (HE), Recursive Mean-Separate Histogram Equalization (RMSHE) and Brightness Preserving Fuzzy Histogram Equalization (BPFHE). We find that, on average, our algorithm improves the AIC index by 11.5% as compared to the index obtained by histogram equalization.

Submitted 6 May, 2018; originally announced May 2018.

arXiv:1802.06185 [pdf, other]

Building a Word Segmenter for Sanskrit Overnight

Authors: Vikas Reddy, Amrith Krishna, Vishnu Dutt Sharma, Prateek Gupta, Vineeth M R, Pawan Goyal

Abstract: There is an abundance of digitised texts available in Sanskrit. However, the word segmentation task in such texts is challenging due to the issue of 'Sandhi'. In Sandhi, words in a sentence often fuse together to form a single chunk of text, where the word delimiter vanishes and sounds at the word boundaries undergo transformations, which is also reflected in the written text. Here, we propose an approach that uses a deep sequence to sequence (seq2seq) model that takes only the sandhied string as the input and predicts the unsandhied string. The state of the art models are linguistically involved and have external dependencies for the lexical and morphological analysis of the input. Our model can be trained "overnight" and be used for production. In spite of the knowledge-lean approach, our system performs better than the current state of the art, with a percentage increase of 16.79%.

Submitted 16 February, 2018; originally announced February 2018.

Comments: The work is accepted at LREC 2018, Miyazaki, Japan

arXiv:1711.10227 [pdf, other]

On Structural Parameterizations of Firefighting

Authors: Bireswar Das, Murali Krishna Enduri, Neeldhara Misra, I. Vinod Reddy

Abstract: The Firefighting problem is defined as follows. At time $t=0$, a fire breaks out at a vertex of a graph. At each time step $t \geq 0$, a firefighter permanently defends (protects) an unburned vertex, and the fire then spreads to all undefended neighbors from the vertices on fire. This process stops when the fire cannot spread anymore. The goal is to find a sequence of vertices for the firefighter that maximizes the number of saved (non-burned) vertices. The Firefighting problem turns out to be NP-hard even when restricted to bipartite graphs or trees of maximum degree three. We study the parameterized complexity of the Firefighting problem for various structural parameterizations. All our parameters measure the distance to a graph class (in terms of vertex deletion) on which the Firefighting problem admits a polynomial time algorithm. Specifically, for a graph class $\mathcal{F}$ and a graph $G$, a vertex subset $S$ is called a modulator to $\mathcal{F}$ if $G \setminus S$ belongs to $\mathcal{F}$. The parameters we consider are the sizes of modulators to graph classes such as threshold graphs, bounded diameter graphs, disjoint unions of stars, and split graphs. To begin with, we show that the problem is W[1]-hard when parameterized by the size of a modulator to diameter at most two graphs and split graphs. In contrast to the above intractability results, we show that Firefighting is fixed parameter tractable (FPT) when parameterized by the size of a modulator to threshold graphs and disjoint unions of stars, which are subclasses of diameter at most two graphs. We further investigate the kernelization complexity of these problems to find that Firefighting admits a polynomial kernel when parameterized by the size of a modulator to a clique, while it is unlikely to admit a polynomial kernel when parameterized by the size of a modulator to a disjoint union of stars.

Submitted 28 November, 2017; originally announced November 2017.

Comments: 19 pages, to appear in CALDAM 2018

arXiv:1711.08885 [pdf, ps, other]

On the Parallel Parameterized Complexity of the Graph Isomorphism Problem

Authors: Bireswar Das, Murali Krishna Enduri, I. Vinod Reddy

Abstract: In this paper, we study the parallel and the space complexity of the graph isomorphism problem (GI) for several parameterizations. Let $\mathcal{H}=\{H_1,H_2,\cdots,H_l\}$ be a finite set of graphs where $|V(H_i)|\leq d$ for all $i$ and for some constant $d$. Let $\mathcal{G}$ be an $\mathcal{H}$-free graph class, i.e., none of the graphs $G\in \mathcal{G}$ contain any $H \in \mathcal{H}$ as an induced subgraph. We show that GI parameterized by vertex deletion distance to $\mathcal{G}$ is in a parameterized version of $\mathsf{AC}^1$, denoted $\mathsf{PL}$-$\mathsf{AC}^1$, provided the colored graph isomorphism problem for graphs in $\mathcal{G}$ is in $\mathsf{AC}^1$. From this, we deduce that GI parameterized by the vertex deletion distance to cographs is in $\mathsf{PL}$-$\mathsf{AC}^1$. The parallel parameterized complexity of GI parameterized by the size of a feedback vertex set remains an open problem. Towards this direction we show that the graph isomorphism problem is in $\mathsf{PL}$-$\mathsf{TC}^0$ when parameterized by vertex cover or by twin-cover. Let $\mathcal{G}'$ be a graph class such that recognizing graphs from $\mathcal{G}'$ and the colored version of GI for $\mathcal{G}'$ are in logspace ($\mathsf{L}$). We show that GI for bounded vertex deletion distance to $\mathcal{G}'$ is in $\mathsf{L}$. From this, we obtain logspace algorithms for GI for graphs with bounded vertex deletion distance to interval graphs and graphs with bounded vertex deletion distance to cographs.

Submitted 1 December, 2017; v1 submitted 24 November, 2017; originally announced November 2017.

MSC Class: 11Yxx; 68Qxx

arXiv:1710.00223 [pdf, other]

Parameterized Algorithms for Conflict-free Colorings of Graphs

Authors: I. Vinod Reddy

Abstract: In this paper, we study the conflict-free coloring of graphs induced by neighborhoods. A coloring of a graph is conflict-free if every vertex has a uniquely colored vertex in its neighborhood. The conflict-free coloring problem is to color the vertices of a graph using the minimum number of colors such that the coloring is conflict-free. We consider both closed neighborhoods, where the neighborhood of a vertex includes itself, and open neighborhoods, where a vertex is not included in its neighborhood. We study the parameterized complexity of conflict-free closed neighborhood coloring and conflict-free open neighborhood coloring problems. We show that both problems are fixed-parameter tractable (FPT) when parameterized by the cluster vertex deletion number of the input graph. This generalizes the result of Gargano et al. (2015) that conflict-free coloring is fixed-parameter tractable parameterized by the vertex cover number. Also, we show that both problems admit an additive constant approximation algorithm when parameterized by the distance to threshold graphs. We also study the complexity of the problem on special graph classes. We show that both problems can be solved in polynomial time on cographs. For split graphs, we give a polynomial time algorithm for the closed neighborhood conflict-free coloring problem, whereas we show that open neighborhood conflict-free coloring is NP-complete. We show that interval graphs can be conflict-free colored using at most four colors.

Submitted 30 September, 2017; originally announced October 2017.

Comments: appears in SOFSEM Student Research Forum 2018

arXiv:1708.03853 [pdf, ps, other]

The Parameterized Complexity of Happy Colorings

Authors: Neeldhara Misra, I. Vinod Reddy

Abstract: Consider a graph $G = (V,E)$ and a coloring $c$ of vertices with colors from $[\ell]$. A vertex $v$ is said to be happy with respect to $c$ if $c(v) = c(u)$ for all neighbors $u$ of $v$. Further, an edge $(u,v)$ is happy if $c(u) = c(v)$. Given a partial coloring $c$ of $V$, the Maximum Happy Vertex (Edge) problem asks for a total coloring of $V$ extending $c$ to all vertices of $V$ that maximises the number of happy vertices (edges). Both problems are known to be NP-hard in general even when $\ell = 3$, and are polynomially solvable when $\ell = 2$. In [IWOCA 2016] it was shown that both problems are polynomially solvable on trees, and for arbitrary $k$, it was shown that MHE is NP-hard on planar graphs and is FPT parameterized by the number of precolored vertices and branchwidth. We continue the study of this problem from a parameterized perspective. Our focus is on both structural and standard parameterizations. To begin with, we establish that the problems are FPT when parameterized by the treewidth and the number of colors used in the precoloring, which is a potential improvement over the total number of precolored vertices. Further, we show that both the vertex and edge variants of the problem are FPT when parameterized by vertex cover and distance-to-clique parameters. We also show that the problem of maximizing the number of happy edges is FPT when parameterized by the standard parameter, the number of happy edges. We show that the maximum happy vertex (edge) problem is NP-hard on split graphs and bipartite graphs and polynomially solvable on cographs.

Submitted 13 August, 2017; originally announced August 2017.

Comments: 16 pages, appears in IWOCA 2017

arXiv:1704.01090 [pdf, other]

doi 10.1109/ACCESS.2018.2881041

Survey Research in Software Engineering: Problems and Strategies

Authors: Ahmad Nauman Ghazi, Kai Petersen, Sri Sai Vijay Raj Reddy, Harini Nekkanti

Abstract: Background: The need for empirical investigations in software engineering is growing. Many researchers nowadays conduct and validate their solutions using empirical research. The survey is one empirical method which enables researchers to collect data from a large population. The main aim of a survey is to generalize the findings. Aims: In this study we aim to identify the problems researchers face during survey design, and mitigation strategies. Method: A literature review as well as semi-structured interviews with nine software engineering researchers were conducted to elicit their views on problems and mitigation strategies. The researchers are all focused on empirical software engineering. Results: We identified 24 problems and 65 strategies, structured according to the survey research process. The most commonly discussed problem was sampling, in particular the ability to obtain a sufficiently large sample. To improve survey instrument design, evaluation and execution, recommendations for question formulation and survey pre-testing were given. The importance of involving multiple researchers in the analysis of survey results was stressed. Conclusions: The elicited problems and strategies may serve researchers during the design of their studies. However, it was observed that some strategies were conflicting. This shows that it is important to conduct a trade-off analysis between strategies.

Submitted 4 April, 2017; originally announced April 2017.

Comments: Submitted to e-Informatica Software Engineering Journal

Showing 1–50 of 74 results for author: Reddy, V