Hybrid Approach to Parallel Stochastic Gradient Descent

Authors: Aakash Sudhirbhai Vora, Dhrumil Chetankumar Joshi, Aksh Kantibhai Patel

Abstract: Stochastic Gradient Descent is used for large datasets to train models to reduce the training time. On top of that data parallelism is widely used as a method to efficiently train neural networks using multiple worker nodes in parallel. Synchronous and asynchronous approach to data parallelism is used by most systems to train the model in parallel. However, both of them have their drawbacks. We propose a third approach to data parallelism which is a hybrid between synchronous and asynchronous approaches, using both approaches to train the neural network. When the threshold function is selected appropriately to gradually shift all parameter aggregation from asynchronous to synchronous, we show that in a given time period our hybrid approach outperforms both asynchronous and synchronous approaches.

Submitted 27 June, 2024; originally announced July 2024.

arXiv:2406.13864 [pdf, other]

Evaluating representation learning on the protein structure universe

Authors: Arian R. Jamasb, Alex Morehead, Chaitanya K. Joshi, Zuobai Zhang, Kieran Didi, Simon V. Mathis, Charles Harris, Jian Tang, Jianlin Cheng, Pietro Lio, Tom L. Blundell

Abstract: We introduce ProteinWorkshop, a comprehensive benchmark suite for representation learning on protein structures with Geometric Graph Neural Networks. We consider large-scale pre-training and downstream tasks on both experimental and predicted structures to enable the systematic evaluation of the quality of the learned structural representation and their usefulness in capturing functional relationships for downstream tasks. We find that: (1) large-scale pretraining on AlphaFold structures and auxiliary tasks consistently improve the performance of both rotation-invariant and equivariant GNNs, and (2) more expressive equivariant GNNs benefit from pretraining to a greater extent compared to invariant models. We aim to establish a common ground for the machine learning and computational biology communities to rigorously compare and advance protein structure representation learning. Our open-source codebase reduces the barrier to entry for working with large protein structure datasets by providing: (1) storage-efficient dataloaders for large-scale structural databases including AlphaFoldDB and ESM Atlas, as well as (2) utilities for constructing new tasks from the entire PDB. ProteinWorkshop is available at: github.com/a-r-j/ProteinWorkshop.

Submitted 19 June, 2024; originally announced June 2024.

Comments: ICLR 2024

arXiv:2406.13839 [pdf, other]

RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design

Authors: Rishabh Anand, Chaitanya K. Joshi, Alex Morehead, Arian R. Jamasb, Charles Harris, Simon V. Mathis, Kieran Didi, Bryan Hooi, Pietro Liò

Abstract: We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self-consistency TM-score >= 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design

Submitted 19 June, 2024; originally announced June 2024.

Comments: To be presented as an Oral at ICML 2024 Structured Probabilistic Inference & Generative Modeling Workshop, and a Spotlight at ICML 2024 AI4Science Workshop

arXiv:2403.04106 [pdf]

Understanding Biology in the Age of Artificial Intelligence

Authors: Elsa Lawrence, Adham El-Shazly, Srijit Seal, Chaitanya K Joshi, Pietro Liò, Shantanu Singh, Andreas Bender, Pietro Sormanni, Matthew Greenig

Abstract: Modern life sciences research is increasingly relying on artificial intelligence approaches to model biological systems, primarily centered around the use of machine learning (ML) models. Although ML is undeniably useful for identifying patterns in large, complex data sets, its widespread application in biological sciences represents a significant deviation from traditional methods of scientific inquiry. As such, the interplay between these models and scientific understanding in biology is a topic with important implications for the future of scientific research, yet it is a subject that has received little attention. Here, we draw from an epistemological toolkit to contextualize recent applications of ML in biological sciences under modern philosophical theories of understanding, identifying general principles that can guide the design and application of ML systems to model biological phenomena and advance scientific knowledge. We propose that conceptions of scientific understanding as information compression, qualitative intelligibility, and dependency relation modelling provide a useful framework for interpreting ML-mediated understanding of biological systems.
Through a detailed analysis of two key application areas of ML in modern biological research - protein structure prediction and single cell RNA-sequencing - we explore how these features have thus far enabled ML systems to advance scientific understanding of their target phenomena, how they may guide the development of future ML models, and the key obstacles that remain in preventing ML from achieving its potential as a tool for biological discovery. Consideration of the epistemological features of ML applications in biology will improve the prospects of these methods to solve important problems and advance scientific understanding of living systems.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2312.07511 [pdf, other]

A Hitchhiker's Guide to Geometric GNNs for 3D Atomic Systems

Authors: Alexandre Duval, Simon V. Mathis, Chaitanya K. Joshi, Victor Schmidt, Santiago Miret, Fragkiskos D. Malliaros, Taco Cohen, Pietro Liò, Yoshua Bengio, Michael Bronstein

Abstract: Recent advances in computational modelling of atomic systems, spanning molecules, proteins, and materials, represent them as geometric graphs with atoms embedded as nodes in 3D Euclidean space. In these graphs, the geometric attributes transform according to the inherent physical symmetries of 3D atomic systems, including rotations and translations in Euclidean space, as well as node permutations. In recent years, Geometric Graph Neural Networks have emerged as the preferred machine learning architecture powering applications ranging from protein structure prediction to molecular simulations and material generation. Their specificity lies in the inductive biases they leverage - such as physical symmetries and chemical properties - to learn informative representations of these geometric graphs. In this opinionated paper, we provide a comprehensive and self-contained overview of the field of Geometric GNNs for 3D atomic systems. We cover fundamental background material and introduce a pedagogical taxonomy of Geometric GNN architectures: (1) invariant networks, (2) equivariant networks in Cartesian basis, (3) equivariant networks in spherical basis, and (4) unconstrained networks. Additionally, we outline key datasets and application areas and suggest future research directions. The objective of this work is to present a structured perspective on the field, making it accessible to newcomers and aiding practitioners in gaining an intuition for its mathematical abstractions.

Submitted 13 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2307.08423 [pdf, other]

Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems

Authors: Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, Keir Adams, Maurice Weiler, Xiner Li, Tianfan Fu, Yucheng Wang, Haiyang Yu, YuQing Xie, Xiang Fu, Alex Strasser, Shenglong Xu, Yi Liu, Yuanqi Du, Alexandra Saxton, Hongyi Ling, Hannah Lawrence, et al. (38 additional authors not shown)

Abstract: Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural sciences. Today, AI has started to advance natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims to provide a technically thorough account of a subarea of AI4Science; namely, AI for quantum, atomistic, and continuum systems. These areas aim at understanding the physical world from the subatomic (wavefunctions and electron density), atomic (molecules, proteins, materials, and interactions), to macro (fluids, climate, and subsurface) scales and form an important subarea of AI4Science. A unique advantage of focusing on these areas is that they largely share a common set of challenges, thereby allowing a unified and foundational treatment. A key common challenge is how to capture physics first principles, especially symmetries, in natural systems by deep learning methods. We provide an in-depth yet intuitive account of techniques to achieve equivariance to symmetry transformations. We also discuss other common technical challenges, including explainability, out-of-distribution generalization, knowledge transfer with foundation and large language models, and uncertainty quantification.
To facilitate learning and education, we provide categorized lists of resources that we found to be useful. We strive to be thorough and unified and hope this initial effort may trigger more community interests and efforts to further advance AI4Science.

Submitted 15 November, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

arXiv:2305.19207 [pdf, other]

Group Invariant Global Pooling

Authors: Kamil Bujel, Yonatan Gideoni, Chaitanya K. Joshi, Pietro Liò

Abstract: Much work has been devoted to devising architectures that build group-equivariant representations, while invariance is often induced using simple global pooling mechanisms. Little work has been done on creating expressive layers that are invariant to given symmetries, despite the success of permutation invariant pooling in various molecular tasks. In this work, we present Group Invariant Global Pooling (GIGP), an invariant pooling layer that is provably sufficiently expressive to represent a large class of invariant functions. We validate GIGP on rotated MNIST and QM9, showing improvements for the latter while attaining identical results for the former. By making the pooling process group orbit-aware, this invariant aggregation method leads to improved performance, while performing well-principled group aggregation.

Submitted 30 May, 2023; originally announced May 2023.

arXiv:2305.16275 [pdf, other]

CENSUS-HWR: a large training dataset for offline handwriting recognition

Authors: Chetan Joshi, Lawry Sorenson, Ammon Wolfert, Dr. Mark Clement, Dr. Joseph Price, Dr. Kasey Buckles

Abstract: Progress in Automated Handwriting Recognition has been hampered by the lack of large training datasets. Nearly all research uses a set of small datasets that often cause models to overfit. We present CENSUS-HWR, a new dataset consisting of full English handwritten words in 1,812,014 gray scale images. A total of 1,865,134 handwritten texts from a vocabulary of 10,711 words in the English language are present in this collection. This dataset is intended to serve handwriting models as a benchmark for deep learning algorithms. This huge English handwriting recognition dataset has been extracted from the US 1930 and 1940 censuses taken by approximately 70,000 enumerators each year. The dataset and the trained model with their weights are freely available to download at https://censustree.org/data.html.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.14749 [pdf, other]

gRNAde: Geometric Deep Learning for 3D RNA inverse design

Authors: Chaitanya K. Joshi, Arian R. Jamasb, Ramon Viñas, Charles Harris, Simon V. Mathis, Alex Morehead, Rishabh Anand, Pietro Liò

Abstract: Computational RNA design tasks are often posed as inverse problems, where sequences are designed based on adopting a single desired secondary structure without considering 3D geometry and conformational diversity. We introduce gRNAde, a geometric RNA design pipeline operating on 3D RNA backbones to design sequences that explicitly account for structure and dynamics. Under the hood, gRNAde is a multi-state Graph Neural Network that generates candidate RNA sequences conditioned on one or more 3D backbone structures where the identities of the bases are unknown. On a single-state fixed backbone re-design benchmark of 14 RNA structures from the PDB identified by Das et al. [2010], gRNAde obtains higher native sequence recovery rates (56% on average) compared to Rosetta (45% on average), taking under a second to produce designs compared to the reported hours for Rosetta. We further demonstrate the utility of gRNAde on a new benchmark of multi-state design for structurally flexible RNAs, as well as zero-shot ranking of mutational fitness landscapes in a retrospective analysis of a recent RNA polymerase ribozyme structure. Open source code: https://github.com/chaitjo/geometric-rna-design

Submitted 25 May, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Previously titled 'Multi-State RNA Design with Geometric Multi-Graph Neural Networks', presented at ICML 2023 Computational Biology Workshop

arXiv:2301.09308 [pdf, other]

On the Expressive Power of Geometric Graph Neural Networks

Authors: Chaitanya K. Joshi, Cristian Bodnar, Simon V. Mathis, Taco Cohen, Pietro Liò

Abstract: The expressive power of Graph Neural Networks (GNNs) has been studied extensively through the Weisfeiler-Leman (WL) graph isomorphism test. However, standard GNNs and the WL framework are inapplicable for geometric graphs embedded in Euclidean space, such as biomolecules, materials, and other physical systems. In this work, we propose a geometric version of the WL test (GWL) for discriminating geometric graphs while respecting the underlying physical symmetries: permutations, rotation, reflection, and translation. We use GWL to characterise the expressive power of geometric GNNs that are invariant or equivariant to physical symmetries in terms of distinguishing geometric graphs. GWL unpacks how key design choices influence geometric GNN expressivity: (1) Invariant layers have limited expressivity as they cannot distinguish one-hop identical geometric graphs; (2) Equivariant layers distinguish a larger class of graphs by propagating geometric information beyond local neighbourhoods; (3) Higher order tensors and scalarisation enable maximally powerful geometric GNNs; and (4) GWL's discrimination-based perspective is equivalent to universal approximation. Synthetic experiments supplementing our results are available at https://github.com/chaitjo/geometric-gnn-dojo

Submitted 3 March, 2024; v1 submitted 23 January, 2023; originally announced January 2023.

Comments: ICML 2023

Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:15330-15355, 2023

arXiv:2111.04964 [pdf, other]

doi 10.1109/TNNLS.2022.3223018

On Representation Knowledge Distillation for Graph Neural Networks

Authors: Chaitanya K. Joshi, Fayao Liu, Xu Xun, Jie Lin, Chuan-Sheng Foo

Abstract: Knowledge distillation is a learning paradigm for boosting resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the Local Structure Preserving loss (LSP), which matches local structural relationships defined over edges across the student and teacher's node embeddings. This paper studies whether preserving the global topology of how the teacher embeds graph data can be a more effective distillation objective for GNNs, as real-world graphs often contain latent interactions and noisy edges. We propose Graph Contrastive Representation Distillation (G-CRD), which uses contrastive learning to implicitly preserve global topology by aligning the student node embeddings to those of the teacher in a shared representation space. Additionally, we introduce an expanded set of benchmarks on large-scale real-world datasets where the performance gap between teacher and student GNNs is non-negligible. Experiments across 4 datasets and 14 heterogeneous GNN architectures show that G-CRD consistently boosts the performance and robustness of lightweight GNNs, outperforming LSP (and a global structure preserving variant of LSP) as well as baselines from 2D computer vision. An analysis of the representational similarity among teacher and student embedding spaces reveals that G-CRD balances preserving local and global relationships, while structure preserving approaches are best at preserving one or the other. Our code is available at https://github.com/chaitjo/efficient-gnns

Submitted 4 February, 2023; v1 submitted 9 November, 2021; originally announced November 2021.

Comments: IEEE Transactions on Neural Networks and Learning Systems (TNNLS), Special Issue on Deep Neural Networks for Graphs: Theory, Models, Algorithms and Applications

arXiv:2109.09808 [pdf, other]

Integrated Construction of Multimodal Atlases with Structural Connectomes in the Space of Riemannian Metrics

Authors: Kristen M. Campbell, Haocheng Dai, Zhe Su, Martin Bauer, P. Thomas Fletcher, Sarang C. Joshi

Abstract: The structural network of the brain, or structural connectome, can be represented by fiber bundles generated by a variety of tractography methods. While such methods give qualitative insights into brain structure, there is controversy over whether they can provide quantitative information, especially at the population level. In order to enable population-level statistical analysis of the structural connectome, we propose representing a connectome as a Riemannian metric, which is a point on an infinite-dimensional manifold. We equip this manifold with the Ebin metric, a natural metric structure for this space, to get a Riemannian manifold along with its associated geometric properties. We then use this Riemannian framework to apply object-oriented statistical analysis to define an atlas as the Fréchet mean of a population of Riemannian metrics. This formulation ties into the existing framework for diffeomorphic construction of image atlases, allowing us to construct a multimodal atlas by simultaneously integrating complementary white matter structure details from DWMRI and cortical details from T1-weighted MRI. We illustrate our framework with 2D data examples of connectome registration and atlas formation. Finally, we build an example 3D multimodal atlas using T1 images and connectomes derived from diffusion tensors estimated from a subset of subjects from the Human Connectome Project.

Submitted 13 June, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://www.melba-journal.org/papers/2022:016.html. arXiv admin note: substantial text overlap with arXiv:2103.05730

arXiv:2108.02104 [pdf, other]

Point Discriminative Learning for Data-efficient 3D Point Cloud Analysis

Authors: Fayao Liu, Guosheng Lin, Chuan-Sheng Foo, Chaitanya K. Joshi, Jie Lin

Abstract: 3D point cloud analysis has drawn a lot of research attention due to its wide applications. However, collecting massive labelled 3D point cloud data is both time-consuming and labor-intensive. This calls for data-efficient learning methods. In this work we propose PointDisc, a point discriminative learning method to leverage self-supervisions for data-efficient 3D point cloud classification and segmentation. PointDisc imposes a novel point discrimination loss on the middle and global level features produced by the backbone network. This point discrimination loss enforces learned features to be consistent with points belonging to the corresponding local shape region and inconsistent with randomly sampled noisy points. We conduct extensive experiments on 3D object classification, 3D semantic and part segmentation, showing the benefits of PointDisc for data-efficient learning. Detailed analysis demonstrate that PointDisc learns unsupervised features that well capture local and global geometry.

Submitted 20 January, 2023; v1 submitted 4 August, 2021; originally announced August 2021.

Comments: This work is published in 3DV 2022

arXiv:2107.11737 [pdf]

Mathematical Modeling of Heat Conduction

Authors: Abdul Aziz Momin, Nikhil Shende, Abhijna Anamtatmakula, Emily Ganguly, Ashwin Gurbani, Chaitanya A Joshi, Yogesh Y Mahajan

Abstract: This report describes a mathematical model of heat conduction. The differential equation for heat conduction in one dimensional rod has been derived. The explicit finite difference numerical method is used to solve this differential equation. Then for simulation, a code was written in using python libraries via Jupyter notebook. The simulation carried out for Aluminum, Copper and Mild Steel rods and results were discussed.

Submitted 25 July, 2021; originally announced July 2021.

Comments: 8 pages, 9 figures, 3 tables, IMRSE-2021 (https://www.imrse2021.com/)

arXiv:2103.05730 [pdf, other]

Structural Connectome Atlas Construction in the Space of Riemannian Metrics

Authors: Kristen M. Campbell, Haocheng Dai, Zhe Su, Martin Bauer, P. Thomas Fletcher, Sarang C. Joshi

Abstract: The structural connectome is often represented by fiber bundles generated from various types of tractography. We propose a method of analyzing connectomes by representing them as a Riemannian metric, thereby viewing them as points in an infinite-dimensional manifold. After equipping this space with a natural metric structure, the Ebin metric, we apply object-oriented statistical analysis to define an atlas as the Fréchet mean of a population of Riemannian metrics. We demonstrate connectome registration and atlas formation using connectomes derived from diffusion tensors estimated from a subset of subjects from the Human Connectome Project.

Submitted 9 March, 2021; originally announced March 2021.

Comments: 12 pages, 3 figures

arXiv:2010.00519 [pdf, ps, other]

Performance of Intelligent Reconfigurable Surface-Based Wireless Communications Using QAM Signaling

Authors: Dharmendra Dixit, Kishor Chandra Joshi, Sanjeev Sharma

Abstract: Intelligent reconfigurable surface (IRS) is being seen as a promising technology for 6G wireless networks. The IRS can reconfigure the wireless propagation environment, which results in significant performance improvement of wireless communications. In this paper, we analyze the performance of bandwidth-efficient quadrature amplitude modulation (QAM) techniques for IRS-assisted wireless communications over Rayleigh fading channels. New closed-form expressions of the generic average symbol error rate (ASER) for rectangular QAM, square QAM and cross QAM schemes are derived. Moreover, simplified expressions of the ASER for low signal-to-noise-ratio (SNR) and high SNR regions are also presented, which are useful to provide insights analytically. We comprehensively analyze the impact of modulation parameters and the number of IRS elements employed. We also verify our theoretical results through simulations. Our results demonstrate that employing IRS significantly enhances the ASER performance in comparison to additive white Gaussian noise channel at a low SNR regime. Thus, IRS-assisted wireless communications can be a promising candidate for various low powered communication applications such as internet-of-things (IoT).

Submitted 26 September, 2020; originally announced October 2020.

Comments: 18 pages, 5 figures

ACM Class: F.2.2; I.2.7

arXiv:2006.07054 [pdf, other]

doi 10.4230/LIPIcs.CP.2021.33

Learning the Travelling Salesperson Problem Requires Rethinking Generalization

Authors: Chaitanya K. Joshi, Quentin Cappart, Louis-Martin Rousseau, Thomas Laurent

Abstract: End-to-end training of neural network solvers for graph combinatorial optimization problems such as the Travelling Salesperson Problem (TSP) have seen a surge of interest recently, but remain intractable and inefficient beyond graphs with few hundreds of nodes. While state-of-the-art learning-driven approaches for TSP perform closely to classical solvers when trained on trivially small sizes, they are unable to generalize the learnt policy to larger instances at practical scales. This work presents an end-to-end neural combinatorial optimization pipeline that unifies several recent papers in order to identify the inductive biases, model architectures and learning algorithms that promote generalization to instances larger than those seen in training. Our controlled experiments provide the first principled investigation into such zero-shot generalization, revealing that extrapolating beyond training data requires rethinking the neural combinatorial optimization pipeline, from network layers and learning paradigms to evaluation protocols. Additionally, we analyze recent advances in deep learning for routing problems through the lens of our pipeline and provide new directions to stimulate future research.

Submitted 25 May, 2022; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: Accepted to the 27th International Conference on Principles and Practice of Constraint Programming (CP 2021) and Constraints (2022). Code and data available at https://github.com/chaitjo/learning-tsp

arXiv:2004.06378 [pdf]

Various Secure Routing Schemes for MANETs: A Survey

Authors: Priya R. Soni, Charmi A. Joshi, Dhwani R. Bhadra, Nikita P. Vyas, Rutvij H. Jhaveri

Abstract: MANET is an infrastructure less as well as self configuring network consisting of mobile nodes communicating with each other using radio medium. Its exclusive properties such as dynamic topology, decentralization, and wireless medium make MANET to become very unique network amongst other traditional networks, thereby determining security to be a major challenge. In this paper, we have carried out the survey of various security approaches of Mobile Adhoc Networks and provide a comprehensive study regarding it. We have focused our work on three approaches such as Bayesian watch dog, Trust based systems, and Ant colony optimization. In wireless perspective, security is a crucial term to handle. Therefore it becomes necessary when we are concerning our work with Mobile Adhoc Network.

Submitted 14 April, 2020; originally announced April 2020.

arXiv:2003.02978 [pdf, other]

doi 10.1109/TGRS.2020.2976888

Fast and Accurate Retrieval of Methane Concentration from Imaging Spectrometer Data Using Sparsity Prior

Authors: Markus D. Foote, Philip E. Dennison, Andrew K. Thorpe, David R. Thompson, Siraput Jongaramrungruang, Christian Frankenberg, Sarang C. Joshi

Abstract: The strong radiative forcing by atmospheric methane has stimulated interest in identifying natural and anthropogenic sources of this potent greenhouse gas. Point sources are important targets for quantification, and anthropogenic targets have potential for emissions reduction. Methane point source plume detection and concentration retrieval have been previously demonstrated using data from the Airborne Visible InfraRed Imaging Spectrometer Next Generation (AVIRIS-NG). Current quantitative methods have tradeoffs between computational requirements and retrieval accuracy, creating obstacles for processing real-time data or large datasets from flight campaigns. We present a new computationally efficient algorithm that applies sparsity and an albedo correction to matched filter retrieval of trace gas concentration-pathlength. The new algorithm was tested using AVIRIS-NG data acquired over several point source plumes in Ahmedabad, India. The algorithm was validated using simulated AVIRIS-NG data including synthetic plumes of known methane concentration. Sparsity and albedo correction together reduced the root mean squared error of retrieved methane concentration-pathlength enhancement by 60.7% compared with a previous robust matched filter method. Background noise was reduced by a factor of 2.64. The new algorithm was able to process the entire 300 flightline 2016 AVIRIS-NG India campaign in just over 8 hours on a desktop computer with GPU acceleration.

Submitted 5 March, 2020; originally announced March 2020.

Comments: 13 pages, 11 figures

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, 2020, pp. 1-13

arXiv:2003.00982 [pdf, other]

Benchmarking Graph Neural Networks

Authors: Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, Xavier Bresson

Abstract: In the last few years, graph neural networks (GNNs) have become the standard toolkit for analyzing and learning from data on graphs. This emerging field has witnessed an extensive growth of promising techniques that have been applied with success to computer science, mathematics, biology, physics and chemistry. But for any successful field to become mainstream and reliable, benchmarks must be developed to quantify progress. This led us in March 2020 to release a benchmark framework that i) comprises of a diverse collection of mathematical and real-world graphs, ii) enables fair model comparison with the same parameter budget to identify key architectures, iii) has an open-source, easy-to-use and reproducible code infrastructure, and iv) is flexible for researchers to experiment with new theoretical ideas. As of December 2022, the GitHub repository has reached 2,000 stars and 380 forks, which demonstrates the utility of the proposed open-source framework through the wide usage by the GNN community. In this paper, we present an updated version of our benchmark with a concise presentation of the aforementioned framework characteristics, an additional medium-sized molecular dataset AQSOL, similar to the popular ZINC, but with a real-world measured chemical target, and discuss how this framework can be leveraged to explore new GNN designs and insights. As a proof of value of our benchmark, we study the case of graph positional encoding (PE) in GNNs, which was introduced with this benchmark and has since spurred interest of exploring more powerful PE for Transformers and GNNs in a robust experimental setting.

Submitted 27 December, 2022; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: Benchmarking framework on GitHub at https://github.com/graphdeeplearning/benchmarking-gnns

Journal ref: Journal of Machine Learning Research (JMLR), 2022

arXiv:1912.11258 [pdf, other]

Multi-Graph Transformer for Free-Hand Sketch Recognition

Authors: Peng Xu, Chaitanya K. Joshi, Xavier Bresson

Abstract: Learning meaningful representations of free-hand sketches remains a challenging task given the signal sparsity and the high-level abstraction of sketches. Existing techniques have focused on exploiting either the static nature of sketches with Convolutional Neural Networks (CNNs) or the temporal sequential property with Recurrent Neural Networks (RNNs). In this work, we propose a new representation of sketches as multiple sparsely connected graphs. We design a novel Graph Neural Network (GNN), the Multi-Graph Transformer (MGT), for learning representations of sketches from multiple graphs which simultaneously capture global and local geometric stroke structures, as well as temporal information. We report extensive numerical experiments on a sketch recognition task to demonstrate the performance of the proposed approach. Particularly, MGT applied on 414k sketches from Google QuickDraw: (i) achieves small recognition gap to the CNN-based performance upper bound (72.80% vs. 74.22%), and (ii) outperforms all RNN-based models by a significant margin. To the best of our knowledge, this is the first work proposing to represent sketches as graphs and apply GNNs for sketch recognition. Code and trained models are available at https://github.com/PengBoXiangShang/multigraph_transformer.

Submitted 25 March, 2021; v1 submitted 24 December, 2019; originally announced December 2019.

Comments: This paper has been accepted by IEEE TNNLS

arXiv:1911.09945 [pdf, other]

Insider threat modeling: An adversarial risk analysis approach

Authors: Chaitanya Joshi, David Rios Insua, Jesus Rios

Abstract: Insider threats entail major security issues in geopolitics, cyber risk management and business organization. The game theoretic models proposed so far do not take into account some important factors such as the organisational culture and whether the attacker was detected or not. They also fail to model the defensive mechanisms already put in place by an organisation to mitigate an insider attack. We propose two new models which incorporate these settings and hence are more realistic. We use the adversarial risk analysis (ARA) approach to find the solution to our models. ARA does not assume common knowledge and solves the problem from the point of view of one of the players, taking into account their knowledge and uncertainties regarding the choices available to them, to their adversaries, the possible outcomes, their utilities and their opponents' utilities. Our models and the ARA solutions are general and can be applied to most insider threat scenarios. A data security example illustrates the discussion.

Submitted 22 November, 2019; originally announced November 2019.

MSC Class: 91A40; 62C10

arXiv:1911.09851 [pdf, ps, other]

Adversarial Risk Analysis for First-Price Sealed-Bid Auctions

Authors: Muhammad Ejaz, Chaitanya Joshi, Stephen Joe

Abstract: Adversarial Risk Analysis (ARA) is an upcoming methodology that is considered to have advantages over the traditional decision theoretic and game theoretic approaches. ARA solutions for first-price sealed-bid (FPSB) auctions have been found but only under strong assumptions which make the model somewhat unrealistic. In this paper, we use ARA methodology to model FPSB auctions using more realistic assumptions. We define a new utility function that considers bidders' wealth, we assume a reserve price and find solutions not only for risk-neutral but also for risk-averse as well as risk-seeking bidders. We model the problem using ARA for non-strategic play and level-k thinking solution concepts.

Submitted 18 March, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

Comments: 32 pages, 4 figures, 9 tables

arXiv:1910.10769 [pdf, other]

doi 10.1109/TBME.2020.3024826

Learning Multiparametric Biomarkers for Assessing MR-Guided Focused Ultrasound Treatment of Malignant Tumors

Authors: Blake E. Zimmerman, Sara Johnson, Henrik Odéen, Jill Shea, Markus D. Foote, Nicole Winkler, Sarang C. Joshi, Allison Payne

Abstract: Noninvasive MR-guided focused ultrasound (MRgFUS) treatments are promising alternatives to the surgical removal of malignant tumors. A significant challenge is assessing the viability of treated tissue during and immediately after MRgFUS procedures. Current clinical assessment uses the nonperfused volume (NPV) biomarker immediately after treatment from contrast-enhanced MRI. The NPV has variable accuracy, and the use of contrast agent prevents continuing MRgFUS treatment if tumor coverage is inadequate. This work presents a novel, noncontrast, learned multiparametric MR biomarker that can be used during treatment for intratreatment assessment, validated in a VX2 rabbit tumor model. A deep convolutional neural network was trained on noncontrast multiparametric MR images using the NPV biomarker from follow-up MR imaging (3-5 days after MRgFUS treatment) as the accurate label of nonviable tissue. A novel volume-conserving registration algorithm yielded a voxel-wise correlation between treatment and follow-up NPV, providing a rigorous validation of the biomarker. The learned noncontrast multiparametric MR biomarker predicted the follow-up NPV with an average DICE coefficient of 0.71, substantially outperforming the current clinical standard (DICE coefficient = 0.53). Noncontrast multiparametric MR imaging integrated with a deep convolutional neural network provides a more accurate prediction of MRgFUS treatment outcome than current contrast-based techniques.

Submitted 29 September, 2020; v1 submitted 23 October, 2019; originally announced October 2019.

Comments: 11 pages, 12 figures

arXiv:1910.07210 [pdf, other]

On Learning Paradigms for the Travelling Salesman Problem

Authors: Chaitanya K. Joshi, Thomas Laurent, Xavier Bresson

Abstract: We explore the impact of learning paradigms on training deep neural networks for the Travelling Salesman Problem. We design controlled experiments to train supervised learning (SL) and reinforcement learning (RL) models on fixed graph sizes up to 100 nodes, and evaluate them on variable sized graphs up to 500 nodes. Beyond not needing labelled data, our results reveal favorable properties of RL over SL: RL training leads to better emergent generalization to variable graph sizes and is a key component for learning scale-invariant solvers for novel combinatorial problems.

Submitted 31 October, 2019; v1 submitted 16 October, 2019; originally announced October 2019.

Comments: Presented at the NeurIPS 2019 Graph Representation Learning Workshop

arXiv:1909.05530 [pdf, ps, other]

Reinforcing Edge Computing with Multipath TCP Enabled Mobile Device Clouds

Authors: Venkatraman Balasubramanian, Kees Kroep, Kishor Chandra Joshi, R. Venkatesha Prasad

Abstract: In recent years, enormous growth has been witnessed in the computational and storage capabilities of mobile devices. However, much of this computational and storage capabilities are not always fully used. On the other hand, popularity of mobile edge computing which aims to replace the traditional centralized powerful cloud with multiple edge servers is rapidly growing. In particular, applications having strict latency requirements can be best served by the mobile edge clouds due to a reduced round-trip delay. In this paper we propose a Multi-Path TCP (MPTCP) enabled mobile device cloud (MDC) as a replacement to the existing TCP based or D2D device cloud techniques, as it effectively makes use of the available bandwidth by providing much higher throughput as well as ensures robust wireless connectivity. We investigate the congestion in mobile-device cloud formation resulting mainly due to the message passing for service providing nodes at the time of discovery, service continuity and formation of cloud composition. We propose a user space agent called congestion handler that enable offloading of packets from one sub-flow to the other under link quality constraints. Further, we discuss the benefits of this design and perform preliminary analysis of the system.

Submitted 30 October, 2019; v1 submitted 12 September, 2019; originally announced September 2019.

Journal ref: IEEE FMEC 2019

arXiv:1909.03904 [pdf, ps, other]

doi 10.1109/JSYST.2019.2937568

Association, Blockage and Handoffs in IEEE 802.11ad based 60GHz Picocells- A Closer Look

Authors: Kishor Chandra Joshi, Rizqi Hersyandika, R. Venkatesha Prasad

Abstract: The link misalignment and high susceptibility to blockages are the biggest hurdles in realizing 60GHz based wireless local area networks (WLANs). However, much of the previous studies investigating 60GHz alignment and blockage issues do not provide an accurate quantitative evaluation from the perspective of WLANs. In this paper, we present an in-depth quantitative evaluation of commodity IEEE 802.11ad devices by forming a 60GHz WLAN with two docking stations mimicking as access points (APs). Through extensive experiments, we provide important insights about directional coverage pattern of antennas, communication range and co-channel interference and blockages. We are able to measure the IEEE 802.11ad link alignment and association overheads in absolute time units. With a very high accuracy (96-97%), our blockage characterization can differentiate between temporary and permanent blockages caused by humans in the indoor environment, which is a key insight. Utilizing our blockage characterization, we also demonstrate intelligent handoff to alternate APs using consumer-grade IEEE 802.11ad devices. Our blockage-induced handoff experiments provide important insights that would be helpful in integrating millimeter wave based WLANs into future wireless networks.

Submitted 9 September, 2019; originally announced September 2019.

Journal ref: IEEE Systems Journal 2019

arXiv:1909.03902 [pdf, ps, other]

doi 10.1109/TII.2019.2931703

Analyzing the Trade-offs in Using Millimeter Wave Directional Links for High Data Rate Tactile Internet Applications

Authors: Kishor Chandra Joshi, Solmaz Niknam, R. Venkatesha Prasad, Balasubramaniam Natarajan

Abstract: Ultra-low latency and high reliability communications are the two defining characteristics of Tactile Internet (TI). Nevertheless, some TI applications would also require high data-rate transfer of audio-visual information to complement the haptic data. Using Millimeter wave (mmWave) communications is an attractive choice for high data-rate TI applications due to the availability of large bandwidth in the mmWave bands. Moreover, mmWave radio access is also advantageous to attain the air-interface diversity required for high reliability in TI systems as mmWave signal propagation significantly differs from sub-6GHz propagation. However, the use of narrow beamwidth in mmWave systems makes them susceptible to link misalignment-induced unreliability and high access latency. In this paper, we analyze the trade-offs between high gain of narrow beamwidth antennas and corresponding susceptibility to misalignment in mmWave links. To alleviate the effects of random antenna misalignment, we propose a beamwidth-adaptation scheme that significantly stabilizes the link throughput performance.

Submitted 9 September, 2019; originally announced September 2019.

Comments: IEEE Transactions on Industrial Informatics, 2019

arXiv:1906.01227 [pdf, other]

An Efficient Graph Convolutional Network Technique for the Travelling Salesman Problem

Authors: Chaitanya K. Joshi, Thomas Laurent, Xavier Bresson

Abstract: This paper introduces a new learning-based approach for approximately solving the Travelling Salesman Problem on 2D Euclidean graphs. We use deep Graph Convolutional Networks to build efficient TSP graph representations and output tours in a non-autoregressive manner via highly parallelized beam search. Our approach outperforms all recently proposed autoregressive deep learning techniques in terms of solution quality, inference speed and sample efficiency for problem instances of fixed graph sizes. In particular, we reduce the average optimality gap from 0.52% to 0.01% for 50 nodes, and from 2.26% to 1.39% for 100 nodes. Finally, despite improving upon other learning-based approaches for TSP, our approach falls short of standard Operations Research solvers.

Submitted 14 October, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

arXiv:1905.03092 [pdf, other]

Working women and caste in India: A study of social disadvantage using feature attribution

Authors: Kuhu Joshi, Chaitanya K. Joshi

Abstract: Women belonging to the socially disadvantaged caste-groups in India have historically been engaged in labour-intensive, blue-collar work. We study whether there has been any change in the ability to predict a woman's work-status and work-type based on her caste by interpreting machine learning models using feature attribution. We find that caste is now a less important determinant of work for the younger generation of women compared to the older generation. Moreover, younger women from disadvantaged castes are now more likely to be working in white-collar jobs.

Submitted 3 January, 2020; v1 submitted 27 April, 2019; originally announced May 2019.

Comments: Presented at the ICLR AI for Social Good Workshop 2019; Updated with Addendum (Jan 2020)

arXiv:1811.07143 [pdf, other]

High Quality Prediction of Protein Q8 Secondary Structure by Diverse Neural Network Architectures

Authors: Iddo Drori, Isht Dwivedi, Pranav Shrestha, Jeffrey Wan, Yueqi Wang, Yunchu He, Anthony Mazza, Hugh Krogh-Freeman, Dimitri Leggas, Kendal Sandridge, Linyong Nan, Kaveri Thakoor, Chinmay Joshi, Sonam Goenka, Chen Keasar, Itsik Pe'er

Abstract: We tackle the problem of protein secondary structure prediction using a common task framework. This led to the introduction of multiple ideas for neural architectures based on state of the art building blocks, used in this task for the first time. We take a principled machine learning approach, which provides genuine, unbiased performance measures, correcting longstanding errors in the application domain. We focus on the Q8 resolution of secondary structure, an active area for continuously improving methods. We use an ensemble of strong predictors to achieve accuracy of 70.7% (on the CB513 test set using the CB6133filtered training set). These results are statistically indistinguishable from those of the top existing predictors. In the spirit of reproducible research we make our data, models and code available, aiming to set a gold standard for purity of training and testing sets. Such good practices lower entry barriers to this domain and facilitate reproducible, extendable research.

Submitted 17 November, 2018; originally announced November 2018.

Comments: NIPS 2018 Workshop on Machine Learning for Molecules and Materials, 10 pages

arXiv:1805.05927 [pdf]

CLINIQA: A Machine Intelligence Based Clinical Question Answering System

Authors: M A H Zahid, Ankush Mittal, R. C. Joshi, G. Atluri

Abstract: The recent developments in the field of biomedicine have made large volumes of biomedical literature available to the medical practitioners. Due to the large size and lack of efficient searching strategies, medical practitioners struggle to obtain necessary information available in the biomedical literature. Moreover, the most sophisticated search engines of age are not intelligent enough to interpret the clinicians' questions. These facts reflect the urgent need of an information retrieval system that accepts the queries from medical practitioners' in natural language and returns the answers quickly and efficiently. In this paper, we present an implementation of a machine intelligence based CLINIcal Question Answering system (CLINIQA) to answer medical practitioner's questions. The system was rigorously evaluated on different text mining algorithms and the best components for the system were selected. The system makes use of Unified Medical Language System for semantic analysis of both questions and medical documents. In addition, the system employs supervised machine learning algorithms for classification of the documents, identifying the focus of the question and answer selection. Effective domain-specific heuristics are designed for answer ranking. The performance evaluation on hundred clinical questions shows the effectiveness of our approach.

Submitted 15 May, 2018; originally announced May 2018.

Comments: This manuscript was submitted to IEEE Transactions on Information Technology in Biomedicine in 2007 and was in second revision when it was withdrawn. As I moved to industry and could not get enough time to revise it. I am uploading it here for anyone interested in conventional ML based approach to NLP

MSC Class: 68T50

arXiv:1706.07503 [pdf, other]

Personalization in Goal-Oriented Dialog

Authors: Chaitanya K. Joshi, Fei Mi, Boi Faltings

Abstract: The main goal of modeling human conversation is to create agents which can interact with people in both open-ended and goal-oriented scenarios. End-to-end trained neural dialog systems are an important line of research for such generalized dialog models as they do not resort to any situation-specific handcrafting of rules. However, incorporating personalization into such systems is a largely unexplored topic as there are no existing corpora to facilitate such work. In this paper, we present a new dataset of goal-oriented dialogs which are influenced by speaker profiles attached to them. We analyze the shortcomings of an existing end-to-end dialog system based on Memory Networks and propose modifications to the architecture which enable personalization. We also investigate personalization in dialog as a multi-task learning problem, and show that a single model which shares features among various profiles outperforms separate models for each profile.

Submitted 15 December, 2017; v1 submitted 22 June, 2017; originally announced June 2017.

Comments: Accepted at NIPS 2017 Conversational AI Workshop; Code and data at https://github.com/chaitjo/personalized-dialog

arXiv:1208.3557 [pdf]

Distributed Denial of Service Prevention Techniques

Authors: B. B. Gupta, R. C. Joshi, Manoj Misra

Abstract: The significance of the DDoS problem and the increased occurrence, sophistication and strength of attacks has led to the dawn of numerous prevention mechanisms. Each proposed prevention mechanism has some unique advantages and disadvantages over the others. In this paper, we present a classification of available mechanisms that are proposed in literature on preventing Internet services from possible DDoS attacks and discuss the strengths and weaknesses of each mechanism. This provides better understanding of the problem and enables a security administrator to effectively equip his arsenal with proper prevention mechanisms for fighting against DDoS threat.

Submitted 17 August, 2012; originally announced August 2012.

Comments: ISSN: 1793-8198

Journal ref: International Journal of Computer and Electrical Engineering (IJCEE), vol. 2, number 2, pp. 268-276, 2010

arXiv:1204.5592 [pdf]

Dynamic and Auto Responsive Solution for Distributed Denial-of-Service Attacks Detection in ISP Network

Authors: B. B. Gupta, R. C. Joshi, Manoj Misra

Abstract: Denial of service (DoS) attacks and more particularly the distributed ones (DDoS) are one of the latest threat and pose a grave danger to users, organizations and infrastructures of the Internet. Several schemes have been proposed on how to detect some of these attacks, but they suffer from a range of problems, some of them being impractical and others not being effective against these attacks. This paper reports the design principles and evaluation results of our proposed framework that autonomously detects and accurately characterizes a wide range of flooding DDoS attacks in ISP network. Attacks are detected by the constant monitoring of propagation of abrupt traffic changes inside ISP network. For this, a newly designed flow-volume based approach (FVBA) is used to construct profile of the traffic normally seen in the network, and identify anomalies whenever traffic goes out of profile. Consideration of varying tolerance factors make proposed detection system scalable to the varying network conditions and attack loads in real time. Six-sigma method is used to identify threshold values accurately for malicious flows characterization. FVBA has been extensively evaluated in a controlled test-bed environment. Detection thresholds and efficiency is justified using receiver operating characteristics (ROC) curve. For validation, KDD 99, a publicly available benchmark dataset is used. The results show that our proposed system gives a drastic improvement in terms of detection and false alarm rate.

Submitted 25 April, 2012; originally announced April 2012.

Comments: arXiv admin note: substantial text overlap with arXiv:1203.2400

Journal ref: International Journal of Computer Theory and Engineering, Vol. 1, No. 1, April 2009 1793-821X

arXiv:1204.5590 [pdf]

doi 10.1145/1523103.1523203

An Efficient Analytical Solution to Thwart DDoS Attacks in Public Domain

Authors: B. B. Gupta, R. C. Joshi, Manoj Misra

Abstract: In this paper, an analytical model for DDoS attack detection is proposed, in which the propagation of abrupt traffic changes inside a public domain is monitored to detect a wide range of DDoS attacks. Although various statistical measures can be used to construct a profile of the traffic normally seen in the network and to identify anomalies whenever traffic goes out of profile, we have selected the volume and flow measures. Consideration of varying tolerance factors makes the proposed detection system scalable to varying network conditions and attack loads in real time. The NS-2 network simulator on a Linux platform is used as the simulation testbed. Simulation results show that our proposed solution gives a drastic improvement in terms of detection rate and false positive rate. However, the mammoth volume generated by DDoS attacks poses the biggest challenge in terms of memory and computational overheads when traffic is monitored and analyzed at the single point connecting the victim. To address this problem, a distributed cooperative technique is proposed that distributes memory and computational overheads across all edge routers to detect a wide range of DDoS attacks at an early stage.

Submitted 25 April, 2012; originally announced April 2012.

Comments: arXiv admin note: substantial text overlap with arXiv:1203.2400

Journal ref: Proceedings of ACM International Conference on Advances in Computer, Communication and Computing (ICAC3-2008), pp. 503-509, Jan. 23-24, 2009, India

arXiv:1203.2400 [pdf]

An ISP Level Solution to Combat DDoS Attacks using Combined Statistical Based Approach

Authors: B. B. Gupta, Manoj Misra, R. C. Joshi

Abstract: Disruption of service caused by DDoS attacks is an immense threat to the Internet today. These attacks can disrupt the availability of Internet services completely by consuming either computational or communication resources through the sheer volume of packets sent from distributed locations in a coordinated manner, or they can gracefully degrade network performance by sending attack traffic at a low rate. In this paper, we describe a novel framework that detects a variety of DDoS attacks by monitoring the propagation of abrupt traffic changes inside the ISP domain and then characterizes the flows that carry attack traffic. Two statistical metrics, namely volume and flow, are used as parameters to detect DDoS attacks. The effectiveness of an anomaly-based detection and characterization system depends heavily on the accuracy of its threshold settings. Inaccurate threshold values cause a large number of false positives and negatives. Therefore, in our scheme, six-sigma and varying tolerance factor methods are used to identify threshold values accurately and dynamically for the various statistical metrics. The NS-2 network simulator on a Linux platform is used as the simulation testbed to validate the effectiveness of the proposed approach. Different attack scenarios are implemented by varying the total number of zombie machines and the attack strength. The comparison with a volume-based approach clearly indicates the superiority of our proposed system.

Submitted 12 March, 2012; originally announced March 2012.

Journal ref: International Journal of Information Assurance and Security (JIAS), vol. 3, no. 2, pp. 102-110, 2008

arXiv:1203.2399 [pdf]

Estimating strength of DDoS attack using various regression models

Authors: B. B. Gupta, R. C. Joshi, Manoj Misra

Abstract: Anomaly-based DDoS detection systems construct a profile of the traffic normally seen in the network and identify anomalies whenever traffic deviates from the normal profile beyond a threshold. The extent of this deviation is normally not utilised. This paper reports the evaluation results of a proposed approach that uses the extent of deviation from the detection threshold to estimate the strength of a DDoS attack using various regression models. A relationship is established between the number of zombies and the observed deviation in sample entropy. Various statistical performance measures, such as coefficient of determination (R2), coefficient of correlation (CC), sum of square error (SSE), mean square error (MSE), root mean square error (RMSE), normalised mean square error (NMSE), Nash-Sutcliffe efficiency index (η) and mean absolute error (MAE), are used to measure the performance of the regression models. Internet-type topologies used for simulation are generated using the transit-stub model of the GT-ITM topology generator. The NS-2 network simulator on a Linux platform is used as the simulation test bed for launching DDoS attacks of varied attack strength. A comparative study is performed using different regression models for estimating the strength of a DDoS attack. The simulation results are promising, as we are able to estimate the strength of a DDoS attack efficiently with a very low error rate using various regression models.

Submitted 12 March, 2012; originally announced March 2012.

Journal ref: Int. J. Multimedia Intelligence and Security, Vol. 1, No. 4, pp.378-391

Showing 1–38 of 38 results for author: Joshi, C