Search | arXiv e-print repository

doi 10.5281/zenodo.10680345

AI Sustainability in Practice Part Two: Sustainability Throughout the AI Workflow

Authors: David Leslie, Cami Rincon, Morgan Briggs, Antonella Perini, Smera Jayadeva, Ann Borda, SJ Bennett, Christopher Burr, Mhairi Aitken, Michael Katell, Claudia Fischer, Janis Wong, Ismael Kherroubi Garcia

Abstract: The sustainability of AI systems depends on the capacity of project teams to proceed with a continuous sensitivity to their potential real-world impacts and transformative effects. Stakeholder Impact Assessments (SIAs) are governance mechanisms that enable this kind of responsiveness. They are tools that create a procedure for, and a means of documenting, the collaborative evaluation and reflectiv… ▽ More The sustainability of AI systems depends on the capacity of project teams to proceed with a continuous sensitivity to their potential real-world impacts and transformative effects. Stakeholder Impact Assessments (SIAs) are governance mechanisms that enable this kind of responsiveness. They are tools that create a procedure for, and a means of documenting, the collaborative evaluation and reflective anticipation of the possible harms and benefits of AI innovation projects. SIAs are not one-off governance actions. They require project teams to pay continuous attention to the dynamic and changing character of AI production and use and to the shifting conditions of the real-world environments in which AI technologies are embedded. This workbook is part two of two workbooks on AI Sustainability. It provides a template of the SIA and activities that allow a deeper dive into crucial parts of it. It discusses methods for weighing values and considering trade-offs during the SIA. And, it highlights the need to treat the SIA as an end-to-end process of responsive evaluation and re-assessment. △ Less

Submitted 19 February, 2024; originally announced March 2024.

arXiv:2403.15403 [pdf]

doi 10.5281/zenodo.10679891

AI Ethics and Governance in Practice: An Introduction

Authors: David Leslie, Cami Rincon, Morgan Briggs, Antonella Perini, Smera Jayadeva, Ann Borda, SJ Bennett, Christopher Burr, Mhairi Aitken, Michael Katell, Claudia Fischer

Abstract: AI systems may have transformative and long-term effects on individuals and society. To manage these impacts responsibly and direct the development of AI systems toward optimal public benefit, considerations of AI ethics and governance must be a first priority. In this workbook, we introduce and describe our PBG Framework, a multi-tiered governance model that enables project teams to integrate e… ▽ More AI systems may have transformative and long-term effects on individuals and society. To manage these impacts responsibly and direct the development of AI systems toward optimal public benefit, considerations of AI ethics and governance must be a first priority. In this workbook, we introduce and describe our PBG Framework, a multi-tiered governance model that enables project teams to integrate ethical values and practical principles into their innovation practices and to have clear mechanisms for demonstrating and documenting this. △ Less

Submitted 19 February, 2024; originally announced March 2024.

arXiv:2403.14636 [pdf]

doi 10.5281/zenodo.10680527

AI Fairness in Practice

Authors: David Leslie, Cami Rincon, Morgan Briggs, Antonella Perini, Smera Jayadeva, Ann Borda, SJ Bennett, Christopher Burr, Mhairi Aitken, Michael Katell, Claudia Fischer, Janis Wong, Ismael Kherroubi Garcia

Abstract: Reaching consensus on a commonly accepted definition of AI Fairness has long been a central challenge in AI ethics and governance. There is a broad spectrum of views across society on what the concept of fairness means and how it should best be put to practice. In this workbook, we tackle this challenge by exploring how a context-based and society-centred approach to understanding AI Fairness can… ▽ More Reaching consensus on a commonly accepted definition of AI Fairness has long been a central challenge in AI ethics and governance. There is a broad spectrum of views across society on what the concept of fairness means and how it should best be put to practice. In this workbook, we tackle this challenge by exploring how a context-based and society-centred approach to understanding AI Fairness can help project teams better identify, mitigate, and manage the many ways that unfair bias and discrimination can crop up across the AI project workflow. We begin by exploring how, despite the plurality of understandings about the meaning of fairness, priorities of equality and non-discrimination have come to constitute the broadly accepted core of its application as a practical principle. We focus on how these priorities manifest in the form of equal protection from direct and indirect discrimination and from discriminatory harassment. These elements form ethical and legal criteria based upon which instances of unfair bias and discrimination can be identified and mitigated across the AI project workflow. We then take a deeper dive into how the different contexts of the AI project lifecycle give rise to different fairness concerns. This allows us to identify several types of AI Fairness (Data Fairness, Application Fairness, Model Design and Development Fairness, Metric-Based Fairness, System Implementation Fairness, and Ecosystem Fairness) that form the basis of a multi-lens approach to bias identification, mitigation, and management. Building on this, we discuss how to put the principle of AI Fairness into practice across the AI project workflow through Bias Self-Assessment and Bias Risk Management as well as through the documentation of metric-based fairness criteria in a Fairness Position Statement. △ Less

Submitted 19 February, 2024; originally announced March 2024.

arXiv:2403.14635 [pdf]

doi 10.5281/zenodo.10680113

AI Sustainability in Practice Part One: Foundations for Sustainable AI Projects

Authors: David Leslie, Cami Rincon, Morgan Briggs, Antonella Perini, Smera Jayadeva, Ann Borda, SJ Bennett, Christopher Burr, Mhairi Aitken, Michael Katell, Claudia Fischer, Janis Wong, Ismael Kherroubi Garcia

Abstract: Sustainable AI projects are continuously responsive to the transformative effects as well as short-, medium-, and long-term impacts on individuals and society that the design, development, and deployment of AI technologies may have. Projects, which centre AI Sustainability, ensure that values-led, collaborative, and anticipatory reflection both guides the assessment of potential social and ethical… ▽ More Sustainable AI projects are continuously responsive to the transformative effects as well as short-, medium-, and long-term impacts on individuals and society that the design, development, and deployment of AI technologies may have. Projects, which centre AI Sustainability, ensure that values-led, collaborative, and anticipatory reflection both guides the assessment of potential social and ethical impacts and steers responsible innovation practices. This workbook is the first part of a pair that provides the concepts and tools needed to put AI Sustainability into practice. It introduces the SUM Values, which help AI project teams to assess the potential societal impacts and ethical permissibility of their projects. It then presents a Stakeholder Engagement Process (SEP), which provides tools to facilitate proportionate engagement of and input from stakeholders with an emphasis on equitable and meaningful participation and positionality awareness. △ Less

Submitted 19 February, 2024; originally announced March 2024.

arXiv:2312.10080 [pdf, ps, other]

No prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation

Authors: Nimesh Agrawal, Anuj Kumar Sirohi, Jayadeva, Sandeep Kumar

Abstract: Ensuring fairness in Recommendation Systems (RSs) across demographic groups is critical due to the increased integration of RSs in applications such as personalized healthcare, finance, and e-commerce. Graph-based RSs play a crucial role in capturing intricate higher-order interactions among entities. However, integrating these graph models into the Federated Learning (FL) paradigm with fairness c… ▽ More Ensuring fairness in Recommendation Systems (RSs) across demographic groups is critical due to the increased integration of RSs in applications such as personalized healthcare, finance, and e-commerce. Graph-based RSs play a crucial role in capturing intricate higher-order interactions among entities. However, integrating these graph models into the Federated Learning (FL) paradigm with fairness constraints poses formidable challenges as this requires access to the entire interaction graph and sensitive user information (such as gender, age, etc.) at the central server. This paper addresses the pervasive issue of inherent bias within RSs for different demographic groups without compromising the privacy of sensitive user attributes in FL environment with the graph-based model. To address the group bias, we propose F2PGNN (Fair Federated Personalized Graph Neural Network), a novel framework that leverages the power of Personalized Graph Neural Network (GNN) coupled with fairness considerations. Additionally, we use differential privacy techniques to fortify privacy protection. Experimental evaluation on three publicly available datasets showcases the efficacy of F2PGNN in mitigating group unfairness by 47% - 99% compared to the state-of-the-art while preserving privacy and maintaining the utility. The results validate the significance of our framework in achieving equitable and personalized recommendations using GNN within the FL landscape. △ Less

Submitted 20 December, 2023; v1 submitted 10 December, 2023; originally announced December 2023.

Comments: To appear as a full paper in AAAI 2024

arXiv:2308.09115 [pdf]

MaScQA: A Question Answering Dataset for Investigating Materials Science Knowledge of Large Language Models

Authors: Mohd Zaki, Jayadeva, Mausam, N. M. Anoop Krishnan

Abstract: Information extraction and textual comprehension from materials literature are vital for develo** an exhaustive knowledge base that enables accelerated materials discovery. Language models have demonstrated their capability to answer domain-specific questions and retrieve information from knowledge bases. However, there are no benchmark datasets in the materials domain that can evaluate the unde… ▽ More Information extraction and textual comprehension from materials literature are vital for develo** an exhaustive knowledge base that enables accelerated materials discovery. Language models have demonstrated their capability to answer domain-specific questions and retrieve information from knowledge bases. However, there are no benchmark datasets in the materials domain that can evaluate the understanding of the key concepts by these language models. In this work, we curate a dataset of 650 challenging questions from the materials domain that require the knowledge and skills of a materials student who has cleared their undergraduate degree. We classify these questions based on their structure and the materials science domain-based subcategories. Further, we evaluate the performance of GPT-3.5 and GPT-4 models on solving these questions via zero-shot and chain of thought prompting. It is observed that GPT-4 gives the best performance (~62% accuracy) as compared to GPT-3.5. Interestingly, in contrast to the general observation, no significant improvement in accuracy is observed with the chain of thought prompting. To evaluate the limitations, we performed an error analysis, which revealed conceptual errors (~64%) as the major contributor compared to computational errors (~36%) towards the reduced performance of LLMs. We hope that the dataset and analysis performed in this work will promote further research in develo** better materials science domain-specific LLMs and strategies for information extraction. △ Less

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2307.05299 [pdf, other]

Discovering Symbolic Laws Directly from Trajectories with Hamiltonian Graph Neural Networks

Authors: Suresh Bishnoi, Ravinder Bhattoo, Jayadeva, Sayan Ranu, N M Anoop Krishnan

Abstract: The time evolution of physical systems is described by differential equations, which depend on abstract quantities like energy and force. Traditionally, these quantities are derived as functionals based on observables such as positions and velocities. Discovering these governing symbolic laws is the key to comprehending the interactions in nature. Here, we present a Hamiltonian graph neural networ… ▽ More The time evolution of physical systems is described by differential equations, which depend on abstract quantities like energy and force. Traditionally, these quantities are derived as functionals based on observables such as positions and velocities. Discovering these governing symbolic laws is the key to comprehending the interactions in nature. Here, we present a Hamiltonian graph neural network (HGNN), a physics-enforced GNN that learns the dynamics of systems directly from their trajectory. We demonstrate the performance of HGNN on n-springs, n-pendulums, gravitational systems, and binary Lennard Jones systems; HGNN learns the dynamics in excellent agreement with the ground truth from small amounts of data. We also evaluate the ability of HGNN to generalize to larger system sizes, and to hybrid spring-pendulum system that is a combination of two original systems (spring and pendulum) on which the models are trained independently. Finally, employing symbolic regression on the learned HGNN, we infer the underlying equations relating the energy functionals, even for complex systems such as the binary Lennard-Jones liquid. Our framework facilitates the interpretable discovery of interaction laws directly from physical system trajectories. Furthermore, this approach can be extended to other systems with topology-dependent dynamics, such as cells, polydisperse gels, or deformable bodies. △ Less

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2306.11435 [pdf, other]

Graph Neural Stochastic Differential Equations for Learning Brownian Dynamics

Authors: Suresh Bishnoi, Jayadeva, Sayan Ranu, N. M. Anoop Krishnan

Abstract: Neural networks (NNs) that exploit strong inductive biases based on physical laws and symmetries have shown remarkable success in learning the dynamics of physical systems directly from their trajectory. However, these works focus only on the systems that follow deterministic dynamics, for instance, Newtonian or Hamiltonian dynamics. Here, we propose a framework, namely Brownian graph neural netwo… ▽ More Neural networks (NNs) that exploit strong inductive biases based on physical laws and symmetries have shown remarkable success in learning the dynamics of physical systems directly from their trajectory. However, these works focus only on the systems that follow deterministic dynamics, for instance, Newtonian or Hamiltonian dynamics. Here, we propose a framework, namely Brownian graph neural networks (BROGNET), combining stochastic differential equations (SDEs) and GNNs to learn Brownian dynamics directly from the trajectory. We theoretically show that BROGNET conserves the linear momentum of the system, which in turn, provides superior performance on learning dynamics as revealed empirically. We demonstrate this approach on several systems, namely, linear spring, linear spring with binary particle types, and non-linear spring systems, all following Brownian dynamics at finite temperatures. We show that BROGNET significantly outperforms proposed baselines across all the benchmarked Brownian systems. In addition, we demonstrate zero-shot generalizability of BROGNET to simulate unseen system sizes that are two orders of magnitude larger and to different temperatures than those used during training. Altogether, our study contributes to advancing the understanding of the intricate dynamics of Brownian motion and demonstrates the effectiveness of graph neural networks in modeling such complex systems. △ Less

Submitted 20 June, 2023; originally announced June 2023.

arXiv:2211.03223 [pdf]

Cementron: Machine Learning the Constituent Phases in Cement Clinker from Optical Images

Authors: Mohd Zaki, Siddhant Sharma, Sunil Kumar Gurjar, Raju Goyal, Jayadeva, N. M. Anoop Krishnan

Abstract: Cement is the most used construction material. The performance of cement hydrate depends on the constituent phases, viz. alite, belite, aluminate, and ferrites present in the cement clinker, both qualitatively and quantitatively. Traditionally, clinker phases are analyzed from optical images relying on a domain expert and simple image processing techniques. However, the non-uniformity of the image… ▽ More Cement is the most used construction material. The performance of cement hydrate depends on the constituent phases, viz. alite, belite, aluminate, and ferrites present in the cement clinker, both qualitatively and quantitatively. Traditionally, clinker phases are analyzed from optical images relying on a domain expert and simple image processing techniques. However, the non-uniformity of the images, variations in the geometry and size of the phases, and variabilities in the experimental approaches and imaging methods make it challenging to obtain the phases. Here, we present a machine learning (ML) approach to detect clinker microstructure phases automatically. To this extent, we create the first annotated dataset of cement clinker by segmenting alite and belite particles. Further, we use supervised ML methods to train models for identifying alite and belite regions. Specifically, we finetune the image detection and segmentation model Detectron-2 on the cement microstructure to develop a model for detecting the cement phases, namely, Cementron. We demonstrate that Cementron, trained only on literature data, works remarkably well on new images obtained from our experiments, demonstrating its generalizability. We make Cementron available for public use. △ Less

Submitted 6 November, 2022; originally announced November 2022.

arXiv:2210.10507 [pdf]

doi 10.1016/j.jnoncrysol.2023.122488

Predicting Oxide Glass Properties with Low Complexity Neural Network and Physical and Chemical Descriptors

Authors: Suresh Bishnoi, Skyler Badge, Jayadeva, N. M. Anoop Krishnan

Abstract: Due to their disordered structure, glasses present a unique challenge in predicting the composition-property relationships. Recently, several attempts have been made to predict the glass properties using machine learning techniques. However, these techniques have the limitations, namely, (i) predictions are limited to the components that are present in the original dataset, and (ii) predictions to… ▽ More Due to their disordered structure, glasses present a unique challenge in predicting the composition-property relationships. Recently, several attempts have been made to predict the glass properties using machine learning techniques. However, these techniques have the limitations, namely, (i) predictions are limited to the components that are present in the original dataset, and (ii) predictions towards the extreme values of the properties, important regions for new materials discovery, are not very reliable due to the sparse datapoints in this region. To address these challenges, here we present a low complexity neural network (LCNN) that provides improved performance in predicting the properties of oxide glasses. In addition, we combine the LCNN with physical and chemical descriptors that allow the development of universal models that can provide predictions for components beyond the training set. By training on a large dataset (~50000) of glass components, we show the LCNN outperforms state-of-the-art algorithms such as XGBoost. In addition, we interpret the LCNN models using Shapely additive explanations to gain insights into the role played by the descriptors in governing the property. Finally, we demonstrate the universality of the LCNN models by predicting the properties for glasses with new components that were not present in the original training set. Altogether, the present approach provides a promising direction towards accelerated discovery of novel glass compositions. △ Less

Submitted 19 October, 2022; originally announced October 2022.

Comments: 15 pages, 3 figures

arXiv:2207.11312 [pdf, other]

HybMT: Hybrid Meta-Predictor based ML Algorithm for Fast Test Vector Generation

Authors: Shruti Pandey, Jayadeva, Smruti R. Sarangi

Abstract: ML models are increasingly being used to increase the test coverage and decrease the overall testing time. This field is still in its nascent stage and up till now there were no algorithms that could match or outperform commercial tools in terms of speed and accuracy for large circuits. We propose an ATPG algorithm HybMT in this paper that finally breaks this barrier. Like sister methods, we augme… ▽ More ML models are increasingly being used to increase the test coverage and decrease the overall testing time. This field is still in its nascent stage and up till now there were no algorithms that could match or outperform commercial tools in terms of speed and accuracy for large circuits. We propose an ATPG algorithm HybMT in this paper that finally breaks this barrier. Like sister methods, we augment the classical PODEM algorithm that uses recursive backtracking. We design a custom 2-level predictor that predicts the input net of a logic gate whose value needs to be set to ensure that the output is a given value (0 or 1). Our predictor chooses the output from among two first-level predictors, where the most effective one is a bespoke neural network and the other is an SVM regressor. As compared to a popular, state-of-the-art commercial ATPG tool, HybMT shows an overall reduction of 56.6% in the CPU time without compromising on the fault coverage for the EPFL benchmark circuits. HybMT also shows a speedup of 126.4% over the best ML-based algorithm while obtaining an equal or better fault coverage for the EPFL benchmark circuits. △ Less

Submitted 7 August, 2023; v1 submitted 22 July, 2022; originally announced July 2022.

Comments: 6 pages, 5 figures and 5 tables. Changes from the previous version: We modified our novel neural network model "HybNN" with a skip connection and found a significant improvement in the fault coverage and runtime of our HybMT-based PODEM algorithm. We train on the smaller ISCAS'85 circuits, report the results for the EPFL benchmark circuits (most recent and up to 70X large)

arXiv:2205.01037 [pdf]

doi 10.5281/zenodo.6428185

Data Justice in Practice: A Guide for Developers

Authors: David Leslie, Michael Katell, Mhairi Aitken, Jatinder Singh, Morgan Briggs, Rosamund Powell, Cami Rincón, Antonella Perini, Smera Jayadeva, Christopher Burr

Abstract: The Advancing Data Justice Research and Practice project aims to broaden understanding of the social, historical, cultural, political, and economic forces that contribute to discrimination and inequity in contemporary ecologies of data collection, governance, and use. This is the consultation draft of a guide for developers and organisations, which are producing, procuring, or using data-intensive… ▽ More The Advancing Data Justice Research and Practice project aims to broaden understanding of the social, historical, cultural, political, and economic forces that contribute to discrimination and inequity in contemporary ecologies of data collection, governance, and use. This is the consultation draft of a guide for developers and organisations, which are producing, procuring, or using data-intensive technologies.In the first section, we introduce the field of data justice, from its early discussions to more recent proposals to relocate understandings of what data justice means. This section includes a description of the six pillars of data justice around which this guidance revolves. Next, to support developers in designing, develo**, and deploying responsible and equitable data-intensive and AI/ML systems, we outline the AI/ML project lifecycle through a sociotechnical lens. To support the operationalisation data justice throughout the entirety of the AI/ML lifecycle and within data innovation ecosystems, we then present five overarching principles of responsible, equitable, and trustworthy data research and innovation practices, the SAFE-D principles-Safety, Accountability, Fairness, Explainability, and Data Quality, Integrity, Protection, and Privacy. The final section presents guiding questions that will help developers both address data justice issues throughout the AI/ML lifecycle and engage in reflective innovation practices that ensure the design, development, and deployment of responsible and equitable data-intensive and AI/ML systems. △ Less

Submitted 12 April, 2022; originally announced May 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2202.02776

arXiv:2204.03100 [pdf]

doi 10.5281/zenodo.6408326

Data Justice Stories: A Repository of Case Studies

Authors: David Leslie, Morgan Briggs, Antonella Perini, Smera Jayadeva, Cami Rincón, Noopur Raval, Abeba Birhane, Rosamund Powell, Michael Katell, Mhairi Aitken

Abstract: The idea of "data justice" is of recent academic vintage. It has arisen over the past decade in Anglo-European research institutions as an attempt to bring together a critique of the power dynamics that underlie accelerating trends of datafication with a normative commitment to the principles of social justice-a commitment to the achievement of a society that is equitable, fair, and capable of con… ▽ More The idea of "data justice" is of recent academic vintage. It has arisen over the past decade in Anglo-European research institutions as an attempt to bring together a critique of the power dynamics that underlie accelerating trends of datafication with a normative commitment to the principles of social justice-a commitment to the achievement of a society that is equitable, fair, and capable of confronting the root causes of injustice.However, despite the seeming novelty of such a data justice pedigree, this joining up of the critique of the power imbalances that have shaped the digital and "big data" revolutions with a commitment to social equity and constructive societal transformation has a deeper historical, and more geographically diverse, provenance. As the stories of the data justice initiatives, activism, and advocacy contained in this volume well evidence, practices of data justice across the globe have, in fact, largely preceded the elaboration and crystallisation of the idea of data justice in contemporary academic discourse. In telling these data justice stories, we hope to provide the reader with two interdependent tools of data justice thinking: First, we aim to provide the reader with the critical leverage needed to discern those distortions and malformations of data justice that manifest in subtle and explicit forms of power, domination, and coercion. Second, we aim to provide the reader with access to the historically effective forms of normativity and ethical insight that have been marshalled by data justice activists and advocates as tools of societal transformation-so that these forms of normativity and insight can be drawn on, in turn, as constructive resources to spur future transformative data justice practices. △ Less

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2204.03090 [pdf]

doi 10.5281/zenodo.6408304

Advancing Data Justice Research and Practice: An Integrated Literature Review

Authors: David Leslie, Michael Katell, Mhairi Aitken, Jatinder Singh, Morgan Briggs, Rosamund Powell, Cami Rincón, Thompson Chengeta, Abeba Birhane, Antonella Perini, Smera Jayadeva, Anjali Mazumder

Abstract: The Advancing Data Justice Research and Practice (ADJRP) project aims to widen the lens of current thinking around data justice and to provide actionable resources that will help policymakers, practitioners, and impacted communities gain a broader understanding of what equitable, freedom-promoting, and rights-sustaining data collection, governance, and use should look like in increasingly dynamic… ▽ More The Advancing Data Justice Research and Practice (ADJRP) project aims to widen the lens of current thinking around data justice and to provide actionable resources that will help policymakers, practitioners, and impacted communities gain a broader understanding of what equitable, freedom-promoting, and rights-sustaining data collection, governance, and use should look like in increasingly dynamic and global data innovation ecosystems. In this integrated literature review we hope to lay the conceptual groundwork needed to support this aspiration. The introduction motivates the broadening of data justice that is undertaken by the literature review which follows. First, we address how certain limitations of the current study of data justice drive the need for a re-location of data justice research and practice. We map out the strengths and shortcomings of the contemporary state of the art and then elaborate on the challenges faced by our own effort to broaden the data justice perspective in the decolonial context. The body of the literature review covers seven thematic areas. For each theme, the ADJRP team has systematically collected and analysed key texts in order to tell the critical empirical story of how existing social structures and power dynamics present challenges to data justice and related justice fields. In each case, this critical empirical story is also supplemented by the transformational story of how activists, policymakers, and academics are challenging longstanding structures of inequity to advance social justice in data innovation ecosystems and adjacent areas of technological practice. △ Less

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2103.03633 [pdf]

Unveiling the Glass Veil: Elucidating the Optical Properties in Glasses with Interpretable Machine Learning

Authors: Mohd Zaki, Vineeth Venugopal, R. Ravinder, Suresh Bishnoi, Sourabh Kumar Singh, Amarnath R. Allu, Jayadeva, N. M. Anoop Krishnan

Abstract: Due to their excellent optical properties, glasses are used for various applications ranging from smartphone screens to telescopes. Develo** compositions with tailored Abbe number (Vd) and refractive index (nd), two crucial optical properties, is a major challenge. To this extent, machine learning (ML) approaches have been successfully used to develop composition-property models. However, these… ▽ More Due to their excellent optical properties, glasses are used for various applications ranging from smartphone screens to telescopes. Develo** compositions with tailored Abbe number (Vd) and refractive index (nd), two crucial optical properties, is a major challenge. To this extent, machine learning (ML) approaches have been successfully used to develop composition-property models. However, these models are essentially black-box in nature and suffer from the lack of interpretability. In this paper, we demonstrate the use of ML models to predict the composition-dependent variations of Vd and n at 587.6 nm (nd). Further, using Shapely Additive exPlanations (SHAP), we interpret the ML models to identify the contribution of each of the input components toward a target prediction. We observe that the glass formers such as SiO2, B2O3, and P2O5, and intermediates like TiO2, PbO, and Bi2O3 play a significant role in controlling the optical properties. Interestingly, components that contribute toward increasing the nd are found to decrease the Vd and vice-versa. Finally, we develop the Abbe diagram, also known as the "glass veil", using the ML models, allowing accelerated discovery of new glasses for optical properties beyond the experimental pareto front. Overall, employing explainable ML, we discover the hidden compositional control on the optical properties of oxide glasses. △ Less

Submitted 5 March, 2021; originally announced March 2021.

Comments: 13 pages, 5 figures

arXiv:2102.07975 [pdf, other]

Twin Augmented Architectures for Robust Classification of COVID-19 Chest X-Ray Images

Authors: Kartikeya Badola, Sameer Ambekar, Himanshu Pant, Sumit Soman, Anuradha Sural, Rajiv Narang, Suresh Chandra, Jayadeva

Abstract: The gold standard for COVID-19 is RT-PCR, testing facilities for which are limited and not always optimally distributed. Test results are delayed, which impacts treatment. Expert radiologists, one of whom is a co-author, are able to diagnose COVID-19 positivity from Chest X-Rays (CXR) and CT scans, that can facilitate timely treatment. Such diagnosis is particularly valuable in locations lacking r… ▽ More The gold standard for COVID-19 is RT-PCR, testing facilities for which are limited and not always optimally distributed. Test results are delayed, which impacts treatment. Expert radiologists, one of whom is a co-author, are able to diagnose COVID-19 positivity from Chest X-Rays (CXR) and CT scans, that can facilitate timely treatment. Such diagnosis is particularly valuable in locations lacking radiologists with sufficient expertise and familiarity with COVID-19 patients. This paper has two contributions. One, we analyse literature on CXR based COVID-19 diagnosis. We show that popular choices of dataset selection suffer from data homogeneity, leading to misleading results. We compile and analyse a viable benchmark dataset from multiple existing heterogeneous sources. Such a benchmark is important for realistically testing models. Our second contribution relates to learning from imbalanced data. Datasets for COVID X-Ray classification face severe class imbalance, since most subjects are COVID -ve. Twin Support Vector Machines (Twin SVM) and Twin Neural Networks (Twin NN) have, in recent years, emerged as effective ways of handling skewed data. We introduce a state-of-the-art technique, termed as Twin Augmentation, for modifying popular pre-trained deep learning models. Twin Augmentation boosts the performance of a pre-trained deep neural network without requiring re-training. Experiments show, that across a multitude of classifiers, Twin Augmentation is very effective in boosting the performance of given pre-trained model for classification in imbalanced settings. △ Less

Submitted 16 February, 2021; originally announced February 2021.

MSC Class: 68T07

arXiv:2011.10223 [pdf, other]

Complexity Controlled Generative Adversarial Networks

Authors: Himanshu Pant, Jayadeva, Sumit Soman

Abstract: One of the issues faced in training Generative Adversarial Nets (GANs) and their variants is the problem of mode collapse, wherein the training stability in terms of the generative loss increases as more training data is used. In this paper, we propose an alternative architecture via the Low-Complexity Neural Network (LCNN), which attempts to learn models with low complexity. The motivation is tha… ▽ More One of the issues faced in training Generative Adversarial Nets (GANs) and their variants is the problem of mode collapse, wherein the training stability in terms of the generative loss increases as more training data is used. In this paper, we propose an alternative architecture via the Low-Complexity Neural Network (LCNN), which attempts to learn models with low complexity. The motivation is that controlling model complexity leads to models that do not overfit the training data. We incorporate the LCNN loss function for GANs, Deep Convolutional GANs (DCGANs) and Spectral Normalized GANs (SNGANs), in order to develop hybrid architectures called the LCNN-GAN, LCNN-DCGAN and LCNN-SNGAN respectively. On various large benchmark image datasets, we show that the use of our proposed models results in stable training while avoiding the problem of mode collapse, resulting in better training stability. We also show how the learning behavior can be controlled by a hyperparameter in the LCNN functional, which also provides an improved inception score. △ Less

Submitted 20 November, 2020; originally announced November 2020.

Comments: 11 pages

arXiv:2011.03729 [pdf, other]

Enhash: A Fast Streaming Algorithm For Concept Drift Detection

Authors: Aashi **dal, Prashant Gupta, Debarka Sengupta, Jayadeva

Abstract: We propose Enhash, a fast ensemble learner that detects \textit{concept drift} in a data stream. A stream may consist of abrupt, gradual, virtual, or recurring events, or a mixture of various types of drift. Enhash employs projection hash to insert an incoming sample. We show empirically that the proposed method has competitive performance to existing ensemble learners in much lesser time. Also, E… ▽ More We propose Enhash, a fast ensemble learner that detects \textit{concept drift} in a data stream. A stream may consist of abrupt, gradual, virtual, or recurring events, or a mixture of various types of drift. Enhash employs projection hash to insert an incoming sample. We show empirically that the proposed method has competitive performance to existing ensemble learners in much lesser time. Also, Enhash has moderate resource requirements. Experiments relevant to performance comparison were performed on 6 artificial and 4 real data sets consisting of various types of drifts. △ Less

Submitted 7 November, 2020; originally announced November 2020.

arXiv:1912.11582 [pdf]

Deep Learning Aided Rational Design of Oxide Glasses

Authors: R. Ravinder, Karthikeya H. Sreedhara, Suresh Bishnoi, Hargun Singh Grover, Mathieu Bauchy, Jayadeva, Hariprasad Kodamana, N. M. Anoop Krishnan

Abstract: Despite the extensive usage of oxide glasses for a few millennia, the composition-property relationships in these materials still remain poorly understood. While empirical and physics-based models have been used to predict properties, these remain limited to a few select compositions or a series of glasses. Designing new glasses requires a priori knowledge of how the composition of a glass dictate… ▽ More Despite the extensive usage of oxide glasses for a few millennia, the composition-property relationships in these materials still remain poorly understood. While empirical and physics-based models have been used to predict properties, these remain limited to a few select compositions or a series of glasses. Designing new glasses requires a priori knowledge of how the composition of a glass dictates its properties such as stiffness, density, or processability. Thus, accelerated design of glasses for targeted applications remain impeded due to the lack of universal composition-property models. Herein, using deep learning, we present a methodology for the rational design of oxide glasses. Exploiting a large dataset of glasses comprising of up to 37 oxide components and more than 100,000 glass compositions, we develop high-fidelity deep neural networks for the prediction of eight properties that enable the design of glasses, namely, density, Young's modulus, shear modulus, hardness, glass transition temperature, thermal expansion coefficient, liquidus temperature, and refractive index. These models are by far the most extensive models developed as they cover the entire range of human-made glass compositions. We demonstrate that the models developed here exhibit excellent predictability, ensuring close agreement with experimental observations. Using these models, we develop a series of new design charts, termed as glass selection charts. These charts enable the rational design of functional glasses for targeted applications by identifying unique compositions that satisfy two or more constraints, on both compositions and properties, simultaneously. The generic design approach presented herein could catalyze machine-learning assisted materials design and discovery for a large class of materials including metals, ceramics, and proteins. △ Less

Submitted 24 December, 2019; originally announced December 2019.

Comments: 15 pages, 5 figures

arXiv:1909.00659 [pdf, other]

Guided Random Forest and its application to data approximation

Authors: Prashant Gupta, Aashi **dal, Jayadeva, Debarka Sengupta

Abstract: We present a new way of constructing an ensemble classifier, named the Guided Random Forest (GRAF) in the sequel. GRAF extends the idea of building oblique decision trees with localized partitioning to obtain a global partitioning. We show that global partitioning bridges the gap between decision trees and boosting algorithms. We empirically demonstrate that global partitioning reduces the general… ▽ More We present a new way of constructing an ensemble classifier, named the Guided Random Forest (GRAF) in the sequel. GRAF extends the idea of building oblique decision trees with localized partitioning to obtain a global partitioning. We show that global partitioning bridges the gap between decision trees and boosting algorithms. We empirically demonstrate that global partitioning reduces the generalization error bound. Results on 115 benchmark datasets show that GRAF yields comparable or better results on a majority of datasets. We also present a new way of approximating the datasets in the framework of random forests. △ Less

Submitted 2 September, 2019; originally announced September 2019.

arXiv:1908.11250 [pdf, other]

Smaller Models, Better Generalization

Authors: Mayank Sharma, Suraj Tripathi, Abhimanyu Dubey, Jayadeva, Sai Guruju, Nihal Goalla

Abstract: Reducing network complexity has been a major research focus in recent years with the advent of mobile technology. Convolutional Neural Networks that perform various vision tasks without memory overhaul is the need of the hour. This paper focuses on qualitative and quantitative analysis of reducing the network complexity using an upper bound on the Vapnik-Chervonenkis dimension, pruning, and quanti… ▽ More Reducing network complexity has been a major research focus in recent years with the advent of mobile technology. Convolutional Neural Networks that perform various vision tasks without memory overhaul is the need of the hour. This paper focuses on qualitative and quantitative analysis of reducing the network complexity using an upper bound on the Vapnik-Chervonenkis dimension, pruning, and quantization. We observe a general trend in improvement of accuracies as we quantize the models. We propose a novel loss function that helps in achieving considerable sparsity at comparable accuracies to that of dense models. We compare various regularizations prevalent in the literature and show the superiority of our method in achieving sparser models that generalize well. △ Less

Submitted 29 August, 2019; originally announced August 2019.

Comments: 10 pages, 3 figures, In Review

arXiv:1901.11458 [pdf, other]

Effect of Various Regularizers on Model Complexities of Neural Networks in Presence of Input Noise

Authors: Mayank Sharma, Aayush Yadav, Sumit Soman, Jayadeva

Abstract: Deep neural networks are over-parameterized, which implies that the number of parameters are much larger than the number of samples used to train the network. Even in such a regime deep architectures do not overfit. This phenomenon is an active area of research and many theories have been proposed trying to understand this peculiar observation. These include the Vapnik Chervonenkis (VC) dimension… ▽ More Deep neural networks are over-parameterized, which implies that the number of parameters are much larger than the number of samples used to train the network. Even in such a regime deep architectures do not overfit. This phenomenon is an active area of research and many theories have been proposed trying to understand this peculiar observation. These include the Vapnik Chervonenkis (VC) dimension bounds and Rademacher complexity bounds which show that the capacity of the network is characterized by the norm of weights rather than the number of parameters. However, the effect of input noise on these measures for shallow and deep architectures has not been studied. In this paper, we analyze the effects of various regularization schemes on the complexity of a neural network which we characterize with the loss, $L_2$ norm of the weights, Rademacher complexities (Directly Approximately Regularizing Complexity-DARC1), VC dimension based Low Complexity Neural Network (LCNN) when subject to varying degrees of Gaussian input noise. We show that $L_2$ regularization leads to a simpler hypothesis class and better generalization followed by DARC1 regularizer, both for shallow as well as deeper architectures. Jacobian regularizer works well for shallow architectures with high level of input noises. Spectral normalization attains highest test set accuracies both for shallow and deeper architectures. We also show that Dropout alone does not perform well in presence of input noise. Finally, we show that deeper architectures are robust to input noise as opposed to their shallow counterparts. △ Less

Submitted 31 January, 2019; originally announced January 2019.

arXiv:1811.01171 [pdf, ps, other]

Radius-margin bounds for deep neural networks

Authors: Mayank Sharma, Jayadeva, Sumit Soman

Abstract: Explaining the unreasonable effectiveness of deep learning has eluded researchers around the globe. Various authors have described multiple metrics to evaluate the capacity of deep architectures. In this paper, we allude to the radius margin bounds described for a support vector machine (SVM) with hinge loss, apply the same to the deep feed-forward architectures and derive the Vapnik-Chervonenkis… ▽ More Explaining the unreasonable effectiveness of deep learning has eluded researchers around the globe. Various authors have described multiple metrics to evaluate the capacity of deep architectures. In this paper, we allude to the radius margin bounds described for a support vector machine (SVM) with hinge loss, apply the same to the deep feed-forward architectures and derive the Vapnik-Chervonenkis (VC) bounds which are different from the earlier bounds proposed in terms of number of weights of the network. In doing so, we also relate the effectiveness of techniques like Dropout and Dropconnect in bringing down the capacity of the network. Finally, we describe the effect of maximizing the input as well as the output margin to achieve an input noise-robust deep architecture. △ Less

Submitted 3 November, 2018; originally announced November 2018.

arXiv:1708.05840 [pdf, other]

A Data and Model-Parallel, Distributed and Scalable Framework for Training of Deep Networks in Apache Spark

Authors: Disha Shrivastava, Santanu Chaudhury, Dr. Jayadeva

Abstract: Training deep networks is expensive and time-consuming with the training period increasing with data size and growth in model parameters. In this paper, we provide a framework for distributed training of deep networks over a cluster of CPUs in Apache Spark. The framework implements both Data Parallelism and Model Parallelism making it suitable to use for deep networks which require huge training d… ▽ More Training deep networks is expensive and time-consuming with the training period increasing with data size and growth in model parameters. In this paper, we provide a framework for distributed training of deep networks over a cluster of CPUs in Apache Spark. The framework implements both Data Parallelism and Model Parallelism making it suitable to use for deep networks which require huge training data and model parameters which are too big to fit into the memory of a single machine. It can be scaled easily over a cluster of cheap commodity hardware to attain significant speedup and obtain better results making it quite economical as compared to farm of GPUs and supercomputers. We have proposed a new algorithm for training of deep networks for the case when the network is partitioned across the machines (Model Parallelism) along with detailed cost analysis and proof of convergence of the same. We have developed implementations for Fully-Connected Feedforward Networks, Convolutional Neural Networks, Recurrent Neural Networks and Long Short-Term Memory architectures. We present the results of extensive simulations demonstrating the speedup and accuracy obtained by our framework for different sizes of the data and model parameters with variation in the number of worker cores/partitions; thereby showing that our proposed framework can achieve significant speedup (upto 11X for CNN) and is also quite scalable. △ Less

Submitted 19 August, 2017; originally announced August 2017.

Comments: 12 pages

arXiv:1707.09933 [pdf, other]

Learning Neural Network Classifiers with Low Model Complexity

Authors: Jayadeva, Himanshu Pant, Mayank Sharma, Abhimanyu Dubey, Sumit Soman, Suraj Tripathi, Sai Guruju, Nihal Goalla

Abstract: Modern neural network architectures for large-scale learning tasks have substantially higher model complexities, which makes understanding, visualizing and training these architectures difficult. Recent contributions to deep learning techniques have focused on architectural modifications to improve parameter efficiency and performance. In this paper, we derive a continuous and differentiable error… ▽ More Modern neural network architectures for large-scale learning tasks have substantially higher model complexities, which makes understanding, visualizing and training these architectures difficult. Recent contributions to deep learning techniques have focused on architectural modifications to improve parameter efficiency and performance. In this paper, we derive a continuous and differentiable error functional for a neural network that minimizes its empirical error as well as a measure of the model complexity. The latter measure is obtained by deriving a differentiable upper bound on the Vapnik-Chervonenkis (VC) dimension of the classifier layer of a class of deep networks. Using standard backpropagation, we realize a training rule that tries to minimize the error on training samples, while improving generalization by kee** the model complexity low. We demonstrate the effectiveness of our formulation (the Low Complexity Neural Network - LCNN) across several deep learning algorithms, and a variety of large benchmark datasets. We show that hidden layer neurons in the resultant networks learn features that are crisp, and in the case of image datasets, quantitatively sharper. Our proposed approach yields benefits across a wide range of architectures, in comparison to and in conjunction with methods such as Dropout and Batch Normalization, and our results strongly suggest that deep learning techniques can benefit from model complexity control methods such as the LCNN learning rule. △ Less

Submitted 5 March, 2021; v1 submitted 31 July, 2017; originally announced July 2017.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

MSC Class: 68T05; 68T10; 68Q32

arXiv:1705.00347 [pdf, other]

doi 10.1016/j.neucom.2018.07.089

Scalable Twin Neural Networks for Classification of Unbalanced Data

Authors: Jayadeva, Himanshu Pant, Sumit Soman, Mayank Sharma

Abstract: Twin Support Vector Machines (TWSVMs) have emerged an efficient alternative to Support Vector Machines (SVM) for learning from imbalanced datasets. The TWSVM learns two non-parallel classifying hyperplanes by solving a couple of smaller sized problems. However, it is unsuitable for large datasets, as it involves matrix operations. In this paper, we discuss a Twin Neural Network (Twin NN) architect… ▽ More Twin Support Vector Machines (TWSVMs) have emerged an efficient alternative to Support Vector Machines (SVM) for learning from imbalanced datasets. The TWSVM learns two non-parallel classifying hyperplanes by solving a couple of smaller sized problems. However, it is unsuitable for large datasets, as it involves matrix operations. In this paper, we discuss a Twin Neural Network (Twin NN) architecture for learning from large unbalanced datasets. The Twin NN also learns an optimal feature map, allowing for better discrimination between classes. We also present an extension of this network architecture for multiclass datasets. Results presented in the paper demonstrate that the Twin NN generalizes well and scales well on large unbalanced datasets. △ Less

Submitted 27 January, 2018; v1 submitted 30 April, 2017; originally announced May 2017.

Comments: 20 pages, 8 figures, 14 tables

MSC Class: 68T05; 68T10; 68Q32

Journal ref: Neurocomputing (Special Issue on Learning in the Presence of Class Imbalance and Concept Drift), 2019

arXiv:1609.03529 [pdf, other]

Examining Representational Similarity in ConvNets and the Primate Visual Cortex

Authors: Abhimanyu Dubey, Jayadeva, Sumeet Agarwal

Abstract: We compare several ConvNets with different depth and regularization techniques with multi-unit macaque IT cortex recordings and assess the impact of the same on representational similarity with the primate visual cortex. We find that with increasing depth and validation performance, ConvNet features are closer to cortical IT representations. We compare several ConvNets with different depth and regularization techniques with multi-unit macaque IT cortex recordings and assess the impact of the same on representational similarity with the primate visual cortex. We find that with increasing depth and validation performance, ConvNet features are closer to cortical IT representations. △ Less

Submitted 12 September, 2016; originally announced September 2016.

Comments: 4 pages, short abstract, Accepted to the Workshop on Biological and Artificial Vision, ECCV, 2016

arXiv:1509.08112 [pdf, other]

Feature Selection for classification of hyperspectral data by minimizing a tight bound on the VC dimension

Authors: Phool Preet, Sanjit Singh Batra, Jayadeva

Abstract: Hyperspectral data consists of large number of features which require sophisticated analysis to be extracted. A popular approach to reduce computational cost, facilitate information representation and accelerate knowledge discovery is to eliminate bands that do not improve the classification and analysis methods being applied. In particular, algorithms that perform band elimination should be desig… ▽ More Hyperspectral data consists of large number of features which require sophisticated analysis to be extracted. A popular approach to reduce computational cost, facilitate information representation and accelerate knowledge discovery is to eliminate bands that do not improve the classification and analysis methods being applied. In particular, algorithms that perform band elimination should be designed to take advantage of the specifics of the classification method being used. This paper employs a recently proposed filter-feature-selection algorithm based on minimizing a tight bound on the VC dimension. We have successfully applied this algorithm to determine a reasonable subset of bands without any user-defined stop** criteria on widely used hyperspectral images and demonstrate that this method outperforms state-of-the-art methods in terms of both sparsity of feature set as well as accuracy of classification.\end{abstract} △ Less

Submitted 27 September, 2015; originally announced September 2015.

Comments: basic papers are on http://www.jayadeva.net

MSC Class: 68T05; 68T10; 68Q32 ACM Class: I.5.1, I.5.2, I.4

arXiv:1503.03175 [pdf, other]

doi 10.1016/j.swevo.2015.10.005

Benchmarking NLopt and state-of-art algorithms for Continuous Global Optimization via Hybrid IACO$_\mathbb{R}$

Authors: Udit Kumar, Sumit Soman, Jayadeva

Abstract: This paper presents a comparative analysis of the performance of the Incremental Ant Colony algorithm for continuous optimization ($IACO_\mathbb{R}$), with different algorithms provided in the NLopt library. The key objective is to understand how the various algorithms in the NLopt library perform in combination with the Multi Trajectory Local Search (Mtsls1) technique. A hybrid approach has been… ▽ More This paper presents a comparative analysis of the performance of the Incremental Ant Colony algorithm for continuous optimization ($IACO_\mathbb{R}$), with different algorithms provided in the NLopt library. The key objective is to understand how the various algorithms in the NLopt library perform in combination with the Multi Trajectory Local Search (Mtsls1) technique. A hybrid approach has been introduced in the local search strategy by the use of a parameter which allows for probabilistic selection between Mtsls1 and a NLopt algorithm. In case of stagnation, the algorithm switch is made based on the algorithm being used in the previous iteration. The paper presents an exhaustive comparison on the performance of these approaches on Soft Computing (SOCO) and Congress on Evolutionary Computation (CEC) 2014 benchmarks. For both benchmarks, we conclude that the best performing algorithm is a hybrid variant of Mtsls1 with BFGS for local search. △ Less

Submitted 11 March, 2015; originally announced March 2015.

Comments: 24 pages, 10 figures

MSC Class: 80M50 ACM Class: G.1.6

Journal ref: Swarm and Evolutionary Computation 27 (2016): 116-131

arXiv:1503.03148 [pdf, other]

doi 10.1016/j.neunet.2020.08.013

A Neurodynamical System for finding a Minimal VC Dimension Classifier

Authors: Jayadeva, Sumit Soman, Amit Bhaya

Abstract: The recently proposed Minimal Complexity Machine (MCM) finds a hyperplane classifier by minimizing an exact bound on the Vapnik-Chervonenkis (VC) dimension. The VC dimension measures the capacity of a learning machine, and a smaller VC dimension leads to improved generalization. On many benchmark datasets, the MCM generalizes better than SVMs and uses far fewer support vectors than the number used… ▽ More The recently proposed Minimal Complexity Machine (MCM) finds a hyperplane classifier by minimizing an exact bound on the Vapnik-Chervonenkis (VC) dimension. The VC dimension measures the capacity of a learning machine, and a smaller VC dimension leads to improved generalization. On many benchmark datasets, the MCM generalizes better than SVMs and uses far fewer support vectors than the number used by SVMs. In this paper, we describe a neural network based on a linear dynamical system, that converges to the MCM solution. The proposed MCM dynamical system is conducive to an analogue circuit implementation on a chip or simulation using Ordinary Differential Equation (ODE) solvers. Numerical experiments on benchmark datasets from the UCI repository show that the proposed approach is scalable and accurate, as we obtain improved accuracies and fewer number of support vectors (upto 74.3% reduction) with the MCM dynamical system. △ Less

Submitted 10 March, 2015; originally announced March 2015.

Comments: 15 pages, 3 figures

MSC Class: 70G660; 68T05 ACM Class: I.5.1; I.5.5; G.1.7; I.2.6

Journal ref: Neural Networks, Volume 132, 2020, Pages 405-415

arXiv:1501.02432 [pdf, ps, other]

Learning a Fuzzy Hyperplane Fat Margin Classifier with Minimum VC dimension

Authors: Jayadeva, Sanjit Singh Batra, Siddarth Sabharwal

Abstract: The Vapnik-Chervonenkis (VC) dimension measures the complexity of a learning machine, and a low VC dimension leads to good generalization. The recently proposed Minimal Complexity Machine (MCM) learns a hyperplane classifier by minimizing an exact bound on the VC dimension. This paper extends the MCM classifier to the fuzzy domain. The use of a fuzzy membership is known to reduce the effect of out… ▽ More The Vapnik-Chervonenkis (VC) dimension measures the complexity of a learning machine, and a low VC dimension leads to good generalization. The recently proposed Minimal Complexity Machine (MCM) learns a hyperplane classifier by minimizing an exact bound on the VC dimension. This paper extends the MCM classifier to the fuzzy domain. The use of a fuzzy membership is known to reduce the effect of outliers, and to reduce the effect of noise on learning. Experimental results show, that on a number of benchmark datasets, the the fuzzy MCM classifier outperforms SVMs and the conventional MCM in terms of generalization, and that the fuzzy MCM uses fewer support vectors. On several benchmark datasets, the fuzzy MCM classifier yields excellent test set accuracies while using one-tenth the number of support vectors used by SVMs. △ Less

Submitted 11 January, 2015; originally announced January 2015.

Comments: arXiv admin note: text overlap with arXiv:1410.4573

MSC Class: 68T05; 68T10; 68Q32 ACM Class: I.5.1; I.5.2

arXiv:1410.7372 [pdf, ps, other]

Feature Selection through Minimization of the VC dimension

Authors: Jayadeva, Sanjit S. Batra, Siddharth Sabharwal

Abstract: Feature selection involes identifying the most relevant subset of input features, with a view to improving generalization of predictive models by reducing overfitting. Directly searching for the most relevant combination of attributes is NP-hard. Variable selection is of critical importance in many applications, such as micro-array data analysis, where selecting a small number of discriminative fe… ▽ More Feature selection involes identifying the most relevant subset of input features, with a view to improving generalization of predictive models by reducing overfitting. Directly searching for the most relevant combination of attributes is NP-hard. Variable selection is of critical importance in many applications, such as micro-array data analysis, where selecting a small number of discriminative features is crucial to develo** useful models of disease mechanisms, as well as for prioritizing targets for drug discovery. The recently proposed Minimal Complexity Machine (MCM) provides a way to learn a hyperplane classifier by minimizing an exact (\boldmath{$Θ$}) bound on its VC dimension. It is well known that a lower VC dimension contributes to good generalization. For a linear hyperplane classifier in the input space, the VC dimension is upper bounded by the number of features; hence, a linear classifier with a small VC dimension is parsimonious in the set of features it employs. In this paper, we use the linear MCM to learn a classifier in which a large number of weights are zero; features with non-zero weights are the ones that are chosen. Selected features are used to learn a kernel SVM classifier. On a number of benchmark datasets, the features chosen by the linear MCM yield comparable or better test set accuracy than when methods such as ReliefF and FCBF are used for the task. The linear MCM typically chooses one-tenth the number of attributes chosen by the other methods; on some very high dimensional datasets, the MCM chooses about $0.6\%$ of the features; in comparison, ReliefF and FCBF choose 70 to 140 times more features, thus demonstrating that minimizing the VC dimension may provide a new, and very effective route for feature selection and for learning sparse representations. △ Less

Submitted 27 October, 2014; originally announced October 2014.

Comments: arXiv admin note: text overlap with arXiv:1410.4573

MSC Class: 68T05; 68T10; 68Q32 ACM Class: I.5.1; I.5.2

arXiv:1410.4573 [pdf, ps, other]

doi 10.1016/j.neucom.2015.06.065

Learning a hyperplane regressor by minimizing an exact bound on the VC dimension

Authors: Jayadeva, Suresh Chandra, Siddarth Sabharwal, Sanjit S. Batra

Abstract: The capacity of a learning machine is measured by its Vapnik-Chervonenkis dimension, and learning machines with a low VC dimension generalize better. It is well known that the VC dimension of SVMs can be very large or unbounded, even though they generally yield state-of-the-art learning performance. In this paper, we show how to learn a hyperplane regressor by minimizing an exact, or \boldmath{… ▽ More The capacity of a learning machine is measured by its Vapnik-Chervonenkis dimension, and learning machines with a low VC dimension generalize better. It is well known that the VC dimension of SVMs can be very large or unbounded, even though they generally yield state-of-the-art learning performance. In this paper, we show how to learn a hyperplane regressor by minimizing an exact, or \boldmath{$Θ$} bound on its VC dimension. The proposed approach, termed as the Minimal Complexity Machine (MCM) Regressor, involves solving a simple linear programming problem. Experimental results show, that on a number of benchmark datasets, the proposed approach yields regressors with error rates much less than those obtained with conventional SVM regresssors, while often using fewer support vectors. On some benchmark datasets, the number of support vectors is less than one tenth the number used by SVMs, indicating that the MCM does indeed learn simpler representations. △ Less

Submitted 16 October, 2014; originally announced October 2014.

Comments: see http://www.sciencedirect.com/science/article/pii/S0925231214010194 or arXiv:1408.2803 for background information

MSC Class: 68T05; 68T10; 68Q32 ACM Class: I.5.1, I.5.2

Journal ref: Neurocomputing, Volume 171, 1 January 2016, Pages 1610-1616, ISSN 0925-2312

arXiv:1408.5559 [pdf, ps, other]

Convergence results for continuous-time dynamics arising in ant colony optimization

Authors: Pierre-Alexandre Bliman, Amit Bhaya, Eugenius Kaszkurewicz, Jayadeva

Abstract: This paper studies the asymptotic behavior of several continuous-time dynamical systems which are analogs of ant colony optimization algorithms that solve shortest path problems. Local asymptotic stability of the equilibrium corresponding to the shortest path is shown under mild assumptions. A complete study is given for a recently proposed model called EigenAnt: global asymptotic stability is sho… ▽ More This paper studies the asymptotic behavior of several continuous-time dynamical systems which are analogs of ant colony optimization algorithms that solve shortest path problems. Local asymptotic stability of the equilibrium corresponding to the shortest path is shown under mild assumptions. A complete study is given for a recently proposed model called EigenAnt: global asymptotic stability is shown, and the speed of convergence is calculated explicitly and shown to be proportional to the difference between the reciprocals of the second shortest and the shortest paths. △ Less

Submitted 24 August, 2014; originally announced August 2014.

Comments: A short version of this paper was published in the preprints of the 19th World Congress of the International Federation of Automatic Control, Cape Town, South Africa, 24-29 August 2014

arXiv:1408.2803 [pdf, ps, other]

doi 10.1016/j.neucom.2014.07.062

Learning a hyperplane classifier by minimizing an exact bound on the VC dimension

Authors: Jayadeva

Abstract: The VC dimension measures the capacity of a learning machine, and a low VC dimension leads to good generalization. While SVMs produce state-of-the-art learning performance, it is well known that the VC dimension of a SVM can be unbounded; despite good results in practice, there is no guarantee of good generalization. In this paper, we show how to learn a hyperplane classifier by minimizing an exac… ▽ More The VC dimension measures the capacity of a learning machine, and a low VC dimension leads to good generalization. While SVMs produce state-of-the-art learning performance, it is well known that the VC dimension of a SVM can be unbounded; despite good results in practice, there is no guarantee of good generalization. In this paper, we show how to learn a hyperplane classifier by minimizing an exact, or \boldmath{$Θ$} bound on its VC dimension. The proposed approach, termed as the Minimal Complexity Machine (MCM), involves solving a simple linear programming problem. Experimental results show, that on a number of benchmark datasets, the proposed approach learns classifiers with error rates much less than conventional SVMs, while often using fewer support vectors. On many benchmark datasets, the number of support vectors is less than one-tenth the number used by SVMs, indicating that the MCM does indeed learn simpler representations. △ Less

Submitted 13 August, 2014; v1 submitted 12 August, 2014; originally announced August 2014.

Comments: Accepted Author Manuscript (Neurocomputing, Elsevier); 10 pages

MSC Class: 68T05; 68T10; 68Q32; ACM Class: I.5.1, I.5.2

Journal ref: Neurocomputing, Volume 149, Part B, 3 February 2015, Pages 683-689, ISSN 0925-2312,

Showing 1–35 of 35 results for author: Jayadeva