Search | arXiv e-print repository

arXiv:2405.20971 [pdf, other]

Amortizing intractable inference in diffusion models for vision, language, and control

Authors: Siddarth Venkatraman, Moksh Jain, Luca Scimeca, Minsu Kim, Marcin Sendera, Mohsin Hasan, Luke Rowe, Sarthak Mittal, Pablo Lemos, Emmanuel Bengio, Alexandre Adam, Jarrid Rector-Brooks, Yoshua Bengio, Glen Berseth, Nikolay Malkin

Abstract: Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generat… ▽ More Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data, $\mathbf{x}\sim p^{\rm post}(\mathbf{x})\propto p(\mathbf{x})r(\mathbf{x})$, in a model that consists of a diffusion generative model prior $p(\mathbf{x})$ and a black-box constraint or likelihood function $r(\mathbf{x})$. We state and prove the asymptotic correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from this posterior, a problem that existing methods solve only approximately or in restricted cases. Relative trajectory balance arises from the generative flow network perspective on diffusion models, which allows the use of deep reinforcement learning techniques to improve mode coverage. Experiments illustrate the broad potential of unbiased inference of arbitrary posteriors under diffusion priors: in vision (classifier guidance), language (infilling under a discrete diffusion LLM), and multimodal data (text-to-image generation). Beyond generative modeling, we apply relative trajectory balance to the problem of continuous control with a score-based behavior prior, achieving state-of-the-art results on benchmarks in offline reinforcement learning. △ Less

Submitted 31 May, 2024; originally announced May 2024.

Comments: Code: https://github.com/GFNOrg/diffusion-finetuning

arXiv:2405.18540 [pdf, other]

Learning diverse attacks on large language models for robust red-teaming and safety tuning

Authors: Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain

Abstract: Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs). Develo** effective protection against many modes of attack prompts requires discovering diverse attacks. Automated red-teaming typically uses reinforcement learning to fine-tune an attacker language model to generate prompts that e… ▽ More Red-teaming, or identifying prompts that elicit harmful responses, is a critical step in ensuring the safe and responsible deployment of large language models (LLMs). Develo** effective protection against many modes of attack prompts requires discovering diverse attacks. Automated red-teaming typically uses reinforcement learning to fine-tune an attacker language model to generate prompts that elicit undesirable responses from a target LLM, as measured, for example, by an auxiliary toxicity classifier. We show that even with explicit regularization to favor novelty and diversity, existing approaches suffer from mode collapse or fail to generate effective attacks. As a flexible and probabilistically principled alternative, we propose to use GFlowNet fine-tuning, followed by a secondary smoothing phase, to train the attacker model to generate diverse and effective attack prompts. We find that the attacks generated by our method are effective against a wide range of target LLMs, both with and without safety tuning, and transfer well between target LLMs. Finally, we demonstrate that models safety-tuned using a dataset of red-teaming prompts generated by our method are robust to attacks from other RL-based red-teaming approaches. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.13546 [pdf, other]

Knowledge-Driven Cross-Document Relation Extraction

Authors: Monika Jain, Raghava Mutharaju, Kuldeep Singh, Ramakanth Kavuluru

Abstract: Relation extraction (RE) is a well-known NLP application often treated as a sentence- or document-level task. However, a handful of recent efforts explore it across documents or in the cross-document setting (CrossDocRE). This is distinct from the single document case because different documents often focus on disparate themes, while text within a document tends to have a single goal. Linking find… ▽ More Relation extraction (RE) is a well-known NLP application often treated as a sentence- or document-level task. However, a handful of recent efforts explore it across documents or in the cross-document setting (CrossDocRE). This is distinct from the single document case because different documents often focus on disparate themes, while text within a document tends to have a single goal. Linking findings from disparate documents to identify new relationships is at the core of the popular literature-based knowledge discovery paradigm in biomedicine and other domains. Current CrossDocRE efforts do not consider domain knowledge, which are often assumed to be known to the reader when documents are authored. Here, we propose a novel approach, KXDocRE, that embed domain knowledge of entities with input text for cross-document RE. Our proposed framework has three main benefits over baselines: 1) it incorporates domain knowledge of entities along with documents' text; 2) it offers interpretability by producing explanatory text for predicted relations between entities 3) it improves performance over the prior methods. △ Less

Submitted 18 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: Accepted in ACL 2024 Findings

arXiv:2405.01616 [pdf, other]

Generative Active Learning for the Search of Small-molecule Protein Binders

Authors: Maksym Korablyov, Cheng-Hao Liu, Moksh Jain, Almer M. van der Sloot, Eric Jolicoeur, Edward Ruediger, Andrei Cristian Nica, Emmanuel Bengio, Kostiantyn Lapchevskyi, Daniel St-Cyr, Doris Alexandra Schuetz, Victor Ion Butoi, Jarrid Rector-Brooks, Simon Blackburn, Leo Feng, Hadi Nekoei, SaiKrishna Gottipati, Priyesh Vijayan, Prateek Gupta, Ladislav Rampášek, Sasikanth Avancha, Pierre-Luc Bacon, William L. Hamilton, Brooks Paige, Sanchit Misra , et al. (9 additional authors not shown)

Abstract: Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecu… ▽ More Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.10094 [pdf, other]

Towards DNA-Encoded Library Generation with GFlowNets

Authors: Michał Koziarski, Mohammed Abukalam, Vedant Shah, Louis Vaillancourt, Doris Alexandra Schuetz, Moksh Jain, Almer van der Sloot, Mathieu Bourgey, Anne Marinier, Yoshua Bengio

Abstract: DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combinatorially combined to produce the final library. In this paper we consider the task of protein-protein interaction (PPI) biased DEL design. To this end, we evaluate se… ▽ More DNA-encoded libraries (DELs) are a powerful approach for rapidly screening large numbers of diverse compounds. One of the key challenges in using DELs is library design, which involves choosing the building blocks that will be combinatorially combined to produce the final library. In this paper we consider the task of protein-protein interaction (PPI) biased DEL design. To this end, we evaluate several machine learning algorithms on the PPI modulation task and use them as a reward for the proposed GFlowNet-based generative approach. We additionally investigate the possibility of using structural information about building blocks to design a hierarchical action space for the GFlowNet. The observed results indicate that GFlowNets are a promising approach for generating diverse combinatorial library candidates. △ Less

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.08423 [pdf, other]

SIR-RL: Reinforcement Learning for Optimized Policy Control during Epidemiological Outbreaks in Emerging Market and Develo** Economies

Authors: Maeghal Jain, Ziya Uddin, Wubshet Ibrahim

Abstract: The outbreak of COVID-19 has highlighted the intricate interplay between public health and economic stability on a global scale. This study proposes a novel reinforcement learning framework designed to optimize health and economic outcomes during pandemics. The framework leverages the SIR model, integrating both lockdown measures (via a stringency index) and vaccination strategies to simulate dise… ▽ More The outbreak of COVID-19 has highlighted the intricate interplay between public health and economic stability on a global scale. This study proposes a novel reinforcement learning framework designed to optimize health and economic outcomes during pandemics. The framework leverages the SIR model, integrating both lockdown measures (via a stringency index) and vaccination strategies to simulate disease dynamics. The stringency index, indicative of the severity of lockdown measures, influences both the spread of the disease and the economic health of a country. Develo** nations, which bear a disproportionate economic burden under stringent lockdowns, are the primary focus of our study. By implementing reinforcement learning, we aim to optimize governmental responses and strike a balance between the competing costs associated with public health and economic stability. This approach also enhances transparency in governmental decision-making by establishing a well-defined reward function for the reinforcement learning agent. In essence, this study introduces an innovative and ethical strategy to navigate the challenge of balancing public health and economic stability amidst infectious disease outbreaks. △ Less

Submitted 30 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: 27 pages, 12 figures

arXiv:2402.04620 [pdf, other]

CataractBot: An LLM-Powered Expert-in-the-Loop Chatbot for Cataract Patients

Authors: Pragnya Ramjee, Bhuvan Sachdeva, Satvik Golechha, Shreyas Kulkarni, Geeta Fulari, Kaushik Murali, Mohit Jain

Abstract: The healthcare landscape is evolving, with patients seeking more reliable information about their health conditions, treatment options, and potential risks. Despite the abundance of information sources, the digital age overwhelms individuals with excess, often inaccurate information. Patients primarily trust doctors and hospital staff, highlighting the need for expert-endorsed health information.… ▽ More The healthcare landscape is evolving, with patients seeking more reliable information about their health conditions, treatment options, and potential risks. Despite the abundance of information sources, the digital age overwhelms individuals with excess, often inaccurate information. Patients primarily trust doctors and hospital staff, highlighting the need for expert-endorsed health information. However, the pressure on experts has led to reduced communication time, impacting information sharing. To address this gap, we propose CataractBot, an experts-in-the-loop chatbot powered by large language models (LLMs). Developed in collaboration with a tertiary eye hospital in India, CataractBot answers cataract surgery related questions instantly by querying a curated knowledge base, and provides expert-verified responses asynchronously. CataractBot features multimodal support and multilingual capabilities. In an in-the-wild deployment study with 49 participants, CataractBot proved valuable, providing anytime accessibility, saving time, and accommodating diverse literacy levels. Trust was established through expert verification. Broadly, our results could inform future work on designing expert-mediated LLM bots. △ Less

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2401.11800 [pdf, other]

Revisiting Document-Level Relation Extraction with Context-Guided Link Prediction

Authors: Monika Jain, Raghava Mutharaju, Ramakanth Kavuluru, Kuldeep Singh

Abstract: Document-level relation extraction (DocRE) poses the challenge of identifying relationships between entities within a document as opposed to the traditional RE setting where a single sentence is input. Existing approaches rely on logical reasoning or contextual cues from entities. This paper reframes document-level RE as link prediction over a knowledge graph with distinct benefits: 1) Our approac… ▽ More Document-level relation extraction (DocRE) poses the challenge of identifying relationships between entities within a document as opposed to the traditional RE setting where a single sentence is input. Existing approaches rely on logical reasoning or contextual cues from entities. This paper reframes document-level RE as link prediction over a knowledge graph with distinct benefits: 1) Our approach combines entity context with document-derived logical reasoning, enhancing link prediction quality. 2) Predicted links between entities offer interpretability, elucidating employed reasoning. We evaluate our approach on three benchmark datasets: DocRED, ReDocRED, and DWIE. The results indicate that our proposed method outperforms the state-of-the-art models and suggests that incorporating context-based link prediction techniques can enhance the performance of document-level relation extraction models. △ Less

Submitted 22 January, 2024; originally announced January 2024.

Comments: Accepted in AAAI 2024

arXiv:2312.11805 [pdf, other]

Gemini: A Family of Highly Capable Multimodal Models

Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI. △ Less

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2310.08774 [pdf, other]

PhyloGFN: Phylogenetic inference with generative flow networks

Authors: Mingyang Zhou, Zichao Yan, Elliot Layne, Nikolay Malkin, Dinghuai Zhang, Moksh Jain, Mathieu Blanchette, Yoshua Bengio

Abstract: Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt… ▽ More Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex combinatorial structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies and evolutionary distances. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets. PhyloGFN is competitive with prior works in marginal likelihood estimation and achieves a closer fit to the target distribution than state-of-the-art variational inference methods. Our code is available at https://github.com/zmy1116/phylogfn. △ Less

Submitted 24 March, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.04363 [pdf, other]

Amortizing intractable inference in large language models

Authors: Edward J. Hu, Moksh Jain, Eric Elmoznino, Younesse Kaddar, Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin

Abstract: Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest -- including sequence continuation, infilling, and other forms of constrained generation -- involve sampling from intractable posterior distribu… ▽ More Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions. This limits tractable querying of this knowledge to start-to-end autoregressive sampling. However, many tasks of interest -- including sequence continuation, infilling, and other forms of constrained generation -- involve sampling from intractable posterior distributions. We address this limitation by using amortized Bayesian inference to sample from these intractable posteriors. Such amortization is algorithmically achieved by fine-tuning LLMs via diversity-seeking reinforcement learning algorithms: generative flow networks (GFlowNets). We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training and reward-maximizing policy optimization. As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem and demonstrate that our approach enables data-efficient adaptation of LLMs to tasks that require multi-step rationalization and tool use. △ Less

Submitted 13 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: ICLR 2024; 23 pages; code: https://github.com/GFNOrg/gfn-lm-tuning

arXiv:2310.03419 [pdf, other]

Pre-Training and Fine-Tuning Generative Flow Networks

Authors: Ling Pan, Moksh Jain, Kanika Madan, Yoshua Bengio

Abstract: Generative Flow Networks (GFlowNets) are amortized samplers that learn stochastic policies to sequentially generate compositional objects from a given unnormalized reward distribution. They can generate diverse sets of high-reward objects, which is an important consideration in scientific discovery tasks. However, as they are typically trained from a given extrinsic reward function, it remains an… ▽ More Generative Flow Networks (GFlowNets) are amortized samplers that learn stochastic policies to sequentially generate compositional objects from a given unnormalized reward distribution. They can generate diverse sets of high-reward objects, which is an important consideration in scientific discovery tasks. However, as they are typically trained from a given extrinsic reward function, it remains an important open challenge about how to leverage the power of pre-training and train GFlowNets in an unsupervised fashion for efficient adaptation to downstream tasks. Inspired by recent successes of unsupervised pre-training in various domains, we introduce a novel approach for reward-free pre-training of GFlowNets. By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet (OC-GFN) that learns to explore the candidate space. Specifically, OC-GFN learns to reach any targeted outcomes, akin to goal-conditioned policies in reinforcement learning. We show that the pre-trained OC-GFN model can allow for a direct extraction of a policy capable of sampling from any new reward functions in downstream tasks. Nonetheless, adapting OC-GFN on a downstream task-specific reward involves an intractable marginalization over possible outcomes. We propose a novel way to approximate this marginalization by learning an amortized predictor enabling efficient fine-tuning. Extensive experimental results validate the efficacy of our approach, demonstrating the effectiveness of pre-training the OC-GFN, and its ability to swiftly adapt to downstream tasks and discover modes more efficiently. This work may serve as a foundation for further exploration of pre-training strategies in the context of GFlowNets. △ Less

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2309.09495 [pdf, other]

PwR: Exploring the Role of Representations in Conversational Programming

Authors: Pradyumna YM, Vinod Ganesan, Dinesh Kumar Arumugam, Meghna Gupta, Nischith Shadagopan, Tanay Dixit, Sameer Segal, Pratyush Kumar, Mohit Jain, Sriram Rajamani

Abstract: Large Language Models (LLMs) have revolutionized programming and software engineering. AI programming assistants such as GitHub Copilot X enable conversational programming, narrowing the gap between human intent and code generation. However, prior literature has identified a key challenge--there is a gap between user's mental model of the system's understanding after a sequence of natural language… ▽ More Large Language Models (LLMs) have revolutionized programming and software engineering. AI programming assistants such as GitHub Copilot X enable conversational programming, narrowing the gap between human intent and code generation. However, prior literature has identified a key challenge--there is a gap between user's mental model of the system's understanding after a sequence of natural language utterances, and the AI system's actual understanding. To address this, we introduce Programming with Representations (PwR), an approach that uses representations to convey the system's understanding back to the user in natural language. We conducted an in-lab task-centered study with 14 users of varying programming proficiency and found that representations significantly improve understandability, and instilled a sense of agency among our participants. Expert programmers use them for verification, while intermediate programmers benefit from confirmation. Natural language-based development with LLMs, coupled with representations, promises to transform software development, making it more accessible and efficient. △ Less

Submitted 18 September, 2023; originally announced September 2023.

Comments: 23 pages, 3 figures, 2 tables, under submission for ACM CHI 2024

ACM Class: H.5.2

arXiv:2309.06386 [pdf, other]

Lung Diseases Image Segmentation using Faster R-CNNs

Authors: Mihir Jain

Abstract: Lung diseases are a leading cause of child mortality in the develo** world, with India accounting for approximately half of global pneumonia deaths (370,000) in 2016. Timely diagnosis is crucial for reducing mortality rates. This paper introduces a low-density neural network structure to mitigate topological challenges in deep networks. The network incorporates parameters into a feature pyramid,… ▽ More Lung diseases are a leading cause of child mortality in the develo** world, with India accounting for approximately half of global pneumonia deaths (370,000) in 2016. Timely diagnosis is crucial for reducing mortality rates. This paper introduces a low-density neural network structure to mitigate topological challenges in deep networks. The network incorporates parameters into a feature pyramid, enhancing data extraction and minimizing information loss. Soft Non-Maximal Suppression optimizes regional proposals generated by the Region Proposal Network. The study evaluates the model on chest X-ray images, computing a confusion matrix to determine accuracy, precision, sensitivity, and specificity. We analyze loss functions, highlighting their trends during training. The regional proposal loss and classification loss assess model performance during training and classification phases. This paper analysis lung disease detection and neural network structures. △ Less

Submitted 10 September, 2023; originally announced September 2023.

arXiv:2309.06082 [pdf, other]

A Machine Learning Framework to Deconstruct the Primary Drivers for Electricity Market Price Events

Authors: Milan Jain, Xueqing Sun, Sohom Datta, Abhishek Somani

Abstract: Power grids are moving towards 100% renewable energy source bulk power grids, and the overall dynamics of power system operations and electricity markets are changing. The electricity markets are not only dispatching resources economically but also taking into account various controllable actions like renewable curtailment, transmission congestion mitigation, and energy storage optimization to ens… ▽ More Power grids are moving towards 100% renewable energy source bulk power grids, and the overall dynamics of power system operations and electricity markets are changing. The electricity markets are not only dispatching resources economically but also taking into account various controllable actions like renewable curtailment, transmission congestion mitigation, and energy storage optimization to ensure grid reliability. As a result, price formations in electricity markets have become quite complex. Traditional root cause analysis and statistical approaches are rendered inapplicable to analyze and infer the main drivers behind price formation in the modern grid and markets with variable renewable energy (VRE). In this paper, we propose a machine learning-based analysis framework to deconstruct the primary drivers for price spike events in modern electricity markets with high renewable energy. The outcomes can be utilized for various critical aspects of market design, renewable dispatch and curtailment, operations, and cyber-security applications. The framework can be applied to any ISO or market data; however, in this paper, it is applied to open-source publicly available datasets from California Independent System Operator (CAISO) and ISO New England (ISO-NE). △ Less

Submitted 12 September, 2023; originally announced September 2023.

Comments: Published in IEEE PES GM 2023

arXiv:2309.01370 [pdf, other]

ReOnto: A Neuro-Symbolic Approach for Biomedical Relation Extraction

Authors: Monika Jain, Kuldeep Singh, Raghava Mutharaju

Abstract: Relation Extraction (RE) is the task of extracting semantic relationships between entities in a sentence and aligning them to relations defined in a vocabulary, which is generally in the form of a Knowledge Graph (KG) or an ontology. Various approaches have been proposed so far to address this task. However, applying these techniques to biomedical text often yields unsatisfactory results because i… ▽ More Relation Extraction (RE) is the task of extracting semantic relationships between entities in a sentence and aligning them to relations defined in a vocabulary, which is generally in the form of a Knowledge Graph (KG) or an ontology. Various approaches have been proposed so far to address this task. However, applying these techniques to biomedical text often yields unsatisfactory results because it is hard to infer relations directly from sentences due to the nature of the biomedical relations. To address these issues, we present a novel technique called ReOnto, that makes use of neuro symbolic knowledge for the RE task. ReOnto employs a graph neural network to acquire the sentence representation and leverages publicly accessible ontologies as prior knowledge to identify the sentential relation between two entities. The approach involves extracting the relation path between the two entities from the ontology. We evaluate the effect of using symbolic knowledge from ontologies with graph neural networks. Experimental results on two public biomedical datasets, BioRel and ADE, show that our method outperforms all the baselines (approximately by 3\%). △ Less

Submitted 4 September, 2023; originally announced September 2023.

Comments: Accepted in ECML 2023

arXiv:2308.09726 [pdf, other]

Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health

Authors: Jackson A. Killian, Manish Jain, Yugang Jia, Jonathan Amar, Erich Huang, Milind Tambe

Abstract: Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision making in sequential settings with limited resources. RMABs are increasingly being used for sensitive decisions such as in public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health. For such high stakes settings, decisions must both improve outcomes and prevent disp… ▽ More Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision making in sequential settings with limited resources. RMABs are increasingly being used for sensitive decisions such as in public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health. For such high stakes settings, decisions must both improve outcomes and prevent disparities between groups (e.g., ensure health equity). We study equitable objectives for RMABs (ERMABs) for the first time. We consider two equity-aligned objectives from the fairness literature, minimax reward and max Nash welfare. We develop efficient algorithms for solving each -- a water filling algorithm for the former, and a greedy algorithm with theoretically motivated nuance to balance disparate group sizes for the latter. Finally, we demonstrate across three simulation domains, including a new digital health model, that our approaches can be multiple times more equitable than the current state of the art without drastic sacrifices to utility. Our findings underscore our work's urgency as RMABs permeate into systems that impact human and wildlife outcomes. Code is available at https://github.com/google-research/socialgood/tree/equitable-rmab △ Less

Submitted 17 August, 2023; originally announced August 2023.

Comments: 16 pages, 8 figures, 2 tables

arXiv:2306.17693 [pdf, other]

Thompson sampling for improved exploration in GFlowNets

Authors: Jarrid Rector-Brooks, Kanika Madan, Moksh Jain, Maksym Korablyov, Cheng-Hao Liu, Sarath Chandar, Nikolay Malkin, Yoshua Bengio

Abstract: Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering mod… ▽ More Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work. △ Less

Submitted 30 June, 2023; originally announced June 2023.

Comments: Structured Probabilistic Inference and Generative Modeling (SPIGM) workshop @ ICML 2023

arXiv:2306.15058 [pdf, other]

BatchGFN: Generative Flow Networks for Batch Active Learning

Authors: Shreshth A. Malik, Salem Lahlou, Andrew Jesson, Moksh Jain, Nikolay Malkin, Tristan Deleu, Yoshua Bengio, Yarin Gal

Abstract: We introduce BatchGFN -- a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward. With an appropriate reward function to quantify the utility of acquiring a batch, such as the joint mutual information between the batch and the model parameters, BatchGFN is able to construct highly informative batches for active… ▽ More We introduce BatchGFN -- a novel approach for pool-based active learning that uses generative flow networks to sample sets of data points proportional to a batch reward. With an appropriate reward function to quantify the utility of acquiring a batch, such as the joint mutual information between the batch and the model parameters, BatchGFN is able to construct highly informative batches for active learning in a principled way. We show our approach enables sampling near-optimal utility batches at inference time with a single forward pass per point in the batch in toy regression problems. This alleviates the computational complexity of batch-aware algorithms and removes the need for greedy approximations to find maximizers for the batch reward. We also present early results for amortizing training across acquisition steps, which will enable scaling to real-world tasks. △ Less

Submitted 26 June, 2023; originally announced June 2023.

Comments: Accepted at the Structured Probabilistic Inference & Generative Modeling workshop, ICML 2023

arXiv:2306.14939 [pdf, other]

The Art of Embedding Fusion: Optimizing Hate Speech Detection

Authors: Mohammad Aflah Khan, Neemesh Yadav, Mohit Jain, Sanyam Goyal

Abstract: Hate speech detection is a challenging natural language processing task that requires capturing linguistic and contextual nuances. Pre-trained language models (PLMs) offer rich semantic representations of text that can improve this task. However there is still limited knowledge about ways to effectively combine representations across PLMs and leverage their complementary strengths. In this work, w… ▽ More Hate speech detection is a challenging natural language processing task that requires capturing linguistic and contextual nuances. Pre-trained language models (PLMs) offer rich semantic representations of text that can improve this task. However there is still limited knowledge about ways to effectively combine representations across PLMs and leverage their complementary strengths. In this work, we shed light on various combination techniques for several PLMs and comprehensively analyze their effectiveness. Our findings show that combining embeddings leads to slight improvements but at a high computational cost and the choice of combination has marginal effect on the final outcome. We also make our codebase public at https://github.com/aflah02/The-Art-of-Embedding-Fusion-Optimizing-Hate-Speech-Detection . △ Less

Submitted 8 October, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: Published as a Tiny Paper at ICLR 2023, 12 Pages

arXiv:2306.11715 [pdf, other]

Multi-Fidelity Active Learning with GFlowNets

Authors: Alex Hernandez-Garcia, Nikita Saxena, Moksh Jain, Cheng-Hao Liu, Yoshua Bengio

Abstract: In the last decades, the capacity to generate large amounts of data in science and engineering applications has been growing steadily. Meanwhile, the progress in machine learning has turned it into a suitable tool to process and utilise the available data. Nonetheless, many relevant scientific and engineering problems present challenges where current machine learning methods cannot yet efficiently… ▽ More In the last decades, the capacity to generate large amounts of data in science and engineering applications has been growing steadily. Meanwhile, the progress in machine learning has turned it into a suitable tool to process and utilise the available data. Nonetheless, many relevant scientific and engineering problems present challenges where current machine learning methods cannot yet efficiently leverage the available data and resources. For example, in scientific discovery, we are often faced with the problem of exploring very large, high-dimensional spaces, where querying a high fidelity, black-box objective function is very expensive. Progress in machine learning methods that can efficiently tackle such problems would help accelerate currently crucial areas such as drug and materials discovery. In this paper, we propose the use of GFlowNets for multi-fidelity active learning, where multiple approximations of the black-box function are available at lower fidelity and cost. GFlowNets are recently proposed methods for amortised probabilistic inference that have proven efficient for exploring large, high-dimensional spaces and can hence be practical in the multi-fidelity setting too. Here, we describe our algorithm for multi-fidelity active learning with GFlowNets and evaluate its performance in both well-studied synthetic tasks and practically relevant applications of molecular discovery. Our results show that multi-fidelity active learning with GFlowNets can efficiently leverage the availability of multiple oracles with different costs and fidelities to accelerate scientific discovery and engineering design. △ Less

Submitted 20 June, 2023; originally announced June 2023.

Comments: Code: https://github.com/nikita-0209/mf-al-gfn

arXiv:2305.09600 [pdf, other]

Deep Reinforcement Learning to Maximize Arterial Usage during Extreme Congestion

Authors: Ashutosh Dutta, Milan Jain, Arif Khan, Arun Sathanur

Abstract: Collisions, crashes, and other incidents on road networks, if left unmitigated, can potentially cause cascading failures that can affect large parts of the system. Timely handling such extreme congestion scenarios is imperative to reduce emissions, enhance productivity, and improve the quality of urban living. In this work, we propose a Deep Reinforcement Learning (DRL) approach to reduce traffic… ▽ More Collisions, crashes, and other incidents on road networks, if left unmitigated, can potentially cause cascading failures that can affect large parts of the system. Timely handling such extreme congestion scenarios is imperative to reduce emissions, enhance productivity, and improve the quality of urban living. In this work, we propose a Deep Reinforcement Learning (DRL) approach to reduce traffic congestion on multi-lane freeways during extreme congestion. The agent is trained to learn adaptive detouring strategies for congested freeway traffic such that the freeway lanes along with the local arterial network in proximity are utilized optimally, with rewards being congestion reduction and traffic speed improvement. The experimental setup is a 2.6-mile-long 4-lane freeway stretch in Shoreline, Washington, USA with two exits and associated arterial roads simulated on a microscopic and continuous multi-modal traffic simulator SUMO (Simulation of Urban MObility) while using parameterized traffic profiles generated using real-world traffic data. Our analysis indicates that DRL-based controllers can improve average traffic speed by 21\% when compared to no-action during steep congestion. The study further discusses the trade-offs involved in the choice of reward functions, the impact of human compliance on agent performance, and the feasibility of knowledge transfer from one agent to other to address data sparsity and scaling issues. △ Less

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.07552 [pdf, other]

Dish detection in food platters: A framework for automated diet logging and nutrition management

Authors: Mansi Goel, Shashank Dargar, Shounak Ghatak, Nidhi Verma, Pratik Chauhan, Anushka Gupta, Nikhila Vishnumolakala, Hareesh Amuru, Ekta Gambhir, Ronak Chhajed, Meenal Jain, Astha Jain, Samiksha Garg, Nitesh Narwade, Nikhilesh Verhwani, Abhuday Tiwari, Kirti Vashishtha, Ganesh Bagler

Abstract: Diet is central to the epidemic of lifestyle disorders. Accurate and effortless diet logging is one of the significant bottlenecks for effective diet management and calorie restriction. Dish detection from food platters is a challenging problem due to a visually complex food layout. We present an end-to-end computational framework for diet management, from data compilation, annotation, and state-o… ▽ More Diet is central to the epidemic of lifestyle disorders. Accurate and effortless diet logging is one of the significant bottlenecks for effective diet management and calorie restriction. Dish detection from food platters is a challenging problem due to a visually complex food layout. We present an end-to-end computational framework for diet management, from data compilation, annotation, and state-of-the-art model identification to its mobile app implementation. As a case study, we implement the framework in the context of Indian food platters known for their complex presentation that poses a challenge for the automated detection of dishes. Starting with the 61 most popular Indian dishes, we identify the state-of-the-art model through a comparative analysis of deep-learning-based object detection architectures. Rooted in a meticulous compilation of 68,005 platter images with 134,814 manual dish annotations, we first compare ten architectures for multi-label classification to identify ResNet152 (mAP=84.51%) as the best model. YOLOv8x (mAP=87.70%) emerged as the best model architecture for dish detection among the eight deep-learning models implemented after a thorough performance evaluation. By comparing with the state-of-the-art model for the IndianFood10 dataset, we demonstrate the superior object detection performance of YOLOv8x for this subset and establish Resnet152 as the best architecture for multi-label classification. The models thus trained on richly annotated data can be extended to include dishes from across global cuisines. The proposed framework is demonstrated through a proof-of-concept mobile application with diverse applications for diet logging, food recommendation systems, nutritional interventions, and mitigation of lifestyle disorders. △ Less

Submitted 12 May, 2023; originally announced May 2023.

Comments: 11 pages, 5 figures, 5 tables. Submitted to the 8th International Conference on Computer Vision & Image Processing (CVIP-2023)

ACM Class: I.4.9; I.5.4; J.3

arXiv:2304.14916 [pdf, other]

"Can't Take the Pressure?": Examining the Challenges of Blood Pressure Estimation via Pulse Wave Analysis

Authors: Suril Mehta, Nipun Kwatra, Mohit Jain, Daniel McDuff

Abstract: The use of observed wearable sensor data (e.g., photoplethysmograms [PPG]) to infer health measures (e.g., glucose level or blood pressure) is a very active area of research. Such technology can have a significant impact on health screening, chronic disease management and remote monitoring. A common approach is to collect sensor data and corresponding labels from a clinical grade device (e.g., blo… ▽ More The use of observed wearable sensor data (e.g., photoplethysmograms [PPG]) to infer health measures (e.g., glucose level or blood pressure) is a very active area of research. Such technology can have a significant impact on health screening, chronic disease management and remote monitoring. A common approach is to collect sensor data and corresponding labels from a clinical grade device (e.g., blood pressure cuff), and train deep learning models to map one to the other. Although well intentioned, this approach often ignores a principled analysis of whether the input sensor data has enough information to predict the desired metric. We analyze the task of predicting blood pressure from PPG pulse wave analysis. Our review of the prior work reveals that many papers fall prey data leakage, and unrealistic constraints on the task and the preprocessing steps. We propose a set of tools to help determine if the input signal in question (e.g., PPG) is indeed a good predictor of the desired label (e.g., blood pressure). Using our proposed tools, we have found that blood pressure prediction using PPG has a high multi-valued map** factor of 33.2% and low mutual information of 9.8%. In comparison, heart rate prediction using PPG, a well-established task, has a very low multi-valued map** factor of 0.75% and high mutual information of 87.7%. We argue that these results provide a more realistic representation of the current progress towards to goal of wearable blood pressure measurement via PPG pulse wave analysis. △ Less

Submitted 23 April, 2023; originally announced April 2023.

arXiv:2304.00132 [pdf, other]

doi 10.1145/3544548.3581542

Understanding Journalists' Workflows in News Curation

Authors: Shubham Atreja, Shruthi Srinath, Mohit Jain, Joyojeet Pal

Abstract: With the increasing dominance of the internet as a source of news consumption, there has been a rise in the production and popularity of email newsletters compiled by individual journalists. However, there is little research on the processes of aggregation, and how these differ between expert journalists and trained machines. In this paper, we interviewed journalists who curate newsletters from ar… ▽ More With the increasing dominance of the internet as a source of news consumption, there has been a rise in the production and popularity of email newsletters compiled by individual journalists. However, there is little research on the processes of aggregation, and how these differ between expert journalists and trained machines. In this paper, we interviewed journalists who curate newsletters from around the world. Through an in-depth understanding of journalists' workflows, our findings lay out the role of their prior experience in the value they bring into the curation process, their use of algorithms in finding stories for their newsletter, and their internalization of their readers' interests and the context they are curating for. While identifying the role of human expertise, we highlight the importance of hybrid curation and provide design insights on how technology can support the work of these experts. △ Less

Submitted 31 March, 2023; originally announced April 2023.

Comments: accepted at CHI'23

arXiv:2303.03677 [pdf, other]

Training Machine Learning Models to Characterize Temporal Evolution of Disadvantaged Communities

Authors: Milan Jain, Narmadha Meenu Mohankumar, Heng Wan, Sumitrra Ganguly, Kyle D Wilson, David M Anderson

Abstract: Disadvantaged communities (DAC), as defined by the Justice40 initiative of the Department of Energy (DOE), USA, identifies census tracts across the USA to determine where benefits of climate and energy investments are or are not currently accruing. The DAC status not only helps in determining the eligibility for future Justice40-related investments but is also critical for exploring ways to achiev… ▽ More Disadvantaged communities (DAC), as defined by the Justice40 initiative of the Department of Energy (DOE), USA, identifies census tracts across the USA to determine where benefits of climate and energy investments are or are not currently accruing. The DAC status not only helps in determining the eligibility for future Justice40-related investments but is also critical for exploring ways to achieve equitable distribution of resources. However, designing inclusive and equitable strategies not just requires a good understanding of current demographics, but also a deeper analysis of the transformations that happened in those demographics over the years. In this paper, machine learning (ML) models are trained on publicly available census data from recent years to classify the DAC status at the census tracts level and then the trained model is used to classify DAC status for historical years. A detailed analysis of the feature and model selection along with the evolution of disadvantaged communities between 2013 and 2018 is presented in this study. △ Less

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2302.11953 [pdf, other]

doi 10.1007/978-3-031-33380-4_9

MFBE: Leveraging Multi-Field Information of FAQs for Efficient Dense Retrieval

Authors: Debopriyo Banerjee, Mausam Jain, Ashish Kulkarni

Abstract: In the domain of question-answering in NLP, the retrieval of Frequently Asked Questions (FAQ) is an important sub-area which is well researched and has been worked upon for many languages. Here, in response to a user query, a retrieval system typically returns the relevant FAQs from a knowledge-base. The efficacy of such a system depends on its ability to establish semantic match between the query… ▽ More In the domain of question-answering in NLP, the retrieval of Frequently Asked Questions (FAQ) is an important sub-area which is well researched and has been worked upon for many languages. Here, in response to a user query, a retrieval system typically returns the relevant FAQs from a knowledge-base. The efficacy of such a system depends on its ability to establish semantic match between the query and the FAQs in real-time. The task becomes challenging due to the inherent lexical gap between queries and FAQs, lack of sufficient context in FAQ titles, scarcity of labeled data and high retrieval latency. In this work, we propose a bi-encoder-based query-FAQ matching model that leverages multiple combinations of FAQ fields (like, question, answer, and category) both during model training and inference. Our proposed Multi-Field Bi-Encoder (MFBE) model benefits from the additional context resulting from multiple FAQ fields and performs well even with minimal labeled data. We empirically support this claim through experiments on proprietary as well as open-source public datasets in both unsupervised and supervised settings. Our model achieves around 27% and 20% better top-1 accuracy for the FAQ retrieval task on internal and open datasets, respectively over the best performing baseline. △ Less

Submitted 21 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

Comments: The first two authors contributed equally to this work. 12 pages, 3 figures, 5 tables. Accepted at the 2023 Pacific-Asia Conference On Knowledge Discovery And Data Mining (PAKDD)

arXiv:2302.09465 [pdf, other]

Stochastic Generative Flow Networks

Authors: Ling Pan, Dinghuai Zhang, Moksh Jain, Longbo Huang, Yoshua Bengio

Abstract: Generative Flow Networks (or GFlowNets for short) are a family of probabilistic agents that learn to sample complex combinatorial structures through the lens of "inference as control". They have shown great potential in generating high-quality and diverse candidates from a given energy landscape. However, existing GFlowNets can be applied only to deterministic environments, and fail in more genera… ▽ More Generative Flow Networks (or GFlowNets for short) are a family of probabilistic agents that learn to sample complex combinatorial structures through the lens of "inference as control". They have shown great potential in generating high-quality and diverse candidates from a given energy landscape. However, existing GFlowNets can be applied only to deterministic environments, and fail in more general tasks with stochastic dynamics, which can limit their applicability. To overcome this challenge, this paper introduces Stochastic GFlowNets, a new algorithm that extends GFlowNets to stochastic environments. By decomposing state transitions into two steps, Stochastic GFlowNets isolate environmental stochasticity and learn a dynamics model to capture it. Extensive experimental results demonstrate that Stochastic GFlowNets offer significant advantages over standard GFlowNets as well as MCMC- and RL-based approaches, on a variety of standard benchmarks with stochastic dynamics. △ Less

Submitted 24 June, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

Comments: UAI 2023

arXiv:2302.06576 [pdf, other]

GFlowNet-EM for learning compositional latent variable models

Authors: Edward J. Hu, Nikolay Malkin, Moksh Jain, Katie Everett, Alexandros Graikos, Yoshua Bengio

Abstract: Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents. A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization. For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictiv… ▽ More Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents. A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization. For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictive approximations to the posterior. We propose the use of GFlowNets, algorithms for sampling from an unnormalized density by learning a stochastic policy for sequential construction of samples, for this intractable E-step. By training GFlowNets to sample from the posterior over latents, we take advantage of their strengths as amortized variational inference algorithms for complex distributions over discrete structures. Our approach, GFlowNet-EM, enables the training of expressive LVMs with discrete compositional latents, as shown by experiments on non-context-free grammar induction and on images using discrete variational autoencoders (VAEs) without conditional independence enforced in the encoder. △ Less

Submitted 3 June, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: ICML 2023; code: https://github.com/GFNOrg/GFlowNet-EM

arXiv:2302.00615 [pdf, other]

doi 10.1039/D3DD00002H

GFlowNets for AI-Driven Scientific Discovery

Authors: Moksh Jain, Tristan Deleu, Jason Hartford, Cheng-Hao Liu, Alex Hernandez-Garcia, Yoshua Bengio

Abstract: Tackling the most pressing problems for humanity, such as the climate crisis and the threat of global pandemics, requires accelerating the pace of scientific discovery. While science has traditionally relied on trial and error and even serendipity to a large extent, the last few decades have seen a surge of data-driven scientific discoveries. However, in order to truly leverage large-scale data se… ▽ More Tackling the most pressing problems for humanity, such as the climate crisis and the threat of global pandemics, requires accelerating the pace of scientific discovery. While science has traditionally relied on trial and error and even serendipity to a large extent, the last few decades have seen a surge of data-driven scientific discoveries. However, in order to truly leverage large-scale data sets and high-throughput experimental setups, machine learning methods will need to be further improved and better integrated in the scientific discovery pipeline. A key challenge for current machine learning methods in this context is the efficient exploration of very large search spaces, which requires techniques for estimating reducible (epistemic) uncertainty and generating sets of diverse and informative experiments to perform. This motivated a new probabilistic machine learning framework called GFlowNets, which can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop. GFlowNets learn to sample from a distribution given indirectly by a reward function corresponding to an unnormalized probability, which enables sampling diverse, high-reward candidates. GFlowNets can also be used to form efficient and amortized Bayesian posterior estimators for causal models conditioned on the already acquired experimental data. Having such posterior models can then provide estimators of epistemic uncertainty and information gain that can drive an experimental design policy. Altogether, here we will argue that GFlowNets can become a valuable tool for AI-driven scientific discovery, especially in scenarios of very large candidate spaces where we have access to cheap but inaccurate measurements or to expensive but accurate measurements. This is a common setting in the context of drug and material discovery, which we use as examples throughout the paper. △ Less

Submitted 27 June, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

Comments: 31 pages, 5 figures. Updated with camera-ready changes

arXiv:2301.02872 [pdf, other]

Machine Learning to Estimate Gross Loss of Jewelry for Wax Patterns

Authors: Mihir Jain, Kashish Jain, Sandip Mane

Abstract: In mass manufacturing of jewellery, the gross loss is estimated before manufacturing to calculate the wax weight of the pattern that would be investment casted to make multiple identical pieces of jewellery. Machine learning is a technology that is a part of AI which helps create a model with decision-making capabilities based on a large set of user-defined data. In this paper, the authors found a… ▽ More In mass manufacturing of jewellery, the gross loss is estimated before manufacturing to calculate the wax weight of the pattern that would be investment casted to make multiple identical pieces of jewellery. Machine learning is a technology that is a part of AI which helps create a model with decision-making capabilities based on a large set of user-defined data. In this paper, the authors found a way to use Machine Learning in the jewellery industry to estimate this crucial Gross Loss. Choosing a small data set of manufactured rings and via regression analysis, it was found out that there is a potential of reducing the error in estimation from +-2-3 to +-0.5 using ML Algorithms from historic data and attributes collected from the CAD file during the design phase itself. To evaluate the approach's viability, additional study must be undertaken with a larger data set. △ Less

Submitted 7 January, 2023; originally announced January 2023.

arXiv:2211.12374 [pdf, other]

An Emotion-Aware Multi-Task Approach to Fake News and Rumour Detection using Transfer Learning

Authors: Arjun Choudhry, Inder Khatri, Minni Jain, Dinesh Kumar Vishwakarma

Abstract: Social networking sites, blogs, and online articles are instant sources of news for internet users globally. However, in the absence of strict regulations mandating the genuineness of every text on social media, it is probable that some of these texts are fake news or rumours. Their deceptive nature and ability to propagate instantly can have an adverse effect on society. This necessitates the nee… ▽ More Social networking sites, blogs, and online articles are instant sources of news for internet users globally. However, in the absence of strict regulations mandating the genuineness of every text on social media, it is probable that some of these texts are fake news or rumours. Their deceptive nature and ability to propagate instantly can have an adverse effect on society. This necessitates the need for more effective detection of fake news and rumours on the web. In this work, we annotate four fake news detection and rumour detection datasets with their emotion class labels using transfer learning. We show the correlation between the legitimacy of a text with its intrinsic emotion for fake news and rumour detection, and prove that even within the same emotion class, fake and real news are often represented differently, which can be used for improved feature extraction. Based on this, we propose a multi-task framework for fake news and rumour detection, predicting both the emotion and legitimacy of the text. We train a variety of deep learning models in single-task and multi-task settings for a more comprehensive comparison. We further analyze the performance of our multi-task approach for fake news detection in cross-domain settings to verify its efficacy for better generalization across datasets, and to verify that emotions act as a domain-independent feature. Experimental results verify that our multi-task models consistently outperform their single-task counterparts in terms of accuracy, precision, recall, and F1 score, both for in-domain and cross-domain settings. We also qualitatively analyze the difference in performance in single-task and multi-task learning models. △ Less

Submitted 7 December, 2022; v1 submitted 22 November, 2022; originally announced November 2022.

Comments: Accepted in IEEE Transaction on Computational Social Systems 18 pages 5 figures

arXiv:2211.01885 [pdf, other]

Using U-Net Network for Efficient Brain Tumor Segmentation in MRI Images

Authors: Jason Walsh, Alice Othmani, Mayank Jain, Soumyabrata Dev

Abstract: Magnetic Resonance Imaging (MRI) is the most commonly used non-intrusive technique for medical image acquisition. Brain tumor segmentation is the process of algorithmically identifying tumors in brain MRI scans. While many approaches have been proposed in the literature for brain tumor segmentation, this paper proposes a lightweight implementation of U-Net. Apart from providing real-time segmentat… ▽ More Magnetic Resonance Imaging (MRI) is the most commonly used non-intrusive technique for medical image acquisition. Brain tumor segmentation is the process of algorithmically identifying tumors in brain MRI scans. While many approaches have been proposed in the literature for brain tumor segmentation, this paper proposes a lightweight implementation of U-Net. Apart from providing real-time segmentation of MRI scans, the proposed architecture does not need large amount of data to train the proposed lightweight U-Net. Moreover, no additional data augmentation step is required. The lightweight U-Net shows very promising results on BITE dataset and it achieves a mean intersection-over-union (IoU) of 89% while outperforming the standard benchmark algorithms. Additionally, this work demonstrates an effective use of the three perspective planes, instead of the original three-dimensional volumetric images, for simplified brain tumor segmentation. △ Less

Submitted 3 November, 2022; originally announced November 2022.

Comments: Published in Healthcare Analytics, 2022

arXiv:2211.00568 [pdf, other]

Consistent Training via Energy-Based GFlowNets for Modeling Discrete Joint Distributions

Authors: Chanakya Ekbote, Moksh Jain, Payel Das, Yoshua Bengio

Abstract: Generative Flow Networks (GFlowNets) have demonstrated significant performance improvements for generating diverse discrete objects $x$ given a reward function $R(x)$, indicating the utility of the object and trained independently from the GFlowNet by supervised learning to predict a desirable property $y$ given $x$. We hypothesize that this can lead to incompatibility between the inductive optimi… ▽ More Generative Flow Networks (GFlowNets) have demonstrated significant performance improvements for generating diverse discrete objects $x$ given a reward function $R(x)$, indicating the utility of the object and trained independently from the GFlowNet by supervised learning to predict a desirable property $y$ given $x$. We hypothesize that this can lead to incompatibility between the inductive optimization biases in training $R$ and in training the GFlowNet, potentially leading to worse samples and slow adaptation to changes in the distribution. In this work, we build upon recent work on jointly learning energy-based models with GFlowNets and extend it to learn the joint over multiple variables, which we call Joint Energy-Based GFlowNets (JEBGFNs), such as peptide sequences and their antimicrobial activity. Joint learning of the energy-based model, used as a reward for the GFlowNet, can resolve the issues of incompatibility since both the reward function $R$ and the GFlowNet sampler are trained jointly. We find that this joint training or joint energy-based formulation leads to significant improvements in generating anti-microbial peptides. As the training sequences arose out of evolutionary or artificial selection for high antibiotic activity, there is presumably some structure in the distribution of sequences that reveals information about the antibiotic activity. This results in an advantage to modeling their joint generatively vs. pure discriminative modeling. We also evaluate JEBGFN in an active learning setting for discovering anti-microbial peptides. △ Less

Submitted 2 November, 2022; v1 submitted 1 November, 2022; originally announced November 2022.

Comments: 9 Pages, 10 Figures

arXiv:2211.00112 [pdf, other]

Indexability is Not Enough for Whittle: Improved, Near-Optimal Algorithms for Restless Bandits

Authors: Abheek Ghosh, Dheeraj Nagaraj, Manish Jain, Milind Tambe

Abstract: We study the problem of planning restless multi-armed bandits (RMABs) with multiple actions. This is a popular model for multi-agent systems with applications like multi-channel communication, monitoring and machine maintenance tasks, and healthcare. Whittle index policies, which are based on Lagrangian relaxations, are widely used in these settings due to their simplicity and near-optimality unde… ▽ More We study the problem of planning restless multi-armed bandits (RMABs) with multiple actions. This is a popular model for multi-agent systems with applications like multi-channel communication, monitoring and machine maintenance tasks, and healthcare. Whittle index policies, which are based on Lagrangian relaxations, are widely used in these settings due to their simplicity and near-optimality under certain conditions. In this work, we first show that Whittle index policies can fail in simple and practically relevant RMAB settings, even when the RMABs are indexable. We discuss why the optimality guarantees fail and why asymptotic optimality may not translate well to practically relevant planning horizons. We then propose an alternate planning algorithm based on the mean-field method, which can provably and efficiently obtain near-optimal policies with a large number of arms, without the stringent structural assumptions required by the Whittle index policies. This borrows ideas from existing research with some improvements: our approach is hyper-parameter free, and we provide an improved non-asymptotic analysis which has: (a) no requirement for exogenous hyper-parameters and tighter polynomial dependence on known problem parameters; (b) high probability bounds which show that the reward of the policy is reliable; and (c) matching sub-optimality lower bounds for this algorithm with respect to the number of arms, thus demonstrating the tightness of our bounds. Our extensive experimental analysis shows that the mean-field approach matches or outperforms other baselines. △ Less

Submitted 28 February, 2023; v1 submitted 31 October, 2022; originally announced November 2022.

Comments: 21 pages; AAMAS'23 version with appendix

arXiv:2210.12928 [pdf, other]

GFlowOut: Dropout with Generative Flow Networks

Authors: Dianbo Liu, Moksh Jain, Bonaventure Dossou, Qianli Shen, Salem Lahlou, Anirudh Goyal, Nikolay Malkin, Chris Emezue, Dinghuai Zhang, Nadhir Hassen, Xu Ji, Kenji Kawaguchi, Yoshua Bengio

Abstract: Bayesian Inference offers principled tools to tackle many critical problems with modern neural networks such as poor calibration and generalization, and data inefficiency. However, scaling Bayesian inference to large architectures is challenging and requires restrictive approximations. Monte Carlo Dropout has been widely used as a relatively cheap way for approximate Inference and to estimate unce… ▽ More Bayesian Inference offers principled tools to tackle many critical problems with modern neural networks such as poor calibration and generalization, and data inefficiency. However, scaling Bayesian inference to large architectures is challenging and requires restrictive approximations. Monte Carlo Dropout has been widely used as a relatively cheap way for approximate Inference and to estimate uncertainty with deep neural networks. Traditionally, the dropout mask is sampled independently from a fixed distribution. Recent works show that the dropout mask can be viewed as a latent variable, which can be inferred with variational inference. These methods face two important challenges: (a) the posterior distribution over masks can be highly multi-modal which can be difficult to approximate with standard variational inference and (b) it is not trivial to fully utilize sample-dependent information and correlation among dropout masks to improve posterior estimation. In this work, we propose GFlowOut to address these issues. GFlowOut leverages the recently proposed probabilistic framework of Generative Flow Networks (GFlowNets) to learn the posterior distribution over dropout masks. We empirically demonstrate that GFlowOut results in predictive distributions that generalize better to out-of-distribution data, and provide uncertainty estimates which lead to better performance in downstream tasks. △ Less

Submitted 23 June, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

arXiv:2210.12765 [pdf, other]

Multi-Objective GFlowNets

Authors: Moksh Jain, Sharath Chandra Raparthy, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Yoshua Bengio, Santiago Miret, Emmanuel Bengio

Abstract: We study the problem of generating diverse candidates in the context of Multi-Objective Optimization. In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives. Moreover, these objectives are often imperfect evaluations of some underlying property of interest, mak… ▽ More We study the problem of generating diverse candidates in the context of Multi-Objective Optimization. In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives. Moreover, these objectives are often imperfect evaluations of some underlying property of interest, making it important to generate diverse candidates to have multiple options for expensive downstream evaluations. We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse Pareto optimal solutions, based on GFlowNets. We introduce two variants of MOGFNs: MOGFN-PC, which models a family of independent sub-problems defined by a scalarization function, with reward-conditional GFlowNets, and MOGFN-AL, which solves a sequence of sub-problems defined by an acquisition function in an active learning loop. Our experiments on wide variety of synthetic and benchmark tasks demonstrate advantages of the proposed methods in terms of the Pareto performance and importantly, improved candidate diversity, which is the main contribution of this work. △ Less

Submitted 17 July, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

Comments: 23 pages, 8 figures. ICML 2023. Code at: https://github.com/GFNOrg/multi-objective-gfn

arXiv:2210.07241 [pdf, other]

Visual Reinforcement Learning with Self-Supervised 3D Representations

Authors: Yanjie Ze, Nicklas Hansen, Yinbo Chen, Mohit Jain, Xiaolong Wang

Abstract: A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample-efficiency and generalization through additional learning signal and inductive biases. However, while the real world is inherently 3D, prior efforts have largely been focused on leveraging 2D computer vision techni… ▽ More A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample-efficiency and generalization through additional learning signal and inductive biases. However, while the real world is inherently 3D, prior efforts have largely been focused on leveraging 2D computer vision techniques as auxiliary self-supervision. In this work, we present a unified framework for self-supervised learning of 3D representations for motor control. Our proposed framework consists of two phases: a pretraining phase where a deep voxel-based 3D autoencoder is pretrained on a large object-centric dataset, and a finetuning phase where the representation is jointly finetuned together with RL on in-domain data. We empirically show that our method enjoys improved sample efficiency in simulated manipulation tasks compared to 2D representation learning methods. Additionally, our learned policies transfer zero-shot to a real robot setup with only approximate geometric correspondence, and successfully solve motor control tasks that involve gras** and lifting from a single, uncalibrated RGB camera. Code and videos are available at https://yanjieze.com/3d4rl/ . △ Less

Submitted 15 March, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: Accepted in RA-L 2023 and IROS 2023. Project page: https://yanjieze.com/3d4rl/

arXiv:2209.13518 [pdf]

Graph-Based Active Machine Learning Method for Diverse and Novel Antimicrobial Peptides Generation and Selection

Authors: Bonaventure F. P. Dossou, Dianbo Liu, Xu Ji, Moksh Jain, Almer M. van der Sloot, Roger Palou, Michael Tyers, Yoshua Bengio

Abstract: As antibiotic-resistant bacterial strains are rapidly spreading worldwide, infections caused by these strains are emerging as a global crisis causing the death of millions of people every year. Antimicrobial Peptides (AMPs) are one of the candidates to tackle this problem because of their potential diversity, and ability to favorably modulate the host immune response. However, large-scale screenin… ▽ More As antibiotic-resistant bacterial strains are rapidly spreading worldwide, infections caused by these strains are emerging as a global crisis causing the death of millions of people every year. Antimicrobial Peptides (AMPs) are one of the candidates to tackle this problem because of their potential diversity, and ability to favorably modulate the host immune response. However, large-scale screening of new AMP candidates is expensive, time-consuming, and now affordable in develo** countries, which need the treatments the most. In this work, we propose a novel active machine learning-based framework that statistically minimizes the number of wet-lab experiments needed to design new AMPs, while ensuring a high diversity and novelty of generated AMPs sequences, in multi-rounds of wet-lab AMP screening settings. Combining recurrent neural network models and a graph-based filter (GraphCC), our proposed approach delivers novel and diverse candidates and demonstrates better performances according to our defined metrics. △ Less

Submitted 18 September, 2022; originally announced September 2022.

Comments: Under Review at Sciences Advances

arXiv:2209.12782 [pdf, other]

Learning GFlowNets from partial episodes for improved convergence and stability

Authors: Kanika Madan, Jarrid Rector-Brooks, Maksym Korablyov, Emmanuel Bengio, Moksh Jain, Andrei Nica, Tom Bosc, Yoshua Bengio, Nikolay Malkin

Abstract: Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. We argue that these… ▽ More Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. We argue that these alternatives represent opposite ends of a gradient bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate its harmful effects. Inspired by the TD($λ$) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB($λ$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths. We show that SubTB($λ$) accelerates sampler convergence in previously studied and new environments and enables training GFlowNets in environments with longer action sequences and sparser reward landscapes than what was possible before. We also perform a comparative analysis of stochastic gradient dynamics, shedding light on the bias-variance tradeoff in GFlowNet training and the advantages of subtrajectory balance. △ Less

Submitted 3 June, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: ICML 2023

arXiv:2208.05552 [pdf, other]

Towards Automating Retinoscopy for Refractive Error Diagnosis

Authors: Aditya Aggarwal, Siddhartha Gairola, Uddeshya Upadhyay, Akshay P Vasishta, Diwakar Rao, Aditya Goyal, Kaushik Murali, Nipun Kwatra, Mohit Jain

Abstract: Refractive error is the most common eye disorder and is the key cause behind correctable visual impairment, responsible for nearly 80% of the visual impairment in the US. Refractive error can be diagnosed using multiple methods, including subjective refraction, retinoscopy, and autorefractors. Although subjective refraction is the gold standard, it requires cooperation from the patient and hence i… ▽ More Refractive error is the most common eye disorder and is the key cause behind correctable visual impairment, responsible for nearly 80% of the visual impairment in the US. Refractive error can be diagnosed using multiple methods, including subjective refraction, retinoscopy, and autorefractors. Although subjective refraction is the gold standard, it requires cooperation from the patient and hence is not suitable for infants, young children, and developmentally delayed adults. Retinoscopy is an objective refraction method that does not require any input from the patient. However, retinoscopy requires a lens kit and a trained examiner, which limits its use for mass screening. In this work, we automate retinoscopy by attaching a smartphone to a retinoscope and recording retinoscopic videos with the patient wearing a custom pair of paper frames. We develop a video processing pipeline that takes retinoscopic videos as input and estimates the net refractive error based on our proposed extension of the retinoscopy mathematical model. Our system alleviates the need for a lens kit and can be performed by an untrained examiner. In a clinical trial with 185 eyes, we achieved a sensitivity of 91.0% and specificity of 74.0% on refractive error diagnosis. Moreover, the mean absolute error of our approach was 0.75$\pm$0.67D on net refractive error estimation compared to subjective refraction measurements. Our results indicate that our approach has the potential to be used as a retinoscopy-based refractive error screening tool in real-world medical settings. △ Less

Submitted 10 August, 2022; originally announced August 2022.

Comments: This paper is accepted for publication in IMWUT 2022

arXiv:2206.13395 [pdf, other]

Gait Cycle Reconstruction and Human Identification from Occluded Sequences

Authors: Abhishek Paul, Manav Mukesh Jain, **esh Jain, Pratik Chattopadhyay

Abstract: Gait-based person identification from videos captured at surveillance sites using Computer Vision-based techniques is quite challenging since these walking sequences are usually corrupted with occlusion, and a complete cycle of gait is not always available. In this work, we propose an effective neural network-based model to reconstruct the occluded frames in an input sequence before carrying out g… ▽ More Gait-based person identification from videos captured at surveillance sites using Computer Vision-based techniques is quite challenging since these walking sequences are usually corrupted with occlusion, and a complete cycle of gait is not always available. In this work, we propose an effective neural network-based model to reconstruct the occluded frames in an input sequence before carrying out gait recognition. Specifically, we employ LSTM networks to predict an embedding for each occluded frame both from the forward and the backward directions, and next fuse the predictions from the two LSTMs by employing a network of residual blocks and convolutional layers. While the LSTMs are trained to minimize the mean-squared loss, the fusion network is trained to optimize the pixel-wise cross-entropy loss between the ground-truth and the reconstructed samples. Evaluation of our approach has been done using synthetically occluded sequences generated from the OU-ISIR LP and CASIA-B data and real-occluded sequences present in the TUM-IITKGP data. The effectiveness of the proposed reconstruction model has been verified through the Dice score and gait-based recognition accuracy using some popular gait recognition methods. Comparative study with existing occlusion handling methods in gait recognition highlights the superiority of our proposed occlusion reconstruction approach over the others. △ Less

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2205.03702 [pdf, other]

Keratoconus Classifier for Smartphone-based Corneal Topographer

Authors: Siddhartha Gairola, Pallavi Joshi, Anand Balasubramaniam, Kaushik Murali, Nipun Kwatra, Mohit Jain

Abstract: Keratoconus is a severe eye disease that leads to deformation of the cornea. It impacts people aged 10-25 years and is the leading cause of blindness in that demography. Corneal topography is the gold standard for keratoconus diagnosis. It is a non-invasive process performed using expensive and bulky medical devices called corneal topographers. This makes it inaccessible to large populations, espe… ▽ More Keratoconus is a severe eye disease that leads to deformation of the cornea. It impacts people aged 10-25 years and is the leading cause of blindness in that demography. Corneal topography is the gold standard for keratoconus diagnosis. It is a non-invasive process performed using expensive and bulky medical devices called corneal topographers. This makes it inaccessible to large populations, especially in the Global South. Low-cost smartphone-based corneal topographers, such as SmartKC, have been proposed to make keratoconus diagnosis accessible. Similar to medical-grade topographers, SmartKC outputs curvature heatmaps and quantitative metrics that need to be evaluated by doctors for keratoconus diagnosis. An automatic scheme for evaluation of these heatmaps and quantitative values can play a crucial role in screening keratoconus in areas where doctors are not available. In this work, we propose a dual-head convolutional neural network (CNN) for classifying keratoconus on the heatmaps generated by SmartKC. Since SmartKC is a new device and only had a small dataset (114 samples), we developed a 2-stage transfer learning strategy -- using historical data collected from a medical-grade topographer and a subset of SmartKC data -- to satisfactorily train our network. This, combined with our domain-specific data augmentations, achieved a sensitivity of 91.3% and a specificity of 94.2%. △ Less

Submitted 7 May, 2022; originally announced May 2022.

Comments: 4 pages

arXiv:2204.11115 [pdf]

Time Series Forecasting (TSF) Using Various Deep Learning Models

Authors: Jimeng Shi, Mahek Jain, Giri Narasimhan

Abstract: Time Series Forecasting (TSF) is used to predict the target variables at a future time point based on the learning from previous time points. To keep the problem tractable, learning methods use data from a fixed length window in the past as an explicit input. In this paper, we study how the performance of predictive models change as a function of different look-back window sizes and different amou… ▽ More Time Series Forecasting (TSF) is used to predict the target variables at a future time point based on the learning from previous time points. To keep the problem tractable, learning methods use data from a fixed length window in the past as an explicit input. In this paper, we study how the performance of predictive models change as a function of different look-back window sizes and different amounts of time to predict into the future. We also consider the performance of the recent attention-based Transformer models, which has had good success in the image processing and natural language processing domains. In all, we compare four different deep learning methods (RNN, LSTM, GRU, and Transformer) along with a baseline method. The dataset (hourly) we used is the Bei**g Air Quality Dataset from the UCI website, which includes a multivariate time series of many factors measured on an hourly basis for a period of 5 years (2010-14). For each model, we also report on the relationship between the performance and the look-back window sizes and the number of predicted time points into the future. Our experiments suggest that Transformer models have the best performance with the lowest Mean Average Errors (MAE = 14.599, 23.273) and Root Mean Square Errors (RSME = 23.573, 38.131) for most of our single-step and multi-steps predictions. The best size for the look-back window to predict 1 hour into the future appears to be one day, while 2 or 4 days perform the best to predict 3 hours into the future. △ Less

Submitted 23 April, 2022; originally announced April 2022.

Comments: 9 pages

arXiv:2204.02790 [pdf, other]

Global Readiness of Language Technology for Healthcare: What would it Take to Combat the Next Pandemic?

Authors: Ishani Mondal, Kabir Ahuja, Mohit Jain, Jacki O Neil, Kalika Bali, Monojit Choudhury

Abstract: The COVID-19 pandemic has brought out both the best and worst of language technology (LT). On one hand, conversational agents for information dissemination and basic diagnosis have seen widespread use, and arguably, had an important role in combating the pandemic. On the other hand, it has also become clear that such technologies are readily available for a handful of languages, and the vast major… ▽ More The COVID-19 pandemic has brought out both the best and worst of language technology (LT). On one hand, conversational agents for information dissemination and basic diagnosis have seen widespread use, and arguably, had an important role in combating the pandemic. On the other hand, it has also become clear that such technologies are readily available for a handful of languages, and the vast majority of the global south is completely bereft of these benefits. What is the state of LT, especially conversational agents, for healthcare across the world's languages? And, what would it take to ensure global readiness of LT before the next pandemic? In this paper, we try to answer these questions through survey of existing literature and resources, as well as through a rapid chatbot building exercise for 15 Asian and African languages with varying amount of resource-availability. The study confirms the pitiful state of LT even for languages with large speaker bases, such as Sinhala and Hausa, and identifies the gaps that could help us prioritize research and investment strategies in LT for healthcare. △ Less

Submitted 6 April, 2022; originally announced April 2022.

Comments: Under Revision

arXiv:2203.04115 [pdf, other]

Biological Sequence Design with GFlowNets

Authors: Moksh Jain, Emmanuel Bengio, Alex-Hernandez Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou, Chanakya Ekbote, Jie Fu, Tianyu Zhang, Micheal Kilgour, Dinghuai Zhang, Lena Simine, Payel Das, Yoshua Bengio

Abstract: Design of de novo biological sequences with desired properties, like protein and DNA sequences, often involves an active loop with several rounds of molecule ideation and expensive wet-lab evaluations. These experiments can consist of multiple stages, with increasing levels of precision and cost of evaluation, where candidates are filtered. This makes the diversity of proposed candidates a key con… ▽ More Design of de novo biological sequences with desired properties, like protein and DNA sequences, often involves an active loop with several rounds of molecule ideation and expensive wet-lab evaluations. These experiments can consist of multiple stages, with increasing levels of precision and cost of evaluation, where candidates are filtered. This makes the diversity of proposed candidates a key consideration in the ideation phase. In this work, we propose an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective to obtain a diverse batch of useful (as defined by some utility function, for example, the predicted anti-microbial activity of a peptide) and informative candidates after each round. We also propose a scheme to incorporate existing labeled datasets of candidates, in addition to a reward function, to speed up learning in GFlowNets. We present empirical results on several biological sequence design tasks, and we find that our method generates more diverse and novel batches with high scoring candidates compared to existing approaches. △ Less

Submitted 24 May, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

Comments: ICML 2022. 15 pages, 3 figures. Code available at: https://github.com/MJ10/BioSeq-GFN-AL. Updated GFP results

arXiv:2201.13259 [pdf, other]

Trajectory balance: Improved credit assignment in GFlowNets

Authors: Nikolay Malkin, Moksh Jain, Emmanuel Bengio, Chen Sun, Yoshua Bengio

Abstract: Generative flow networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object. We find previously proposed learning objectives for GFlowNets, flow matching and detailed balance, which are analogous to tempo… ▽ More Generative flow networks (GFlowNets) are a method for learning a stochastic policy for generating compositional objects, such as graphs or strings, from a given unnormalized density by sequences of actions, where many possible action sequences may lead to the same object. We find previously proposed learning objectives for GFlowNets, flow matching and detailed balance, which are analogous to temporal difference learning, to be prone to inefficient credit propagation across long action sequences. We thus propose a new learning objective for GFlowNets, trajectory balance, as a more efficient alternative to previously used objectives. We prove that any global minimizer of the trajectory balance objective can define a policy that samples exactly from the target distribution. In experiments on four distinct domains, we empirically demonstrate the benefits of the trajectory balance objective for GFlowNet convergence, diversity of generated samples, and robustness to long action sequences and large action spaces. △ Less

Submitted 4 October, 2023; v1 submitted 31 January, 2022; originally announced January 2022.

Comments: NeurIPS 2022; see footnotes for code; v3 fixes minor errata

arXiv:2201.07779 [pdf, other]

Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation

Authors: Rishabh Jangir, Nicklas Hansen, Sambaran Ghosal, Mohit Jain, Xiaolong Wang

Abstract: Learning to solve precision-based manipulation tasks from visual feedback using Reinforcement Learning (RL) could drastically reduce the engineering efforts required by traditional robot systems. However, performing fine-grained motor control from visual inputs alone is challenging, especially with a static third-person camera as often used in previous work. We propose a setting for robotic manipu… ▽ More Learning to solve precision-based manipulation tasks from visual feedback using Reinforcement Learning (RL) could drastically reduce the engineering efforts required by traditional robot systems. However, performing fine-grained motor control from visual inputs alone is challenging, especially with a static third-person camera as often used in previous work. We propose a setting for robotic manipulation in which the agent receives visual feedback from both a third-person camera and an egocentric camera mounted on the robot's wrist. While the third-person camera is static, the egocentric camera enables the robot to actively control its vision to aid in precise manipulation. To fuse visual information from both cameras effectively, we additionally propose to use Transformers with a cross-view attention mechanism that models spatial attention from one view to another (and vice-versa), and use the learned features as input to an RL policy. Our method improves learning over strong single-view and multi-view baselines, and successfully transfers to a set of challenging manipulation tasks on a real robot with uncalibrated cameras, no access to state information, and a high degree of task variability. In a hammer manipulation task, our method succeeds in 75% of trials versus 38% and 13% for multi-view and single-view baselines, respectively. △ Less

Submitted 20 January, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

Comments: Accepted in Robotics and Automation Letters Journal (RA-L 2022). Website at https://jangirrishabh.github.io/lookcloser .8 Pages

arXiv:2112.08611 [pdf]

Clickbait in YouTube Prevention, Detection and Analysis of the Bait using Ensemble Learning

Authors: Peya Mowar, Mini Jain, Ruchika Goel, Dinesh Kumar Vishwakarma

Abstract: Unscrupulous content creators on YouTube employ deceptive techniques such as spam and clickbait to reach a broad audience and trick users into clicking on their videos to increase their advertisement revenue. Clickbait detection on YouTube requires an in depth examination and analysis of the intricate relationship between the video content and video descriptors title and thumbnail. However, the cu… ▽ More Unscrupulous content creators on YouTube employ deceptive techniques such as spam and clickbait to reach a broad audience and trick users into clicking on their videos to increase their advertisement revenue. Clickbait detection on YouTube requires an in depth examination and analysis of the intricate relationship between the video content and video descriptors title and thumbnail. However, the current solutions are mostly centred around the study of video descriptors and other metadata such as likes, tags, comments, etc and fail to utilize the video content, both video and audio. Therefore, we introduce a novel model to detect clickbaits on YouTube that consider the relationship between video content and title or thumbnail. The proposed model consists of a stacking classifier framework composed of six base models (K Nearest Neighbours, Support Vector Machine, XGBoost, Naive Bayes, Logistic Regression, and Multilayer Perceptron) and a meta classifier. The developed clickbait detection model achieved a high accuracy of 92.89% for the novel BollyBAIT dataset and 95.38% for Misleading Video Dataset. Additionally, the stated classifier does not use meta features or other statistics dependent on user interaction with the video (the number of likes, followers, or comments) for classification, and thus, can be used to detect potential clickbait videos before they are uploaded, thereby preventing the nuisance of clickbaits altogether and improving the users streaming experience. △ Less

Submitted 15 December, 2021; originally announced December 2021.

Comments: 26 pages, 16 figures

arXiv:2112.03849 [pdf, ps, other]

Natural Answer Generation: From Factoid Answer to Full-length Answer using Grammar Correction

Authors: Manas Jain, Sriparna Saha, Pushpak Bhattacharyya, Gladvin Chinnadurai, Manish Kumar Vatsa

Abstract: Question Answering systems these days typically use template-based language generation. Though adequate for a domain-specific task, these systems are too restrictive and predefined for domain-independent systems. This paper proposes a system that outputs a full-length answer given a question and the extracted factoid answer (short spans such as named entities) as the input. Our system uses constit… ▽ More Question Answering systems these days typically use template-based language generation. Though adequate for a domain-specific task, these systems are too restrictive and predefined for domain-independent systems. This paper proposes a system that outputs a full-length answer given a question and the extracted factoid answer (short spans such as named entities) as the input. Our system uses constituency and dependency parse trees of questions. A transformer-based Grammar Error Correction model GECToR (2020), is used as a post-processing step for better fluency. We compare our system with (i) Modified Pointer Generator (SOTA) and (ii) Fine-tuned DialoGPT for factoid questions. We also test our approach on existential (yes-no) questions with better results. Our model generates accurate and fluent answers than the state-of-the-art (SOTA) approaches. The evaluation is done on NewsQA and SqUAD datasets with an increment of 0.4 and 0.9 percentage points in ROUGE-1 score respectively. Also the inference time is reduced by 85\% as compared to the SOTA. The improved datasets used for our evaluation will be released as part of the research contribution. △ Less

Submitted 7 December, 2021; originally announced December 2021.

Showing 1–50 of 103 results for author: Jain, M