Search | arXiv e-print repository

Anticathode effect on electron kinetics in electron beam generated $\mathbf{E} \times \mathbf{B}$ plasma

Authors: Nirbhav Singh Chopra, Ivan Romadanov, Yevgeny Raitses

Abstract: Electron beam (e-beam) generated plasmas with applied crossed electric and magnetic $\left(\mathbf{E} \times \mathbf{B}\right)$ fields are promising for low damage processing of materials with applications to microelectronics and quantum information systems. In cylindrical e-beam $\mathbf{E} \times \mathbf{B}$ plasmas, radial confinement of electrons and ions is achieved by an axial magnetic field and radial electric field, respectively. To control the axial confinement of electrons, such e-beam generated plasma sources may incorporate a conducting boundary known as an anticathode, which is placed on the axially opposite side of the plasma from the cathode. In this work, it is shown that varying the anticathode voltage bias can control the degree to which the anticathode collects or repels incident electrons, allowing control of warm electron (electron energies in 10-30 eV range) and beam electron population confinement. It is suggested that the effect of the anticathode bias on the formation of these distinct electron populations is also associated with the transition between weak turbulence and strong Langmuir turbulence.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 31 pages, 16 figures, submitted to Plasma Sources Science and Technology

arXiv:2406.04318 [pdf, other]

Adaptive Sampling of k-Space in Magnetic Resonance for Rapid Pathology Prediction

Authors: Chen-Yu Yen, Raghav Singhal, Umang Sharma, Rajesh Ranganath, Sumit Chopra, Lerrel Pinto

Abstract: Magnetic Resonance (MR) imaging, despite its proven diagnostic utility, remains an inaccessible imaging modality for disease surveillance at the population level. A major factor rendering MR inaccessible is lengthy scan times. An MR scanner collects measurements associated with the underlying anatomy in the Fourier space, also known as the k-space. Creating a high-fidelity image requires collecting large quantities of such measurements, increasing the scan time. Traditionally, to accelerate an MR scan, image reconstruction from under-sampled k-space data is the method of choice. However, recent works show the feasibility of bypassing image reconstruction and learning to detect disease directly from a sparser learned subset of the k-space measurements. In this work, we propose Adaptive Sampling for MR (ASMR), a sampling method that learns an adaptive policy to sequentially select k-space samples to optimize for target disease detection. On 6 out of 8 pathology classification tasks spanning the Knee, Brain, and Prostate MR scans, ASMR reaches within 2% of the performance of a fully sampled classifier while using only 8% of the k-space, as well as outperforming prior state-of-the-art work in k-space sampling such as EMRT, LOUPE, and DPS.

Submitted 6 June, 2024; originally announced June 2024.

Comments: ICML 2024. Project website at https://adaptive-sampling-mr.github.io

arXiv:2405.17613 [pdf, other]

A Framework for Multi-modal Learning: Jointly Modeling Inter- & Intra-Modality Dependencies

Authors: Divyam Madaan, Taro Makino, Sumit Chopra, Kyunghyun Cho

Abstract: Supervised multi-modal learning involves mapping multiple modalities to a target label. Previous studies in this field have concentrated on capturing in isolation either the inter-modality dependencies (the relationships between different modalities and the label) or the intra-modality dependencies (the relationships within a single modality and the label). We argue that these conventional approaches that rely solely on either inter- or intra-modality dependencies may not be optimal in general. We view the multi-modal learning problem through the lens of generative models where we consider the target as a source of multiple modalities and the interaction between them. Towards that end, we propose the inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies, leading to more accurate predictions. We evaluate our approach using real-world healthcare and vision-and-language datasets with state-of-the-art models, demonstrating superior performance over traditional methods focusing only on one type of modality dependency.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.09010 [pdf, ps, other]

On Low Field Size Constructions of Access-Optimal Convertible Codes

Authors: Saransh Chopra, Francisco Maturana, K. V. Rashmi

Abstract: Most large-scale storage systems employ erasure coding to provide resilience against disk failures. Recent work has shown that tuning this redundancy to changes in disk failure rates leads to substantial storage savings. This process requires code conversion, wherein data encoded using an $[n^{I\mskip-2mu},k^{I\mskip-2mu}]$ initial code has to be transformed into data encoded using an $[n^{F\mskip-2mu},k^{F\mskip-2mu}]$ final code, a resource-intensive operation. Convertible codes are a class of codes that enable efficient code conversion while maintaining other desirable properties. In this paper, we focus on the access cost of conversion (total number of code symbols accessed in the conversion process) and on an important subclass of conversions known as the merge regime (combining multiple initial codewords into a single final codeword). In this setting, explicit constructions are known for systematic access-optimal Maximum Distance Separable (MDS) convertible codes for all parameters in the merge regime. However, the existing construction for a key subset of these parameters, which makes use of Vandermonde parity matrices, requires a large field size making it unsuitable for practical applications. In this paper, we provide (1) sharper bounds on the minimum field size requirement for such codes, and (2) explicit constructions for low field sizes for several parameter ranges. In doing so, we provide a proof of super-regularity of specially designed classes of Vandermonde matrices that could be of independent interest.

Submitted 14 May, 2024; originally announced May 2024.

Comments: This is an extended version of an IEEE ISIT 2024 paper with the same title

arXiv:2404.16478 [pdf, other]

Evaluating Consistency and Reasoning Capabilities of Large Language Models

Authors: Yash Saxena, Sarthak Chopra, Arunendra Mani Tripathi

Abstract: Large Language Models (LLMs) are extensively used today across various sectors, including academia, research, business, and finance, for tasks such as text generation, summarization, and translation. Despite their widespread adoption, these models often produce incorrect and misleading information, exhibiting a tendency to hallucinate. This behavior can be attributed to several factors, with consistency and reasoning capabilities being significant contributors. LLMs frequently lack the ability to generate explanations and engage in coherent reasoning, leading to inaccurate responses. Moreover, they exhibit inconsistencies in their outputs. This paper aims to evaluate and compare the consistency and reasoning capabilities of both public and proprietary LLMs. The experiments utilize the BoolQ dataset as the ground truth, comprising questions, answers, and corresponding explanations. Queries from the dataset are presented as prompts to the LLMs, and the generated responses are evaluated against the ground truth answers. Additionally, explanations are generated to assess the models' reasoning abilities. Consistency is evaluated by repeatedly presenting the same query to the models and observing variations in their responses. For measuring reasoning capabilities, the generated explanations are compared to the ground truth explanations using metrics such as BERT, BLEU, and F-1 scores. The findings reveal that proprietary models generally outperform public models in terms of both consistency and reasoning capabilities. However, even when presented with basic general knowledge questions, none of the models achieved a score of 90% in both consistency and reasoning. This study underscores the direct correlation between consistency and reasoning abilities in LLMs and highlights the inherent reasoning challenges present in current language models.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2403.16422 [pdf, other]

Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation

Authors: Sanyam Lakhanpal, Shivang Chopra, Vinija Jain, Aman Chadha, Man Luo

Abstract: Over the past few years, Text-to-Image (T2I) generation approaches based on diffusion models have gained significant attention. However, vanilla diffusion models often suffer from spelling inaccuracies in the text displayed within the generated images. The capability to generate visual text is crucial, offering both academic interest and a wide range of practical applications. To produce accurate visual text images, state-of-the-art techniques adopt a glyph-controlled image generation approach, consisting of a text layout generator followed by an image generator that is conditioned on the generated text layout. Nevertheless, our study reveals that these models still face three primary challenges, prompting us to develop a testbed to facilitate future research. We introduce a benchmark, LenCom-Eval, specifically designed for testing models' capability in generating images with Lengthy and Complex visual text. Subsequently, we introduce a training-free framework to enhance the two-stage generation approaches. We examine the effectiveness of our approach on both LenCom-Eval and MARIO-Eval benchmarks and demonstrate notable improvements across a range of evaluation metrics, including CLIPScore, OCR precision, recall, F1 score, accuracy, and edit distance scores. For instance, our proposed framework improves the backbone model, TextDiffuser, by more than 23% and 13.5% in terms of OCR word F1 on LenCom-Eval and MARIO-Eval, respectively. Our work makes a unique contribution to the field by focusing on generating images with long and rare text sequences, a niche previously unexplored by existing literature.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2402.04937 [pdf, other]

doi 10.1145/3613904.3642827

Charting the COVID Long Haul Experience -- A Longitudinal Exploration of Symptoms, Activity, and Clinical Adherence

Authors: Jessica Pater, Shaan Chopra, Juliette Zaccour, Jeanne Carroll, Fayika Farhat Nova, Tammy Toscos, Shion Guha, Fen Lei Chang

Abstract: COVID Long Haul (CLH) is an emerging chronic illness with varied patient experiences. Our understanding of CLH is often limited to data from electronic health records (EHRs), such as diagnoses or problem lists, which do not capture the volatility and severity of symptoms or their impact. To better understand the unique presentation of CLH, we conducted a 3-month long cohort study with 14 CLH patients, collecting objective (EHR, daily Fitbit logs) and subjective (weekly surveys, interviews) data. Our findings reveal a complex presentation of symptoms, associated uncertainty, and the ensuing impact CLH has on patients' personal and professional lives. We identify patient needs, practices, and challenges around adhering to clinical recommendations, engaging with health data, and establishing "new normals" post COVID. We reflect on the potential found at the intersection of these various data streams and the persuasive heuristics possible when designing for this new population and their specific needs.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 21 pages, 4 figures, 7 tables, ACM Conference CHI Conference on Human Factors in Computing Systems

ACM Class: K.4

arXiv:2402.04929 [pdf, other]

Source-Free Domain Adaptation with Diffusion-Guided Source Data Generation

Authors: Shivang Chopra, Suraj Kothawade, Houda Aynaou, Aman Chadha

Abstract: This paper introduces a novel approach to leverage the generalizability of Diffusion Models for Source-Free Domain Adaptation (DM-SFDA). Our proposed DM-SFDA method involves fine-tuning a pre-trained text-to-image diffusion model to generate source domain images using features from the target images to guide the diffusion process. Specifically, the pre-trained diffusion model is fine-tuned to generate source samples that minimize entropy and maximize confidence for the pre-trained source model. We then use a diffusion model-based image mixup strategy to bridge the domain gap between the source and target domains. We validate our approach through comprehensive experiments across a range of datasets, including Office-31, Office-Home, and VisDA. The results demonstrate significant improvements in SFDA performance, highlighting the potential of diffusion models in generating contextually relevant, domain-specific images.

Submitted 26 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2310.01701

arXiv:2401.05727 [pdf, other]

Zero Resource Cross-Lingual Part Of Speech Tagging

Authors: Sahil Chopra

Abstract: Part of speech tagging in zero-resource settings can be an effective approach for low-resource languages when no labeled training data is available. Existing systems use two main techniques for POS tagging: using pretrained multilingual large language models (LLMs), or projecting the source language labels into the zero-resource target language and training a sequence labeling model on them. We explore the latter approach using an off-the-shelf alignment module and train a hidden Markov model (HMM) to predict the POS tags. We evaluate a transfer learning setup with English as the source language and French, German, and Spanish as target languages for part-of-speech tagging. Our conclusion is that projected alignment data in a zero-resource language can be beneficial for predicting POS tags.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2310.14196 [pdf, other]

Learning to Discern: Imitating Heterogeneous Human Demonstrations with Preference and Representation Learning

Authors: Sachit Kuhar, Shuo Cheng, Shivang Chopra, Matthew Bronars, Danfei Xu

Abstract: Practical Imitation Learning (IL) systems rely on large human demonstration datasets for successful policy learning. However, challenges lie in maintaining the quality of collected data and addressing the suboptimal nature of some demonstrations, which can compromise the overall dataset quality and hence the learning outcome. Furthermore, the intrinsic heterogeneity in human behavior can produce equally successful but disparate demonstrations, further exacerbating the challenge of discerning demonstration quality. To address these challenges, this paper introduces Learning to Discern (L2D), an offline imitation learning framework for learning from demonstrations with diverse quality and style. Given a small batch of demonstrations with sparse quality labels, we learn a latent representation for temporally embedded trajectory segments. Preference learning in this latent space trains a quality evaluator that generalizes to new demonstrators exhibiting different styles. Empirically, we show that L2D can effectively assess and learn from varying demonstrations, thereby leading to improved policy performance across a range of tasks in both simulations and on a physical robot.

Submitted 22 October, 2023; originally announced October 2023.

Comments: To appear at the 7th Annual Conference on Robot Learning (CoRL) 2023

arXiv:2310.05592 [pdf, other]

InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations

Authors: Nils Feldhus, Qianli Wang, Tatiana Anikina, Sahil Chopra, Cennet Oguz, Sebastian Möller

Abstract: While recently developed NLP explainability methods let us open the black box in various ways (Madsen et al., 2022), a missing ingredient in this endeavor is an interactive tool offering a conversational interface. Such a dialogue system can help users explore datasets and models with explanations in a contextualized manner, e.g. via clarification or follow-up questions, and through a natural language interface. We adapt the conversational explanation framework TalkToModel (Slack et al., 2022) to the NLP domain, add new NLP-specific operations such as free-text rationalization, and illustrate its generalizability on three NLP tasks (dialogue act classification, question answering, hate speech detection). To recognize user queries for explanations, we evaluate fine-tuned and few-shot prompting models and implement a novel Adapter-based approach. We then conduct two user studies on (1) the perceived correctness and helpfulness of the dialogues, and (2) the simulatability, i.e. how objectively helpful dialogical explanations are for humans in figuring out the model's predicted label when it's not shown. We found rationalization and feature attribution were helpful in explaining the model behavior. Moreover, users could more reliably predict the model outcome based on an explanation dialogue rather than one-off explanations.

Submitted 23 October, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Findings. Camera-ready version

arXiv:2310.01701

Transcending Domains through Text-to-Image Diffusion: A Source-Free Approach to Domain Adaptation

Authors: Shivang Chopra, Suraj Kothawade, Houda Aynaou, Aman Chadha

Abstract: Domain Adaptation (DA) is a method for enhancing a model's performance on a target domain with inadequate annotated data by applying the information the model has acquired from a related source domain with sufficient labeled data. The escalating enforcement of data-privacy regulations like HIPAA, COPPA, FERPA, etc. has sparked a heightened interest in adapting models to novel domains while circumventing the need for direct access to the source data, a problem known as Source-Free Domain Adaptation (SFDA). In this paper, we propose a novel framework for SFDA that generates source data using a text-to-image diffusion model trained on the target domain samples. Our method starts by training a text-to-image diffusion model on the labeled target domain samples, which is then fine-tuned using the pre-trained source model to generate samples close to the source data. Finally, we use Domain Adaptation techniques to align the artificially generated source data with the target domain data, resulting in significant performance improvements of the model on the target domain. Through extensive comparison against several baselines on the standard Office-31, Office-Home, and VisDA benchmarks, we demonstrate the effectiveness of our approach for the SFDA task.

Submitted 6 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Revamped the whole paper; new version will be re-submitted

arXiv:2306.13276 [pdf, other]

On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis

Authors: Divyam Madaan, Daniel Sodickson, Kyunghyun Cho, Sumit Chopra

Abstract: Magnetic Resonance Imaging (MRI) is considered the gold standard of medical imaging because of the excellent soft-tissue contrast exhibited in the images reconstructed by the MRI pipeline, which in turn enables the human radiologist to discern many pathologies easily. More recently, Deep Learning (DL) models have also achieved state-of-the-art performance in diagnosing multiple diseases using these reconstructed images as input. However, the image reconstruction process within the MRI pipeline, which requires the use of complex hardware and adjustment of a large number of scanner parameters, is highly susceptible to noise of various forms, resulting in arbitrary artifacts within the images. Furthermore, the noise distribution is not stationary and varies within a machine, across machines, and across patients, leading to varying artifacts within the images. Unfortunately, DL models are quite sensitive to these varying artifacts, as they lead to changes in the input data distribution between the training and testing phases. The lack of robustness of these models against varying artifacts impedes their use in medical applications where safety is critical. In this work, we focus on improving the generalization performance of these models in the presence of multiple varying artifacts that manifest due to the complexity of the MR data acquisition. In our experiments, we observe that Batch Normalization, a widely used technique during the training of DL models for medical image analysis, is a significant cause of performance degradation in these changing environments. As a solution, we propose to use other normalization techniques, such as Group Normalization (GN) and Layer Normalization (LN), to inject robustness into model performance against varying image artifacts. Through a systematic set of experiments, we show that GN and LN provide better accuracy for various MR artifacts and distribution shifts.

Submitted 22 June, 2023; originally announced June 2023.

Comments: Accepted at MIDL 2023

arXiv:2304.09254 [pdf]

FastMRI Prostate: A Publicly Available, Biparametric MRI Dataset to Advance Machine Learning for Prostate Cancer Imaging

Authors: Radhika Tibrewala, Tarun Dutt, Angela Tong, Luke Ginocchio, Mahesh B Keerthivasan, Steven H Baete, Sumit Chopra, Yvonne W Lui, Daniel K Sodickson, Hersh Chandarana, Patricia M Johnson

Abstract: The fastMRI brain and knee dataset has enabled significant advances in exploring reconstruction methods for improving speed and image quality for Magnetic Resonance Imaging (MRI) via novel, clinically relevant reconstruction approaches. In this study, we describe the April 2023 expansion of the fastMRI dataset to include biparametric prostate MRI data acquired on a clinical population. The dataset consists of raw k-space and reconstructed images for T2-weighted and diffusion-weighted sequences along with slice-level labels that indicate the presence and grade of prostate cancer. As has been the case with fastMRI, increasing accessibility to raw prostate MRI data will further facilitate research in MR image reconstruction and evaluation with the larger goal of improving the utility of MRI for prostate cancer detection and evaluation. The dataset is available at https://fastmri.med.nyu.edu.

Submitted 18 April, 2023; originally announced April 2023.

Comments: 4 pages, 1 figure

arXiv:2301.11962 [pdf, other]

On the Feasibility of Machine Learning Augmented Magnetic Resonance for Point-of-Care Identification of Disease

Authors: Raghav Singhal, Mukund Sudarshan, Anish Mahishi, Sri Kaushik, Luke Ginocchio, Angela Tong, Hersh Chandarana, Daniel K. Sodickson, Rajesh Ranganath, Sumit Chopra

Abstract: Early detection of many life-threatening diseases (e.g., prostate and breast cancer) within at-risk populations can improve clinical outcomes and reduce cost of care. While numerous disease-specific "screening" tests that are closer to Point-of-Care (POC) are in use for this task, their low specificity results in unnecessary biopsies, leading to avoidable patient trauma and wasteful healthcare spending. On the other hand, despite the high accuracy of Magnetic Resonance (MR) imaging in disease diagnosis, it is not used as a POC disease identification tool because of poor accessibility. The poor accessibility of MR stems from the requirement to reconstruct high-fidelity images, as it necessitates a lengthy and complex process of acquiring large quantities of high-quality k-space measurements. In this study we explore the feasibility of an ML-augmented MR pipeline that directly infers the disease, sidestepping the image reconstruction process. We hypothesise that the disease classification task can be solved using a very small tailored subset of k-space data, compared to image reconstruction. Towards that end, we propose a method that performs two tasks: 1) identifies a subset of the k-space that maximizes disease identification accuracy, and 2) infers the disease directly using the identified k-space subset, bypassing the image reconstruction step. We validate our hypothesis by measuring the performance of the proposed system across multiple diseases and anatomies. We show that comparable performance to image-based classifiers, trained on images reconstructed with full k-space data, can be achieved using small quantities of data: 8% of the data for detecting multiple abnormalities in prostate and brain scans, and 5% of the data for knee abnormalities. To better understand the proposed approach and instigate future research, we provide an extensive analysis and release code.

Submitted 2 February, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

arXiv:2301.08051 [pdf, other]

doi 10.52953/JTPE9471

xURLCC in 6g with meshed RAN

Authors: Mohammad Ali Khoshkholghi, Toktam Mahmoodi, Subhankar Pal, Subhash Chopra, Mayuri Tendulkar, Sandip Sarkar

Abstract: 5G Ultra-Reliable Low Latency Communications Technology (URLLC) will not be able to provide extremely reliable low latency services to the complex networks in 6G. Moreover, URLLC that began with 5G has to be refined and improved in 6G to provide xURLLC (extreme URLLC) with sub-millisecond latency, for supporting diverse mission-critical applications. This paper aims to highlight the importance of peer-to-peer mesh connectivity for services that require xURLLC. Deploying mesh connectivity among RAN nodes would add significant value to the current 5G New Radio (5G NR), enabling 6G to increase flexibility and reliability of the networks while reducing the inherent latency introduced by the core network. To provide mesh connectivity in RAN, the nodes should be able to communicate with each other directly and be independent from the mobile core network so that data can be directly exchanged between base stations (gNBs), whereas certain aspects of the signalling procedure, including data session establishment, will be managed by RAN itself. In this paper, we introduce several architectural choices for a mesh network topology that could potentially be crucial to a number of applications. In addition, three possible options to create mesh connectivity in RAN are provided, and their pros and cons are discussed in detail.

Submitted 19 January, 2023; originally announced January 2023.

Journal ref: ITU Journal on Future and Evolving Technologies, Volume 3 (2022), Issue 3, Pages 612-622

arXiv:2206.08566 [pdf, other]

Active Data Discovery: Mining Unknown Data using Submodular Information Measures

Authors: Suraj Kothawade, Shivang Chopra, Saikat Ghosh, Rishabh Iyer

Abstract: Active Learning is a very common yet powerful framework for iteratively and adaptively sampling subsets of the unlabeled sets with a human in the loop with the goal of achieving labeling efficiency. Most real world datasets have imbalance either in classes or slices, and correspondingly, parts of the dataset are rare. As a result, there has been a lot of work in designing active learning approaches for mining these rare data instances. Most approaches assume access to a seed set of instances which contain these rare data instances. However, in the event of more extreme rareness, it is reasonable to assume that these rare data instances (either classes or slices) may not even be present in the seed labeled set, and a critical need for the active learning paradigm is to efficiently discover these rare data instances. In this work, we provide an active data discovery framework which can mine unknown data slices and classes efficiently using the submodular conditional gain and submodular conditional mutual information functions. We provide a general algorithmic framework which works in a number of scenarios including image classification and object detection and works with both rare classes and rare slices present in the unlabeled set. We show significant accuracy and labeling efficiency gains with our approach compared to existing state-of-the-art active learning approaches for actively discovering these rare classes and slices.

Submitted 17 June, 2022; originally announced June 2022.

arXiv:2204.01837 [pdf, other]

Parallel Power System Restoration

Authors: Sunil Chopra, Feng Qiu, Sangho Shim

Abstract: Power system restoration is an essential activity for grid resilience, where grid operators restart generators, re-establish transmission paths, and restore loads after a blackout event. With a goal of restoring electric service in the shortest time, the core decisions in restoration planning are to partition the grid into sub-networks, each of which has an initial power source for black-start (called sectionalization problem), and then restart all generators in each network (called generator startup sequencing problem or GSS) as soon as possible. Due to the complexity of each problem, the sectionalization and GSS problems are usually solved separately, often resulting in a sub-optimal solution. Our paper develops models and computational methods to solve the two problems simultaneously. We first study the computational complexity of the GSS problem and develop an efficient integer linear programming formulation. We then integrate the GSS problem with the sectionalization problem and develop an integer linear programming formulation for the parallel power system restoration (PPSR) problem to find exact optimal solutions. To solve larger systems, we then develop bounding approaches that find good upper and lower bounds efficiently. Finally, to address computational challenges for very large power grids, we develop a randomized approach to find a high-quality feasible solution quickly. Our computational experiments demonstrate that the proposed approaches are able to find good solutions for PPSR in up to 2000-bus systems.

Submitted 18 August, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

Comments: 30 pages, working paper

arXiv:2109.09849 [pdf]

doi 10.1088/1361-6463/ac3bf2

Determination of positive anode sheath in anodic carbon arc for synthesis of nanomaterials

Authors: Nirbhav S. Chopra, Yevgeny Raitses, Shurik Yatom, Jorge M. Muñoz Burgos

Abstract: In the atmospheric pressure anodic carbon arc, ablation of the anode serves as a feedstock of carbon for production of nanomaterials. It is known that the ablation of the graphite anode in this arc can have two distinctive modes with low and high ablation rates. The transition between these modes is governed by the power deposition at the arc attachment to the anode and depends on the gap between the anode and the cathode electrodes. Probe measurements combined with optical emission spectroscopy (OES) are used to analyze the voltage drop between the arc electrodes. These measurements corroborated previous predictions of a positive anode sheath (i.e. electron attracting sheath) in this arc, which appears in both low and high ablation modes. Another key result is a relatively low electron temperature (~ 0.6 eV) obtained from OES using a collisional radiative model. This result partially explains a higher arc voltage (~ 20 V) required to sustain the arc current of 50-70 A than predicted by existing simulations of this discharge.

Submitted 20 September, 2021; originally announced September 2021.

Comments: 38 pages, 25 figures, submitted to Journal of Physics D: Applied Physics

arXiv:2007.04297 [pdf, other]

Open Domain Suggestion Mining Leveraging Fine-Grained Analysis

Authors: Shreya Singal, Tanishq Goel, Shivang Chopra, Sonika Dahiya

Abstract: Suggestion mining tasks are often semantically complex and lack sophisticated methodologies that can be applied to real-world data. The presence of suggestions across a large diversity of domains and the absence of large labelled and balanced datasets render this task particularly challenging to deal with. In an attempt to overcome these challenges, we propose a two-tier pipeline that leverages Discourse Marker based oversampling and fine-grained suggestion mining techniques to retrieve suggestions from online forums. Through extensive comparison on a real-world open-domain suggestion dataset, we demonstrate how the oversampling technique combined with transformer based fine-grained analysis can beat the state of the art. Additionally, we perform extensive qualitative and quantitative analysis to give construct validity to our proposed pipeline. Finally, we discuss the practical, computational and reproducibility aspects of the deployment of our pipeline across the web.

Submitted 11 July, 2020; v1 submitted 27 June, 2020; originally announced July 2020.

arXiv:1904.12258 [pdf, other]

Generalizing the Covering Path Problem on a Grid

Authors: Liwei Zeng, Karen Smilowitz, Sunil Chopra

Abstract: We study the covering path problem on a grid of R^{2}. We generalize earlier results on a rectangular grid and prove that the covering path cost can be bounded by the area and perimeter of the grid. We provide (2+ε) and (1+ε)-approximations for the problem on a general grid and on a convex grid, respectively.

Submitted 28 April, 2019; originally announced April 2019.

arXiv:1903.04879 [pdf, other]

What sets Verified Users apart? Insights, Analysis and Prediction of Verified Users on Twitter

Authors: Indraneil Paul, Abhinav Khattar, Shaan Chopra, Ponnurangam Kumaraguru, Manish Gupta

Abstract: Social network and publishing platforms, such as Twitter, support the concept of a secret proprietary verification process, for handles they deem worthy of platform-wide public interest. In line with significant prior work which suggests that possessing such a status symbolizes enhanced credibility in the eyes of the platform audience, a verified badge is clearly coveted among public figures and brands. What are less obvious are the inner workings of the verification process and what being verified represents. This lack of clarity, coupled with the flak that Twitter received by extending aforementioned status to political extremists in 2017, backed Twitter into publicly admitting that the process and what the status represented needed to be rethought. With this in mind, we seek to unravel the aspects of a user's profile which likely engender or preclude verification. The aim of the paper is two-fold: First, we test if discerning the verification status of a handle from profile metadata and content features is feasible. Second, we unravel the features which have the greatest bearing on a handle's verification status. We collected a dataset consisting of profile metadata of all 231,235 verified English-speaking users (as of July 2018), a control sample of 175,930 non-verified English-speaking users and all their 494 million tweets over a one year collection period. Our proposed models are able to reliably identify verification status (Area under curve AUC > 99%).
We show that number of public list memberships, presence of neutral sentiment in tweets and an authoritative language style are the most pertinent predictors of verification status. To the best of our knowledge, this work represents the first attempt at discerning and classifying verification worthy users on Twitter.

Submitted 12 March, 2019; originally announced March 2019.

arXiv:1902.02248 [pdf, other]

Generative Image Translation for Data Augmentation of Bone Lesion Pathology

Authors: Anant Gupta, Srivas Venkatesh, Sumit Chopra, Christian Ledig

Abstract: Insufficient training data and severe class imbalance are often limiting factors when developing machine learning models for the classification of rare diseases. In this work, we address the problem of classifying bone lesions from X-ray images by increasing the small number of positive samples in the training set. We propose a generative data augmentation approach based on a cycle-consistent generative adversarial network that synthesizes bone lesions on images without pathology. We pose the generative task as an image-patch translation problem that we optimize specifically for distinct bones (humerus, tibia, femur). In experimental results, we confirm that the described method mitigates the class imbalance problem in the binary classification task of bone lesion detection. We show that the augmented training sets enable the training of superior classifiers achieving better performance on a held-out test set. Additionally, we demonstrate the feasibility of transfer learning and apply a generative model that was trained on one body part to another.

Submitted 6 February, 2019; originally announced February 2019.

arXiv:1812.09710 [pdf, other]

Elites Tweet? Characterizing the Twitter Verified User Network

Authors: Indraneil Paul, Abhinav Khattar, Ponnurangam Kumaraguru, Manish Gupta, Shaan Chopra

Abstract: Social network and publishing platforms, such as Twitter, support the concept of verification. Verified accounts are deemed worthy of platform-wide public interest and are separately authenticated by the platform itself. There have been repeated assertions by these platforms about verification not being tantamount to endorsement. However, a significant body of prior work suggests that possessing a verified status symbolizes enhanced credibility in the eyes of the platform audience. As a result, such a status is highly coveted among public figures and influencers. Hence, we attempt to characterize the network of verified users on Twitter and compare the results to similar analysis performed for the entire Twitter network. We extracted the entire network of verified users on Twitter (as of July 2018) and obtained 231,246 user profiles and 79,213,811 connections. Subsequently in the network analysis, we found that the sub-graph of verified users mirrors the full Twitter users graph in some aspects such as possessing a short diameter. However, our findings contrast with earlier findings on multiple aspects, such as the possession of a power law out-degree distribution, slight disassortativity and a significantly higher reciprocity rate, as elucidated in the paper. Moreover, we attempt to gauge the presence of salient components within this sub-graph and detect the absence of homophily with respect to popularity, which again is in stark contrast to the full Twitter graph. Finally, we demonstrate stationarity in the time series of verified user activity levels.
To the best of our knowledge, this work represents the first quantitative attempt at characterizing verified users on Twitter.

Submitted 12 March, 2019; v1 submitted 23 December, 2018; originally announced December 2018.

arXiv:1805.08712 [pdf, other]

A Distributed Version of the Hungarian Method for Multi-Robot Assignment

Authors: Smriti Chopra, Giuseppe Notarstefano, Matthew Rice, Magnus Egerstedt

Abstract: In this paper, we propose a distributed version of the Hungarian Method to solve the well known assignment problem. In the context of multi-robot applications, all robots cooperatively compute a common assignment that optimizes a given global criterion (e.g. the total distance traveled) within a finite set of local computations and communications over a peer-to-peer network. As a motivating application, we consider a class of multi-robot routing problems with "spatio-temporal" constraints, i.e. spatial targets that require servicing at particular time instants. As a means of demonstrating the theory developed in this paper, the robots cooperatively find online, suboptimal routes by applying an iterative version of the proposed algorithm, in a distributed and dynamic setting. As a concrete experimental test-bed, we provide an interactive "multi-robot orchestral" framework in which a team of robots cooperatively plays a piece of music on a so-called orchestral floor.

Submitted 22 May, 2018; originally announced May 2018.

arXiv:1804.02063 [pdf, ps, other]

Few-Shot Text Classification with Pre-Trained Word Embeddings and a Human in the Loop

Authors: Katherine Bailey, Sunny Chopra

Abstract: Most of the literature around text classification treats it as a supervised learning problem: given a corpus of labeled documents, train a classifier such that it can accurately predict the classes of unseen documents. In industry, however, it is not uncommon for a business to have entire corpora of documents where few or none have been classified, or where existing classifications have become meaningless. With web content, for example, poor taxonomy management can result in labels being applied indiscriminately, making filtering by these labels unhelpful. Our work aims to make it possible to classify an entire corpus of unlabeled documents using a human-in-the-loop approach, where the content owner manually classifies just one or two documents per category and the rest can be automatically classified. This "few-shot" learning approach requires rich representations of the documents such that those that have been manually labeled can be treated as prototypes, and automatic classification of the rest is a simple case of measuring the distance to prototypes. This approach uses pre-trained word embeddings, where documents are represented using a simple weighted average of constituent word embeddings. We have tested the accuracy of the approach on existing labeled datasets and provide the results here. We have also made code available for reproducing the results we got on the 20 Newsgroups dataset.

Submitted 5 April, 2018; originally announced April 2018.

Comments: 8 pages

arXiv:1803.09040 [pdf, other]

A Bounded Formulation for The School Bus Scheduling Problem

Authors: Liwei Zeng, Sunil Chopra, Karen Smilowitz

Abstract: This paper proposes a new formulation for the school bus scheduling problem (SBSP) which optimizes school start times and bus operation times to minimize transportation cost. Our goal is to minimize the number of buses to serve all bus routes such that each route arrives in a time window before school starts. We present a new time-indexed integer linear programming (ILP) formulation for this problem. Based on a strengthened version of the linear relaxation of the ILP, we develop a dependent randomized rounding algorithm that yields near-optimal solutions for large-scale problem instances. We also generalize our methodologies to solve a robust version of the SBSP.

Submitted 1 August, 2020; v1 submitted 23 March, 2018; originally announced March 2018.

arXiv:1801.05588 [pdf, other]

White or Blue, the Whale gets its Vengeance: A Social Media Analysis of the Blue Whale Challenge

Authors: Abhinav Khattar, Karan Dabas, Kshitij Gupta, Shaan Chopra, Ponnurangam Kumaraguru

Abstract: The Blue Whale Challenge is a series of self-harm causing tasks that are propagated via online social media under the guise of a "game." The list of tasks must be completed in a duration of 50 days and they cause both physical and mental harm to the player. The final task is to commit suicide. The game is supposed to be administered by people called "curators" who incite others to cause self-mutilation and commit suicide. The curators and potential players are known to contact each other on social networking websites and the conversations between them are suspected to take place mainly via direct messages which are difficult to track. However, in order to find curators, the players make public posts containing certain hashtags/keywords to catch their attention. Even though many of these social networks moderate posts talking about the game, some posts manage to pass their filters. Our research focuses on (1) understanding the social media spread of the challenge, (2) spotting the behaviour of the people taking interest in the Blue Whale challenge, and (3) analysing demographics of the users who may be involved in playing the game.

Submitted 17 January, 2018; originally announced January 2018.

Comments: 18 pages

arXiv:1709.07485 [pdf, other]

The Covering Path Problem on a Grid

Authors: Liwei Zeng, Sunil Chopra, Karen Smilowitz

Abstract: This paper introduces the covering path problem on a grid (CPPG) which finds the cost-minimizing path connecting a subset of points in a grid such that each point that needs to be covered is within a predetermined distance of a point from the chosen subset. We leverage the geometric properties of the grid graph which captures the road network structure in many transportation problems, including our motivating setting of school bus routing. As defined in this paper, the CPPG is a bi-objective optimization problem comprised of one cost term related to path length and one cost term related to stop count. We develop a trade-off constraint which quantifies the trade-off between path length and stop count and provides a lower bound for the bi-objective optimization problem. We introduce simple construction techniques to provide feasible paths that match the lower bound within a constant factor. Importantly, this solution approach uses transformations of the general CPPG to either a discrete CPPG or continuous CPPG based on the value of the coverage radius. For both the discrete and continuous versions, we provide fast constant-factor approximations, thus solving the general CPPG.

Submitted 24 April, 2019; v1 submitted 21 September, 2017; originally announced September 2017.

arXiv:1709.03856 [pdf, ps, other]

StarSpace: Embed All The Things!

Authors: Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, Jason Weston

Abstract: We present StarSpace, a general-purpose neural embedding model that can solve a wide variety of problems: labeling tasks such as text classification, ranking tasks such as information retrieval/web search, collaborative filtering-based or content-based recommendation, embedding of multi-relational graphs, and learning word, sentence or document level embeddings. In each case the model works by embedding those entities comprised of discrete features and comparing them against each other -- learning similarities dependent on the task. Empirical results on a number of tasks show that StarSpace is highly competitive with existing methods, whilst also being generally applicable to new cases where those methods are not.

Submitted 20 November, 2017; v1 submitted 12 September, 2017; originally announced September 2017.

arXiv:1702.04770 [pdf, other]

Training Language Models Using Target-Propagation

Authors: Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

Abstract: While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps. We investigate whether Target Propagation (TPROP) style approaches can address these shortcomings. Unfortunately, extensive experiments suggest that TPROP generally underperforms BPTT, and we end with an analysis of this phenomenon, and suggestions for future work.

Submitted 15 February, 2017; originally announced February 2017.

arXiv:1612.04936 [pdf, other]

Learning through Dialogue Interactions by Asking Questions

Authors: Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston

Abstract: A good dialogue agent should have the ability to interact with users by both responding to questions and by asking questions, and importantly to learn from both types of interaction. In this work, we explore this direction by designing a simulator and a set of synthetic tasks in the movie domain that allow such interactions between a learner and a teacher. We investigate how a learner can benefit from asking questions in both offline and online reinforcement learning settings, and demonstrate that the learner improves when asking questions. Finally, real experiments with Mechanical Turk validate the approach. Our work represents a first step in developing such end-to-end learned interactive dialogue agents.

Submitted 13 February, 2017; v1 submitted 15 December, 2016; originally announced December 2016.

arXiv:1611.09823 [pdf, other]

Dialogue Learning With Human-In-The-Loop

Authors: Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston

Abstract: An important aspect of developing conversational agents is to give a bot the ability to improve through communicating with humans and to learn from the mistakes that it makes. Most research has focused on learning from fixed training sets of labeled data rather than interacting with a dialogue partner in an online fashion. In this paper we explore this direction in a reinforcement learning setting where the bot improves its question-answering ability from feedback a teacher gives following its generated responses. We build a simulator that tests various aspects of such learning in a synthetic environment, and introduce models that work in this regime. Finally, real experiments with Mechanical Turk validate the approach.

Submitted 13 January, 2017; v1 submitted 29 November, 2016; originally announced November 2016.

arXiv:1604.08667 [pdf, other]

A Bio-Inspired Tensegrity Manipulator with Multi-DOF, Structurally Compliant Joints

Authors: Steven Lessard, Dennis Castro, William Asper, Shaurya Deep Chopra, Leya Breanna Baltaxe-Admony, Mircea Teodorescu, Vytas SunSpiral, Adrian Agogino

Abstract: Most traditional robotic mechanisms feature inelastic joints that are unable to robustly handle large deformations and off-axis moments. As a result, the applied loads are transferred rigidly throughout the entire structure. The disadvantage of this approach is that the exerted leverage is magnified at each subsequent joint, possibly damaging the mechanism. In this paper, we present two lightweight, elastic, bio-inspired tensegrity robotic arms which mitigate this danger while improving their mechanism's functionality. Our solutions feature modular tensegrity structures that function similarly to the human elbow and the human shoulder when connected. Like their biological counterparts, the proposed robotic joints are flexible and comply with unanticipated forces. Both proposed structures have multiple passive degrees of freedom and four active degrees of freedom (two from the shoulder and two from the elbow). The structural advantages demonstrated by the joints in these manipulators illustrate a solution to the fundamental issue of elegantly handling off-axis compliance.

Submitted 1 September, 2016; v1 submitted 28 April, 2016; originally announced April 2016.

Comments: IROS 2016

arXiv:1511.06931 [pdf, ps, other]

Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems

Authors: Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston

Abstract: A long-term goal of machine learning is to build intelligent conversational agents. One recent popular approach is to train end-to-end models on a large amount of real dialog transcripts between humans (Sordoni et al., 2015; Vinyals & Le, 2015; Shang et al., 2015). However, this approach leaves many questions unanswered as an understanding of the precise successes and shortcomings of each model is hard to assess. A contrasting recent proposal are the bAbI tasks (Weston et al., 2015b) which are synthetic data that measure the ability of learning machines at various reasoning tasks over toy language. Unfortunately, those tests are very small and hence may encourage methods that do not scale. In this work, we propose a suite of new tasks of a much larger scale that attempt to bridge the gap between the two regimes. Choosing the domain of movies, we provide tasks that test the ability of models to answer factual questions (utilizing OMDB), provide personalization (utilizing MovieLens), carry short conversations about the two, and finally to perform on natural dialogs from Reddit. We provide a dataset covering 75k movie entities and with 3.5M training examples. We present results of various models on these tasks, and evaluate their performance.

Submitted 19 April, 2016; v1 submitted 21 November, 2015; originally announced November 2015.

arXiv:1511.06732 [pdf, other]

Sequence Level Training with Recurrent Neural Networks

Authors: Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba

Abstract: Many natural language processing applications use language models to generate text. These models are typically trained to predict the next word in a sequence, given the previous words and some context such as an image. However, at test time the model is expected to generate the entire sequence from scratch. This discrepancy makes generation brittle, as errors may accumulate along the way. We address this issue by proposing a novel sequence level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE. On three different tasks, our approach outperforms several strong baselines for greedy generation. The method is also competitive when these baselines employ beam search, while being several times faster.

Submitted 6 May, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

arXiv:1511.02301 [pdf, other]

The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations

Authors: Felix Hill, Antoine Bordes, Sumit Chopra, Jason Weston

Abstract: We introduce a new test of how well language models capture meaning in children's books. Unlike standard language modelling benchmarks, it distinguishes the task of predicting syntactic function words from that of predicting lower-frequency words, which carry greater semantic content. We compare a range of state-of-the-art models, each with a different way of encoding what has been previously read. We show that models which store explicit representations of long-term contexts outperform state-of-the-art neural language models at predicting semantic content words, although this advantage is not observed for syntactic function words. Interestingly, we find that the amount of text encoded in a single memory representation is highly influential to the performance: there is a sweet-spot, not too big and not too small, between single words and full sentences that allows the most meaningful information in a text to be effectively retained and recalled. Further, the attention over such window-based memories can be trained effectively through self-supervision. We then assess the generality of this principle by applying it to the CNN QA benchmark, which involves identifying named entities in paraphrased summaries of news articles, and achieve state-of-the-art performance.

Submitted 1 April, 2016; v1 submitted 6 November, 2015; originally announced November 2015.

arXiv:1509.00685 [pdf, other]

A Neural Attention Model for Abstractive Sentence Summarization

Authors: Alexander M. Rush, Sumit Chopra, Jason Weston

Abstract: Summarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build. In this work, we propose a fully data-driven approach to abstractive sentence summarization. Our method utilizes a local attention-based model that generates each word of the summary conditioned on the input sentence. While the model is structurally simple, it can easily be trained end-to-end and scales to a large amount of training data. The model shows significant performance gains on the DUC-2004 shared task compared with several strong baselines.

Submitted 3 September, 2015; v1 submitted 2 September, 2015; originally announced September 2015.

Comments: Proceedings of EMNLP 2015

arXiv:1506.02075 [pdf, ps, other]

Large-scale Simple Question Answering with Memory Networks

Authors: Antoine Bordes, Nicolas Usunier, Sumit Chopra, Jason Weston

Abstract: Training large-scale question answering systems is complicated because training sources usually cover a small portion of the range of possible questions. This paper studies the impact of multitask and transfer learning for simple question answering; a setting for which the reasoning required to answer is quite easy, as long as one can retrieve the correct evidence given a question, which can be difficult in large-scale conditions. To this end, we introduce a new dataset of 100k questions that we use in conjunction with existing benchmarks. We conduct our study within the framework of Memory Networks (Weston et al., 2015) because this perspective allows us to eventually scale up to more complex reasoning, and show that Memory Networks can be successfully trained to achieve excellent performance.

Submitted 5 June, 2015; originally announced June 2015.

arXiv:1502.05698 [pdf, ps, other]

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks

Authors: Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merriënboer, Armand Joulin, Tomas Mikolov

Abstract: One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent. To measure progress towards that goal, we argue for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering. Our tasks measure understanding in several ways: whether a system is able to answer questions via chaining facts, simple induction, deduction and many more. The tasks are designed to be prerequisites for any system that aims to be capable of conversing with a human. We believe many existing learning systems can currently not solve them, and hence our aim is to classify these tasks into skill sets, so that researchers can identify (and then rectify) the failings of their systems. We also extend and improve the recently introduced Memory Networks model, and show it is able to solve some, but not all, of the tasks.

Submitted 31 December, 2015; v1 submitted 19 February, 2015; originally announced February 2015.

arXiv:1412.7753 [pdf, other]

Learning Longer Memory in Recurrent Neural Networks

Authors: Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato

Abstract: Recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due to the so-called vanishing gradient problem. In this paper, we show that learning longer term patterns in real data, such as in natural language, is perfectly possible using gradient descent. This is achieved by using a slight structural modification of the simple recurrent neural network architecture. We encourage some of the hidden units to change their state slowly by making part of the recurrent weight matrix close to identity, thus forming kind of a longer term memory. We evaluate our model in language modeling experiments, where we obtain similar performance to the much more complex Long Short Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997).

Submitted 16 April, 2015; v1 submitted 24 December, 2014; originally announced December 2014.

arXiv:1412.6604 [pdf, ps, other]

Video (language) modeling: a baseline for generative models of natural videos

Authors: Marc'Aurelio Ranzato, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, Sumit Chopra

Abstract: We propose a strong baseline model for unsupervised feature learning using video data. By learning to predict missing frames or extrapolate future frames from an input video sequence, the model discovers both spatial and temporal correlations which are useful to represent complex deformations and motion patterns. The models we propose are largely borrowed from the language modeling literature, and adapted to the vision domain by quantizing the space of image patches into a large dictionary. We demonstrate the approach on both a filling and a generation task. For the first time, we show that, after training on natural videos, such a model can predict non-trivial motions over short video sequences.

Submitted 4 May, 2016; v1 submitted 20 December, 2014; originally announced December 2014.

arXiv:1410.3916 [pdf, ps, other]

Memory Networks

Authors: Jason Weston, Sumit Chopra, Antoine Bordes

Abstract: We describe a new class of learning models called memory networks. Memory networks reason with inference components combined with a long-term memory component; they learn how to use these jointly. The long-term memory can be read and written to, with the goal of using it for prediction. We investigate these models in the context of question answering (QA) where the long-term memory effectively acts as a (dynamic) knowledge base, and the output is a textual response. We evaluate them on a large-scale QA task, and a smaller, but more complex, toy task generated from a simulated world. In the latter, we show the reasoning power of such models by chaining multiple supporting sentences to answer questions that require understanding the intension of verbs.

Submitted 29 November, 2015; v1 submitted 14 October, 2014; originally announced October 2014.

arXiv:1406.3676 [pdf, other]

Question Answering with Subgraph Embeddings

Authors: Antoine Bordes, Sumit Chopra, Jason Weston

Abstract: This paper presents a system which learns to answer questions on a broad range of topics from a knowledge base using few hand-crafted features. Our model learns low-dimensional embeddings of words and knowledge base constituents; these representations are used to score natural language questions against candidate answers. Training our system using pairs of questions and structured representations of their answers, and pairs of question paraphrases, yields competitive results on a competitive benchmark of the literature.

Submitted 3 September, 2014; v1 submitted 13 June, 2014; originally announced June 2014.

arXiv:cond-mat/0412666 [pdf, ps, other]

doi 10.1103/PhysRevE.72.031302

Packing Fractions and Maximum Angles of Stability of Granular Materials

Authors: J. Olson, M. Priester, J. Luo, S. Chopra, R. J. Zieve

Abstract: In two-dimensional rotating drum experiments, we find two separate influences of the packing fraction of a granular heap on its stability. For a fixed grain shape, the stability increases with packing fraction. However, in determining the relative stability of different grain shapes, those with the lowest average packing fractions tend to form the most stable heaps. We also show that only the configuration close to the surface of the pile figures prominently.

Submitted 23 December, 2004; originally announced December 2004.

Comments: 4 pages, 4 figures

arXiv:cs/0207038 [pdf, ps, other]

Iterated revision and the axiom of recovery: a unified treatment via epistemic states

Authors: Samir Chopra, Aditya Ghose, Thomas Meyer

Abstract: The axiom of recovery, while capturing a central intuition regarding belief change, has been the source of much controversy. We argue briefly against putative counterexamples to the axiom--while agreeing that some of their insight deserves to be preserved--and present additional recovery-like axioms in a framework that uses epistemic states, which encode preferences, as the object of revisions. This provides a framework in which iterated revision becomes possible and makes explicit the connection between iterated belief change and the axiom of recovery. We provide a representation theorem that connects the semantic conditions that we impose on iterated revision and the additional syntactical properties mentioned. We also show some interesting similarities between our framework and that of Darwiche-Pearl. In particular, we show that the intuitions underlying the controversial (C2) postulate are captured by the recovery axiom and our recovery-like postulates (the latter can be seen as weakenings of (C2)).

Submitted 9 July, 2002; originally announced July 2002.

ACM Class: I.2.3

arXiv:cs/0207037 [pdf, ps, other]

Some logics of belief and disbelief

Authors: Samir Chopra, Johannes Heidema, Thomas Meyer

Abstract: The introduction of explicit notions of rejection, or disbelief, into logics for knowledge representation can be justified in a number of ways. Motivations range from the need for versions of negation weaker than classical negation, to the explicit recording of classic belief contraction operations in the area of belief change, and the additional levels of expressivity obtained from an extended version of belief change which includes disbelief contraction. In this paper we present four logics of disbelief which address some or all of these intuitions. Soundness and completeness results are supplied and the logics are compared with respect to applicability and utility.

Submitted 9 July, 2002; originally announced July 2002.

ACM Class: I.2.3

arXiv:hep-ph/0008070 [pdf, ps, other]

Low-Scale and Gauge-Mediated Supersymmetry Breaking at the Fermilab Tevatron Run II

Authors: Ray Culbertson, Stephen P. Martin, Jianming Qian, Scott Thomas, Howard Baer, Wasiq Bokhari, Sailesh Chopra, Chih-Lung Chou, Amy Connolly, Dave Cutts, Regina Demina, Bhaskar Dutta, Gary Grim, Greg Landsberg, Konstantin Matchev, P. G. Mercadante, D. J. Muller, S. Nandi, Michael Peskin, Uri Sarid, David Stuart, Benn Tannenbaum, Xerxes Tata, Randy Thurman-Keup, Ming-Jer Wang, et al. (1 additional author not shown)

Abstract: The prospects for discovering and studying signals of low-scale supersymmetry breaking models at the Tevatron Run II and beyond are explored. These models include gauge-mediated supersymmetry breaking as the most compelling and concrete realization, but more generally are distinguished by the presence of a nearly massless Goldstino as the lightest supersymmetric particle. The next-lightest supersymmetric particle(s) (NLSP) decays to its partner and the Goldstino. Depending on the supersymmetry breaking scale, these decays can occur promptly or on a scale comparable to or larger than the size of a detector. A systematic analysis based on a classification in terms of the identity of the NLSP and its decay length is presented. The various scenarios are discussed in terms of signatures and possible event selection criteria. The Run II and beyond discovery and exclusion reaches, including the effects of background, are detailed for the most compelling cases. In addition to standard event selection criteria based on missing energy and photons, leptons, jets, taus, tagged b-jets, or reconstructed Z-bosons, more exotic signals of metastable NLSPs such as displaced photons, large negative impact parameter tracks, kink tracks, both opposite and same-sign highly ionizing tracks, time of flight measurements, charge-changing tracks, charge-exchange tracks, and same-sign di-top events are investigated. The interesting possibility of observing a Higgs boson signal in events that are efficiently "tagged" by the unique signatures of low-scale supersymmetry breaking is also considered.

Submitted 16 October, 2000; v1 submitted 8 August, 2000; originally announced August 2000.

Comments: 98 pages. Preprint numbers added, style macros updated (resulting in 4 more pages due to typesetting), no other changes

Report number: FERMILAB-Pub-00/251-T, SLAC-PUB-8643

arXiv:cs/0003021 [pdf, ps, other]

Relevance Sensitive Non-Monotonic Inference on Belief Sequences

Authors: Samir Chopra, Konstantinos Georgatos, Rohit Parikh

Abstract: We present a method for relevance sensitive non-monotonic inference from belief sequences which incorporates insights pertaining to prioritized inference and relevance sensitive, inconsistency tolerant belief revision. Our model uses a finite, logically open sequence of propositional formulas as a representation for beliefs and defines a notion of inference from maxiconsistent subsets of formulas guided by two orderings: a temporal sequencing and an ordering based on relevance relations between the conclusion and formulas in the sequence. The relevance relations are ternary (using context as a parameter) as opposed to standard binary axiomatizations. The inference operation thus defined easily handles iterated revision by maintaining a revision history, blocks the derivation of inconsistent answers from a possibly inconsistent sequence and maintains the distinction between explicit and implicit beliefs. In doing so, it provides a finitely presented formalism and a plausible model of reasoning for automated agents.

Submitted 7 March, 2000; originally announced March 2000.

ACM Class: I.2.3

Showing 1–49 of 49 results for author: Chopra, S