Search | arXiv e-print repository

arXiv:2309.01973 [pdf, other]

Linear Regression using Heterogeneous Data Batches

Authors: Ayush Jain, Rajat Sen, Weihao Kong, Abhimanyu Das, Alon Orlitsky

Abstract: In many learning applications, data are collected from multiple sources, each providing a \emph{batch} of samples that by itself is insufficient to learn its input-output relationship. A common approach assumes that the sources fall in one of several unknown subgroups, each with an unknown input distribution and input-output relationship. We consider one of this setup's most fundamental and important manifestations, where the output is a noisy linear combination of the inputs and there are $k$ subgroups, each with its own regression vector. Prior work~\cite{kong2020meta} showed that with abundant small batches, the regression vectors can be learned with only a few, $\tilde{\Omega}(k^{3/2})$, batches of medium size with $\tilde{\Omega}(\sqrt{k})$ samples each. However, the paper requires that the input distribution for all $k$ subgroups be isotropic Gaussian, and states that removing this assumption is an ``interesting and challenging problem''. We propose a novel gradient-based algorithm that improves on the existing results in several ways. It extends the applicability of the algorithm by: (1) allowing the subgroups' underlying input distributions to be different, unknown, and heavy-tailed; (2) recovering all subgroups followed by a significant proportion of batches even for infinite $k$; (3) removing the separation requirement between the regression vectors; (4) reducing the number of batches and allowing smaller batch sizes.

Submitted 5 September, 2023; originally announced September 2023.
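The batch setup can be made concrete with a minimal Python/NumPy sketch of the data model only, not the authors' gradient-based algorithm. Each batch is drawn from one hidden subgroup with its own regression vector; the helper names `make_batches` and `batch_least_squares`, the isotropic Gaussian inputs, and all sizes are illustrative assumptions. It shows why a single small batch is insufficient on its own: with fewer samples than dimensions, per-batch least squares is underdetermined.

```python
import numpy as np


def make_batches(w, n_batches, batch_size, noise=0.1, seed=0):
    """Draw batches from a k-component mixture of linear regressions.

    Each batch comes from a single hidden subgroup j, so all of its samples
    share the regression vector w[j]: y = x @ w[j] + noise. Inputs are
    isotropic Gaussian here only for simplicity (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    k, d = w.shape
    batches = []
    for _ in range(n_batches):
        j = rng.integers(k)                      # hidden subgroup label
        x = rng.standard_normal((batch_size, d))
        y = x @ w[j] + noise * rng.standard_normal(batch_size)
        batches.append((x, y, j))
    return batches


def batch_least_squares(x, y):
    """Ordinary least squares using a single batch only."""
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return coef


rng = np.random.default_rng(1)
d, k = 20, 3
w = rng.standard_normal((k, d))                  # hypothetical true vectors

# A small batch (fewer samples than dimensions) cannot pin down w[j] alone...
x_small, y_small, _ = make_batches(w, n_batches=1, batch_size=5, noise=0.0)[0]
print(np.linalg.matrix_rank(x_small))            # rank 5 < d = 20: underdetermined

# ...whereas a noise-free batch with batch_size >= d recovers it exactly.
x_big, y_big, j_big = make_batches(w, n_batches=1, batch_size=40, noise=0.0)[0]
print(np.allclose(batch_least_squares(x_big, y_big), w[j_big]))
```

Batches smaller than the dimension are the regime in which information has to be pooled across batches from the same subgroup.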

arXiv:2309.00511 [pdf, ps, other]

Schwinger-Keldysh effective field theory for stable and causal relativistic hydrodynamics

Authors: Akash Jain, Pavel Kovtun

Abstract: We construct stable and causal effective field theories (EFTs) for describing statistical fluctuations in relativistic diffusion and relativistic hydrodynamics. These EFTs are fully non-linear, including couplings to background sources, and enable us to compute n-point time-ordered correlation functions including the effects of statistical fluctuations. The EFTs we construct are inspired by the Maxwell-Cattaneo model of relativistic diffusion and Müller-Israel-Stewart model of relativistic hydrodynamics respectively, and have been derived using both the Martin-Siggia-Rose and Schwinger-Keldysh formalisms. The EFTs non-linearly realise the dynamical Kubo-Martin-Schwinger (KMS) symmetry, which ensures that n-point correlation functions and interactions in the theory satisfy the appropriate fluctuation-dissipation theorems. Since these EFTs typically admit ultraviolet sectors that are not fixed by the low-energy infrared symmetries, we find that they simultaneously admit multiple realisations of the dynamical KMS symmetry. We also comment on certain obstructions to including statistical fluctuations in the recently-proposed stable and causal Bemfica-Disconzi-Noronha-Kovtun model of relativistic hydrodynamics.

Submitted 1 September, 2023; originally announced September 2023.

Comments: 47+1 pages

arXiv:2308.14920 [pdf, other]

Matbench Discovery -- A framework to evaluate machine learning crystal stability predictions

Authors: Janosh Riebesell, Rhys E. A. Goodall, Philipp Benner, Yuan Chiang, Bowen Deng, Alpha A. Lee, Anubhav Jain, Kristin A. Persson

Abstract: Matbench Discovery simulates the deployment of machine learning (ML) energy models in a high-throughput search for stable inorganic crystals. We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance. Alongside this paper, we publish a Python package to aid with future model submissions and a growing online leaderboard with further insights into trade-offs between various performance metrics. To answer the question of which ML methodology performs best at materials discovery, our initial release explores a variety of models including random forests, graph neural networks (GNN), one-shot predictors, iterative Bayesian optimizers and universal interatomic potentials (UIP). Ranked best-to-worst by their test set F1 score on thermodynamic stability prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P > Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest. The top 3 models are UIPs, the winning methodology for ML-guided materials discovery, achieving F1 scores of ~0.6 for crystal stability classification and discovery acceleration factors (DAF) of up to 5x on the first 10k most stable predictions compared to dummy selection from our test set. We also highlight a sharp disconnect between commonly used global regression metrics and more task-relevant classification metrics.
Accurate regressors are susceptible to unexpectedly high false-positive rates if those accurate predictions lie close to the decision boundary at 0 eV/atom above the convex hull, where most materials are. Our results highlight the need to focus on classification metrics that actually correlate with improved stability hit rate.

Submitted 4 February, 2024; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 31 pages, 18 figures, 4 tables
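The gap between regression and classification metrics can be illustrated with a small Python sketch. It assumes, for illustration, that a material counts as stable when its energy above the convex hull is at most 0 eV/atom, and it computes F1 together with a discovery acceleration factor taken as precision divided by the fraction of truly stable materials. The function `stability_metrics` and the toy energies are hypothetical, not the Matbench Discovery package API.

```python
import numpy as np


def stability_metrics(e_hull_true, e_hull_pred, threshold=0.0):
    """Classification metrics from energies above the convex hull (eV/atom).

    A material is labelled stable when its energy above hull is <= threshold.
    DAF is taken here as precision / prevalence of stable materials, i.e. how
    much better than random selection the model's stable picks are.
    """
    e_true = np.asarray(e_hull_true, dtype=float)
    e_pred = np.asarray(e_hull_pred, dtype=float)
    true_stable = e_true <= threshold
    pred_stable = e_pred <= threshold
    tp = np.sum(true_stable & pred_stable)
    fp = np.sum(~true_stable & pred_stable)
    fn = np.sum(true_stable & ~pred_stable)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    prevalence = true_stable.mean()
    mae = np.mean(np.abs(e_true - e_pred))
    return {"precision": precision, "recall": recall, "f1": f1,
            "daf": precision / prevalence, "mae": mae}


# Toy data: predictions are close in absolute terms, but three unstable
# materials just above the hull are predicted slightly below it.
e_true = [-0.05, 0.01, 0.02, 0.03, 0.5, 0.8, 1.0, 1.2]
e_pred = [-0.04, -0.01, -0.01, -0.02, 0.5, 0.8, 1.0, 1.2]
m = stability_metrics(e_true, e_pred)
print(m)  # MAE ~0.014 eV/atom, yet precision 0.25, F1 0.4, DAF 2.0
```

A regression error of about 0.01 eV/atom looks excellent, but because the errors straddle the 0 eV/atom boundary, three of the four predicted-stable materials are false positives.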

arXiv:2308.10658 [pdf, other]

Learning Clothing and Pose Invariant 3D Shape Representation for Long-Term Person Re-Identification

Authors: Feng Liu, Minchul Kim, ZiAng Gu, Anil Jain, Xiaoming Liu

Abstract: Long-Term Person Re-Identification (LT-ReID) has become increasingly crucial in computer vision and biometrics. In this work, we aim to extend LT-ReID beyond pedestrian recognition to include a wider range of real-world human activities while still accounting for cloth-changing scenarios over large time gaps. This setting poses additional challenges due to the geometric misalignment and appearance ambiguity caused by the diversity of human pose and clothing. To address these challenges, we propose a new approach, 3DInvarReID, for (i) disentangling identity from non-identity components (pose, clothing shape, and texture) of 3D clothed humans, and (ii) reconstructing accurate 3D clothed body shapes and learning discriminative features of naked body shapes for person ReID in a joint manner. To better evaluate our study of LT-ReID, we collect a real-world dataset called CCDA, which contains a wide variety of human activities and clothing changes. Experimentally, we show the superior performance of our approach for person ReID.

Submitted 21 September, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: 10 pages, 7 figures, accepted by ICCV 2023

arXiv:2308.08825 [pdf, ps, other]

Controlling Federated Learning for Covertness

Authors: Adit Jain, Vikram Krishnamurthy

Abstract: A learner aims to minimize a function $f$ by repeatedly querying a distributed oracle that provides noisy gradient evaluations. At the same time, the learner seeks to hide $\arg\min f$ from a malicious eavesdropper that observes the learner's queries. This paper considers the problem of \textit{covert} or \textit{learner-private} optimization, where the learner has to dynamically choose between learning and obfuscation by exploiting the stochasticity. The problem of controlling the stochastic gradient algorithm for covert optimization is modeled as a Markov decision process, and we show that the dynamic programming operator has a supermodular structure implying that the optimal policy has a monotone threshold structure. A computationally efficient policy gradient algorithm is proposed to search for the optimal querying policy without knowledge of the transition probabilities. As a practical application, our methods are demonstrated on a hate speech classification task in a federated setting where an eavesdropper can use the optimal weights to generate toxic content, which is more easily misclassified. Numerical results show that when the learner uses the optimal policy, an eavesdropper can only achieve a validation accuracy of $52\%$ with no information and $69\%$ when it has a public dataset with 10\% positive samples compared to $83\%$ when the learner employs a greedy policy.

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.08638 [pdf, other]

Fair GANs through model rebalancing for extremely imbalanced class distributions

Authors: Anubhav Jain, Nasir Memon, Julian Togelius

Abstract: Deep generative models require large amounts of training data. This often poses a problem, as collecting datasets can be expensive and difficult, particularly datasets that are representative of the appropriate underlying distribution (e.g., demographic). This introduces biases in datasets, which are further propagated in the models. We present an approach to construct an unbiased generative adversarial network (GAN) from an existing biased GAN by rebalancing the model distribution. We do so by generating balanced data from an existing imbalanced deep generative model using an evolutionary algorithm and then using this data to train a balanced generative model. Additionally, we propose a bias mitigation loss function that minimizes the deviation of the learned class distribution from being equiprobable. We show results for the StyleGAN2 models while training on the Flickr Faces High Quality (FFHQ) dataset for racial fairness and see that the proposed approach improves the fairness metric by almost 5 times, whilst maintaining image quality. We further validate our approach by applying it to an imbalanced CIFAR10 dataset, where we show that we can obtain fairness and image quality comparable to training on a balanced CIFAR10 dataset that is twice as large. Lastly, we argue that the traditionally used image quality metrics such as Fréchet inception distance (FID) are unsuitable for scenarios where the class distributions are imbalanced and a balanced reference set is not available.

Submitted 21 December, 2023; v1 submitted 16 August, 2023; originally announced August 2023.
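One plausible form of a loss that penalizes deviation of the learned class distribution from equiprobable is the KL divergence between the batch-averaged class probabilities and the uniform distribution. The NumPy sketch below assumes that choice; the name `equiprobable_loss` is hypothetical, and the paper's actual loss may use a different deviation measure. In training it would be written in an autodiff framework and applied to classifier outputs on generated images.

```python
import numpy as np


def equiprobable_loss(class_probs, eps=1e-12):
    """KL(p || uniform) for the batch-averaged class distribution p.

    class_probs: (n_samples, n_classes) classifier probabilities for a batch
    of generated images (assumed input format). Returns ~0 for a perfectly
    balanced batch and ~log(n_classes) when every sample is one class.
    """
    p = np.asarray(class_probs, dtype=float).mean(axis=0)  # batch class distribution
    n_classes = p.shape[0]
    # sum_c p_c * log(p_c / (1 / n_classes)); eps guards log(0) for empty classes
    return float(np.sum(p * np.log((p + eps) * n_classes)))


balanced = np.eye(4)                                  # one sample per class
collapsed = np.tile([1.0, 0.0, 0.0, 0.0], (8, 1))     # mode collapse onto class 0
print(equiprobable_loss(balanced), equiprobable_loss(collapsed))  # ~0 and ~log(4)
```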

arXiv:2308.08515 [pdf]

Investigation of Magnesium Silicate as an Effective Gate Dielectric for AlGaN/GaN Metal Oxide High Electron Mobility Transistors (MOSHEMT)

Authors: Seshasainadh Pudi, Navneet Bhardwaj, Ritam Sarkar, V S Santhosh N Varma Bellamkonda, Umang Singh, Anshul Jain, Swagata Bhunia, Soumyadip Chatterjee, Apurba Laha

Abstract: In this study, a 6 nm layer of Magnesium Silicate (Mg-Silicate) was deposited on an AlGaN/GaN heterostructure by sputtering multiple stacks of MgO and SiO$_{2}$, followed by rapid thermal annealing in a nitrogen (N$_{2}$) environment. X-ray photoelectron spectroscopy (XPS) analysis confirmed stoichiometric Mg-Silicate (MgSiO$_{3}$) after annealing at 850 $^\circ$C for 70 seconds. Atomic force microscopy (AFM) was employed to measure the root mean square (RMS) roughness (2.20 nm) of the Mg-Silicate. A significant reduction in reverse leakage current, by three orders of magnitude, was observed for the Mg-Silicate/AlGaN/GaN metal-oxide-semiconductor (MOS) diode compared to the Schottky diode. The dielectric constant of Mg-Silicate ($\varepsilon_{\text{Mg-Silicate}}$) and the interface density of states (D$_{it}$) with AlGaN were estimated from capacitance-voltage (CV) characteristics as $\sim$6.6 and 2.0 $\times$ 10$^{13}$ cm$^{-2}$eV$^{-1}$, respectively.

Submitted 16 August, 2023; originally announced August 2023.

arXiv:2308.07021 [pdf, ps, other]

Weighted Szegő Kernels on Planar Domains

Authors: Aakanksha Jain, Kaushal Verma

Abstract: We study properties of weighted Szegő and Garabedian kernels on planar domains. Motivated by the unweighted case as explained in Bell's work, the starting point is a weighted Kerzman-Stein formula that yields boundary smoothness of the weighted Szegő kernel. This provides information on how the weighted Szegő kernel depends on the weight. When the weights are close to the constant function $1$ (which corresponds to the unweighted case), it is shown that some properties of the unweighted Szegő kernel propagate to the weighted Szegő kernel as well. Finally, it is shown that the reduced Bergman kernel and higher-order reduced Bergman kernels can be written as a rational combination of three unweighted Szegő kernels and their conjugates, thereby extending Bell's list of kernel functions that are made up of simpler building blocks that involve the Szegő kernel.

Submitted 14 August, 2023; originally announced August 2023.

Comments: 26 pages

MSC Class: 2020 MSC class. Primary: 30C40; Secondary: 31A99

arXiv:2308.03180 [pdf, ps, other]

Cluster radioactivity from trans-tin to superheavy region using an improved empirical formula

Authors: G. Saxena, A. Jain

Abstract: A simple relation, $(aZ_{c} + b)(Z_{d}/Q)^{1/2} + (cZ_{c} + d)$, for estimating the half-life of cluster emission is further improved for cluster and $α$-decays, separately, by incorporating the isospin of the parent nucleus as well as the angular momentum carried away by the emitted particle. This improved version is not only found to be robust in reproducing experimental half-lives in the trans-tin and trans-lead regions but also elucidates cluster emission in superheavy nuclei over the usual $α$-decay. Considering daughter nuclei around the doubly magic $^{100}$Sn and $^{208}$Pb nuclei for trans-tin and trans-lead (including superheavy) parents, respectively, a systematic and extensive study of 56$\leq$Z$\leq$120 isotopes is performed for light and heavy cluster emissions. A fair competition among cluster emission, $α$-decay, spontaneous fission, and $β$-decay is observed in this broad range, resulting in a substantial probability of C to Sr clusters from several nuclei, which demonstrates the adequacy of shell effects. The present article proposes a single, improved, newly fitted, and effective formula for cluster radioactivity that can be used to estimate precise half-lives over a wide range of the periodic chart, from trans-tin to superheavy nuclei.

Submitted 6 August, 2023; originally announced August 2023.

Comments: 12 pages, 2 Figures, 5 Tables, Accepted in European Physical Journal A
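The base relation quoted in the abstract is straightforward to evaluate. This Python sketch implements only that base form; the improved isospin and angular-momentum terms are not reproduced because their exact form is not given here. It assumes the result is log10 of the half-life in seconds with Q in MeV, which is conventional for such empirical formulas, and the coefficients a to d are placeholders that would have to come from the paper's fit.

```python
import math


def log10_half_life(z_cluster, z_daughter, q_mev, a, b, c, d):
    """Base relation (a*Zc + b) * sqrt(Zd / Q) + (c*Zc + d).

    Assumed to return log10(T_1/2 in seconds) for Q in MeV. The coefficients
    a, b, c, d are fit parameters; no fitted values are supplied here.
    """
    return (a * z_cluster + b) * math.sqrt(z_daughter / q_mev) + (c * z_cluster + d)


# Placeholder coefficients and inputs, chosen only to exercise the function:
# sqrt(82 / 20.5) = 2, so the result is (0*6 + 1)*2 + (0*6 + 3) = 5.0
print(log10_half_life(6, 82, 20.5, a=0.0, b=1.0, c=0.0, d=3.0))
```

The half-life itself would then be `10 ** log10_half_life(...)` seconds under the same assumption.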

arXiv:2308.01741 [pdf, other]

Supply chain emission estimation using large language models

Authors: Ayush Jain, Manikandan Padmanaban, Jagabondhu Hazra, Shantanu Godbole, Kommy Weldemariam

Abstract: Large enterprises face a crucial imperative to achieve the Sustainable Development Goals (SDGs), especially Goal 13, which focuses on combating climate change and its impacts. To mitigate the effects of climate change, reducing enterprise Scope 3 (supply chain) emissions is vital, as they account for more than 90\% of total emission inventories. However, tracking Scope 3 emissions proves challenging, as data must be collected from thousands of upstream and downstream suppliers. To address these challenges, we propose a first-of-its-kind framework that uses domain-adapted NLP foundation models to estimate Scope 3 emissions, using financial transactions as a proxy for purchased goods and services. We compared the performance of the proposed framework with state-of-the-art text classification models such as TF-IDF, word2vec, and zero-shot learning. Our results show that the domain-adapted foundation model outperforms state-of-the-art text mining techniques and performs as well as a subject matter expert (SME). The proposed framework could accelerate Scope 3 estimation at enterprise scale and help enterprises take appropriate climate actions to achieve SDG 13.

Submitted 3 August, 2023; originally announced August 2023.
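Using financial transactions as a proxy for purchased goods and services amounts to a spend-based estimate: classify each transaction into a spend category, then multiply its amount by that category's emission factor. The Python sketch below assumes that pipeline. A naive keyword matcher stands in for the domain-adapted foundation model, and the categories and emission factors are hypothetical placeholders, not real data.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    description: str
    amount_usd: float


# Hypothetical spend-based emission factors (kg CO2e per USD). Real factors would
# come from an emissions database; these numbers are placeholders.
EMISSION_FACTORS = {"air travel": 1.2, "office supplies": 0.4, "cloud services": 0.1}

# Stand-in for the domain-adapted foundation model: a naive keyword matcher.
KEYWORDS = {
    "air travel": ("flight", "airline"),
    "office supplies": ("paper", "toner", "stationery"),
    "cloud services": ("cloud", "hosting"),
}


def classify(description):
    """Map a transaction description to a spend category, or None if unknown."""
    text = description.lower()
    for category, words in KEYWORDS.items():
        if any(word in text for word in words):
            return category
    return None  # unmatched transactions could be routed to a subject matter expert


def estimate_scope3(transactions):
    """Sum amount x emission factor over classifiable transactions."""
    total_kg = 0.0
    unmatched = []
    for t in transactions:
        category = classify(t.description)
        if category is None:
            unmatched.append(t)
        else:
            total_kg += t.amount_usd * EMISSION_FACTORS[category]
    return total_kg, unmatched


example = [
    Transaction("Airline ticket NYC-SFO", 500.0),
    Transaction("Printer toner x4", 200.0),
    Transaction("Consulting retainer", 1000.0),
]
total, unmatched = estimate_scope3(example)
print(total, len(unmatched))  # 500*1.2 + 200*0.4 = 680.0 kg CO2e, 1 unmatched
```

The classification step is where a stronger text model matters; the multiplication step is unchanged whichever classifier is used.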

arXiv:2308.00106 [pdf, other]

Entropy Maximization in Sparse Matrix by Vector Multiplication ($\max_E SpMV$)

Authors: Paolo D'Alberto, Abhishek Jain, Ismail Bustany, Henri Fraisse, Mansimran Benipal

Abstract: The peak performance of any SpMV depends primarily on the available memory bandwidth and its effective use. GPUs, ASICs, and new FPGAs have ever higher bandwidth; however, for large-scale and highly sparse matrices, SpMV is still a hard problem because of its random access pattern and workload imbalance. Here, we show how to turn randomness to our advantage. We propose a matrix permutation pre-processing step that aims to maximize the entropy of the distribution of the nonzero elements. We seek any permutation that distributes the nonzero elements uniformly, thereby generating an SpMV problem that is amenable to workload balancing or to speeding up sort algorithms. We conjecture these permutations would be most effective for matrices with no dense rows or columns and, as in preconditioning, when the matrix is reused. We shall show that entropy maximization is an optimization that any architecture may take advantage of, although in different ways. Most importantly, any developer can consider and deploy it. We shall present cases where we can improve performance by 15\% on AMD-based (GPU-CPU) systems.

Submitted 24 July, 2023; originally announced August 2023.

Comments: 26 pages
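The entropy objective can be made concrete by measuring how evenly nonzeros fall into equal row blocks, for example one block per processing element. This NumPy sketch uses a plain random row permutation as the simplest baseline, not the paper's permutation search; `block_entropy` and the toy matrix are illustrative assumptions. A row permutation P is safe for SpMV because (PA)x = P(Ax), so the output vector can be un-permuted afterwards.

```python
import numpy as np


def block_entropy(rows, n_rows, n_blocks):
    """Shannon entropy (bits) of how nonzeros spread over equal row blocks.

    rows: row index of each nonzero. The maximum, log2(n_blocks), means every
    block holds the same number of nonzeros. Assumes n_rows % n_blocks == 0.
    """
    block_size = n_rows // n_blocks
    counts = np.bincount(rows // block_size, minlength=n_blocks).astype(float)
    p = counts[counts > 0] / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))


rng = np.random.default_rng(0)
n_rows, n_blocks = 64, 4

# Skewed toy matrix: only the first 16 rows hold nonzeros (4 each), so one
# block, i.e. one processing element, would receive all of the work.
rows = np.repeat(np.arange(16), 4)

# Random row permutation: old row i moves to position perm[i].
perm = rng.permutation(n_rows)
permuted_rows = perm[rows]

before = block_entropy(rows, n_rows, n_blocks)
after = block_entropy(permuted_rows, n_rows, n_blocks)
print(before, after, np.log2(n_blocks))  # entropy rises from 0 toward the 2-bit maximum
```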

arXiv:2307.14744 [pdf, other]

Wait-Free Updates and Range Search using Uruv

Authors: Gaurav Bhardwaj, Abhay Jain, Bapi Chatterjee, Sathya Peri

Abstract: CRUD operations, along with range queries, make a highly useful abstract data type (ADT), employed by many dynamic analytics tasks. Despite its wide applications, to our knowledge, no fully wait-free data structure is known to support this ADT. In this paper, we introduce Uruv, a proactive, linearizable, and practical wait-free concurrent data structure that implements this ADT. Structurally, Uruv installs a balanced search index on the nodes of a linked list. Uruv is the first wait-free and proactive solution for a concurrent B+tree. Experiments show that Uruv significantly outperforms previously proposed lock-free B+trees for dictionary operations, as well as a recently proposed lock-free method for implementing the same ADT.

Submitted 27 July, 2023; originally announced July 2023.

arXiv:2307.12549 [pdf, other]

Estimating Time to Clear Pendency of Cases in High Courts in India using Linear Regression

Authors: Kshitiz Verma, Anshu Musaddi, Ansh Mittal, Anshul Jain

Abstract: The Indian judiciary is burdened with millions of cases pending in its courts at all levels. The High Court National Judicial Data Grid (HC-NJDG) indexes all the cases pending in the high courts and publishes the data publicly. In this paper, we analyze the data that we collected from the HC-NJDG portal on 229 randomly chosen days between August 31, 2017 and March 22, 2020, inclusive. Thus, the data analyzed in the paper span a period of more than two and a half years. We show that: 1) the number of pending cases in most of the high courts is increasing linearly with time; 2) the case load on judges in various high courts is very unevenly distributed, making judges of some high courts a hundred times more loaded than others; 3) for some high courts, it may take as long as a hundred years to clear the pending cases if proper measures are not taken. We also suggest some policy changes that may help clear the pendency within a fixed time of either five or fifteen years. Finally, we find that the rate of institution of cases in high courts can be easily handled by the current sanctioned strength; extra judges are needed only to clear earlier backlogs.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 12 pages, 9 figures, JURISIN 2022. arXiv admin note: text overlap with arXiv:2307.10615
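The linear-regression reasoning can be sketched in a few lines: fit pending cases against time, read the slope as net growth per day (institution minus disposal), and ask how long extra disposal capacity would take to clear the backlog. The data below are hypothetical, not HC-NJDG figures, and `days_to_clear` is an illustrative helper that assumes the fitted growth rate stays constant.

```python
import numpy as np

# Hypothetical observations for one court: day index and number of pending cases.
days = np.array([0, 30, 60, 90, 120], dtype=float)
pending = np.array([1000, 1060, 1120, 1180, 1240], dtype=float)

# Fit pending(t) ~ intercept + slope * t; the slope is the net growth per day.
slope, intercept = np.polyfit(days, pending, deg=1)


def days_to_clear(current_pending, net_growth_per_day, extra_disposal_per_day):
    """Days until the backlog reaches zero if disposal rises by extra_disposal_per_day.

    Returns infinity when the extra capacity does not outpace the net growth.
    """
    net_decrease = extra_disposal_per_day - net_growth_per_day
    if net_decrease <= 0:
        return float("inf")
    return current_pending / net_decrease


print(slope)                                     # about 2 cases/day
print(days_to_clear(pending[-1], slope, 12.0))   # about 1240 / (12 - 2) = 124 days
```

Dividing the required extra disposal per day by a per-judge disposal rate would turn this into an estimate of additional judges.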

arXiv:2307.10231 [pdf]

Automated Knowledge Modeling for Cancer Clinical Practice Guidelines

Authors: Pralaypati Ta, Bhumika Gupta, Arihant Jain, Sneha Sree C, Arunima Sarkar, Keerthi Ram, Mohanasankar Sivaprakasam

Abstract: Clinical Practice Guidelines (CPGs) for cancer diseases evolve rapidly due to new evidence generated by active research. Currently, CPGs are primarily published in a document format that is ill-suited for managing this developing knowledge. A knowledge model of the guidelines document suitable for programmatic interaction is required. This work proposes an automated method for extracting knowledge from National Comprehensive Cancer Network (NCCN) CPGs in Oncology and generating a structured model containing the retrieved knowledge. The proposed method was tested using two versions of the NCCN Non-Small Cell Lung Cancer (NSCLC) CPG to demonstrate its effectiveness in faithful extraction and modeling of knowledge. Three enrichment strategies using cancer staging information, Unified Medical Language System (UMLS) Metathesaurus & National Cancer Institute thesaurus (NCIt) concepts, and node classification are also presented to enhance the model towards enabling programmatic traversal and querying of cancer care guidelines. The node classification was performed using a Support Vector Machine (SVM) model, achieving a classification accuracy of 0.81 with 10-fold cross-validation.

Submitted 15 July, 2023; originally announced July 2023.

arXiv:2307.06930 [pdf, other]

mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs

Authors: Gregor Geigle, Abhay Jain, Radu Timofte, Goran Glavaš

Abstract: Modular vision-language models (Vision-LLMs) align pretrained image encoders with (frozen) large language models (LLMs) and post-hoc condition LLMs to `understand' the image input. With the abundance of readily available high-quality English image-text data as well as strong monolingual English LLMs, the research focus has been on English-only Vision-LLMs. Multilingual vision-language models are still predominantly obtained via expensive end-to-end pretraining, resulting in comparatively smaller models, trained on limited multilingual image data supplemented with text-only multilingual corpora. We present mBLIP, the first Vision-LLM leveraging multilingual LLMs, which we obtain in a computationally efficient manner on consumer-level hardware. To this end, we \textit{re-align} an image encoder previously tuned to an English LLM to a new, multilingual LLM using only a few million multilingual training examples derived from a mix of vision-and-language tasks, which we obtain by machine-translating high-quality English data to 95 languages. On the IGLUE benchmark and XM3600, mBLIP yields results competitive with state-of-the-art models and it greatly outperforms strong English-only Vision-LLMs like Llava 1.5. We release our model, code, and train data at \url{https://github.com/gregor-ge/mBLIP}.

Submitted 20 June, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: ALVR Workshop 2024

arXiv:2306.17206 [pdf, other]

FarSight: A Physics-Driven Whole-Body Biometric System at Large Distance and Altitude

Authors: Feng Liu, Ryan Ashbaugh, Nicholas Chimitt, Najmul Hassan, Ali Hassani, Ajay Jaiswal, Minchul Kim, Zhiyuan Mao, Christopher Perry, Zhiyuan Ren, Yiyang Su, Pegah Varghaei, Kai Wang, Xingguang Zhang, Stanley Chan, Arun Ross, Humphrey Shi, Zhangyang Wang, Anil Jain, Xiaoming Liu

Abstract: Whole-body biometric recognition is an important area of research due to its vast applications in law enforcement, border security, and surveillance. This paper presents the end-to-end design, development and evaluation of FarSight, an innovative software system designed for whole-body (fusion of face, gait and body shape) biometric recognition. FarSight accepts videos from elevated platforms and drones as input and outputs a candidate list of identities from a gallery. The system is designed to address several challenges, including (i) low-quality imagery, (ii) large yaw and pitch angles, (iii) robust feature extraction to accommodate large intra-person variabilities and large inter-person similarities, and (iv) the large domain gap between training and test sets. FarSight combines the physics of imaging and deep learning models to enhance image restoration and biometric feature encoding. We test FarSight's effectiveness using the newly acquired IARPA Biometric Recognition and Identification at Altitude and Range (BRIAR) dataset. Notably, FarSight demonstrated a substantial performance increase on the BRIAR dataset, with gains of +11.82% Rank-20 identification and +11.3% TAR@1% FAR.

Submitted 6 September, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: 11 pages, 7 figures, accepted in WACV 2024
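TAR@1% FAR, one of the reported metrics, fixes a decision threshold on impostor comparison scores so that at most 1% of them are accepted, then reports the fraction of genuine comparisons accepted at that threshold. The NumPy sketch below assumes higher scores mean greater similarity and 0 < far < 1; `tar_at_far` and the toy scores are illustrative, not part of FarSight.

```python
import numpy as np


def tar_at_far(genuine, impostor, far=0.01):
    """True accept rate at a fixed false accept rate (higher score = more similar).

    Picks the threshold so that at most floor(far * n_impostor) impostor
    scores lie strictly above it, then measures genuine scores above it.
    """
    imp_desc = np.sort(np.asarray(impostor, dtype=float))[::-1]  # descending
    k = int(np.floor(far * imp_desc.size + 1e-9))  # impostors we may accept
    threshold = imp_desc[k]                        # accept only scores above this
    return float(np.mean(np.asarray(genuine, dtype=float) > threshold))


impostor = np.arange(100) / 100          # toy impostor scores 0.00 .. 0.99
genuine = [0.5, 0.985, 0.99, 1.0]        # toy genuine scores
print(tar_at_far(genuine, impostor, far=0.01))  # threshold 0.98, so TAR = 3/4 = 0.75
```

Ties at the threshold are rejected here, so the realized FAR can fall slightly below the target; evaluation toolkits may handle ties differently.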

arXiv:2306.15917 [pdf, other]

Confidence-Calibrated Ensemble Dense Phrase Retrieval

Authors: William Yang, Noah Bergam, Arnav Jain, Nima Sheikhoslami

Abstract: In this paper, we consider the extent to which the transformer-based Dense Passage Retrieval (DPR) algorithm, developed by Karpukhin et al. (2020), can be optimized without further pre-training. Our method involves two particular insights: we apply the DPR context encoder at various phrase lengths (e.g., one-sentence versus five-sentence segments), and we take a confidence-calibrated ensemble prediction over all of these different segmentations. This somewhat exhaustive approach achieves state-of-the-art results on benchmark datasets such as Google NQ and SQuAD. We also apply our method to domain-specific datasets, and the results suggest that different granularities are optimal for different domains.

Submitted 28 June, 2023; originally announced June 2023.

arXiv:2306.14808 [pdf, other]

Maximum State Entropy Exploration using Predecessor and Successor Representations

Authors: Arnav Kumar Jain, Lucas Lehnert, Irina Rish, Glen Berseth

Abstract: Animals have a well-developed ability to explore that aids them in important tasks such as locating food, exploring for shelter, and finding misplaced items. These exploration skills necessarily track where they have been so that they can plan for finding items with relative efficiency. Contemporary exploration algorithms often learn a less efficient exploration strategy because they either condition only on the current state or simply rely on making random open-loop exploratory moves. In this work, we propose $ηψ$-Learning, a method to learn efficient exploratory policies by conditioning on past episodic experience to make the next exploratory move. Specifically, $ηψ$-Learning learns an exploration policy that maximizes the entropy of the state visitation distribution of a single trajectory. Furthermore, we demonstrate how variants of the predecessor and successor representations can be combined to predict the state visitation entropy. Our experiments demonstrate the efficacy of $ηψ$-Learning in strategically exploring the environment and maximizing state coverage with limited samples.

Submitted 26 June, 2023; originally announced June 2023.
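The ηψ-Learning objective named above, the entropy of the state visitation distribution of a single trajectory, can be illustrated with a minimal sketch that counts visits along a finite trajectory of discrete states. This shows only the quantity being maximized, not ηψ-Learning itself: the abstract says the method predicts this entropy by combining predecessor and successor representations rather than by counting. The function name and the grid-cell states are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def trajectory_state_entropy(states: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the empirical state-visitation
    distribution of one trajectory (hypothetical helper, for illustration)."""
    if not states:
        raise ValueError("trajectory must contain at least one state")
    counts = Counter(states)
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Same trajectory length: one path oscillates between two cells, the other covers four.
looping = [(0, 0), (0, 1), (0, 0), (0, 1), (0, 0), (0, 1), (0, 0), (0, 1)]
covering = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (1, 0)]
print(round(trajectory_state_entropy(looping), 4))   # 0.6931 (ln 2)
print(round(trajectory_state_entropy(covering), 4))  # 1.3863 (ln 4)
```

Higher entropy for the same number of steps corresponds to broader state coverage, which is the behaviour the abstract attributes to the learned policy.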

arXiv:2306.11581 [pdf, ps, other]

doi 10.1088/1402-4896/ace00d

Decay properties of undetected superheavy nuclei with Z>110

Authors: A. Jain, P. K. Sharma, S. K. Jain, Dashty T. Akrawy, G. Saxena

Abstract: A comprehensive study of favoured and unfavoured $α$-decay, cluster decay, and weak decay, along with spontaneous fission, is performed for undetected superheavy nuclei in the range of proton number 111$\leq$Z$\leq$118 and neutron number 161$\leq$N$\leq$192. Half-lives for these decay modes are estimated with good accuracy on the basis of NUBASE2020 and are found to be in excellent agreement with the known half-lives. The $α$-decay mode is found to be the most probable across this wide range, and the corresponding potential $α$-decay chains are identified. Notably, cluster emission and weak decay are also anticipated in this region of the periodic chart, which opens new pathways for the detection of superheavy nuclei.

Submitted 20 June, 2023; originally announced June 2023.

Comments: 24 pages, 6 figures, 5 tables, Accepted in Physica Scripta

arXiv:2306.10717 [pdf, other]

A neuro-symbolic approach for multimodal reference expression comprehension

Authors: Aman Jain, Anirudh Reddy Kondapally, Kentaro Yamada, Hitomi Yanaka

Abstract: Human-Machine Interaction (HMI) systems have attracted considerable interest in recent years, with reference expression comprehension being one of the main challenges. Traditionally, human-machine interaction has been mostly limited to speech and visual modalities. However, to allow for more freedom in interaction, recent works have proposed integrating additional modalities, such as gestures, into HMI systems. We consider such an HMI system with pointing gestures and construct a table-top object picking scenario inside a simulated virtual reality (VR) environment to collect data. Previous works for such a task have used deep neural networks to classify the referred object, which lack transparency. In this work, we tackle this task with an interpretable and compositional model based on a neuro-symbolic approach; these properties are crucial to building robust HMI systems for real-world applications. Finally, we show the generalizability of our model on unseen environments and report the results.

Submitted 19 June, 2023; originally announced June 2023.

Comments: Appeared in the 37th Annual Conference of the Japanese Society for Artificial Intelligence, 2023

arXiv:2306.09649 [pdf, other]

doi 10.1145/3613904.3642517

ReactGenie: A Development Framework for Complex Multimodal Interactions Using Large Language Models

Authors: Jackie Junrui Yang, Yingtian Shi, Yuhan Zhang, Karina Li, Daniel Wan Rosli, Anisha Jain, Shuning Zhang, Tianshi Li, James A. Landay, Monica S. Lam

Abstract: By combining voice and touch interactions, multimodal interfaces can surpass the efficiency of either modality alone. Traditional multimodal frameworks require laborious developer work to support rich multimodal commands, where a user's multimodal command may involve an exponential number of combinations of actions/function invocations. This paper presents ReactGenie, a programming framework that better separates multimodal input from the computational model to enable developers to create efficient and capable multimodal interfaces with ease. ReactGenie translates multimodal user commands into NLPL (Natural Language Programming Language), a programming language we created, using a neural semantic parser based on large language models. The ReactGenie runtime interprets the parsed NLPL and composes primitives in the computational model to implement complex user commands. As a result, ReactGenie allows easy implementation and unprecedented richness in commands for end-users of multimodal apps. Our evaluation showed that 12 developers were able to learn and build a nontrivial ReactGenie application in under 2.5 hours on average. In addition, compared with a traditional GUI, end-users can complete tasks faster and with less task load using ReactGenie apps.

Submitted 2 May, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.09247 [pdf, other]

ATLAS: Automatically Detecting Discrepancies Between Privacy Policies and Privacy Labels

Authors: Akshath Jain, David Rodriguez, Jose M. del Alamo, Norman Sadeh

Abstract: Privacy policies are long, complex documents that end-users seldom read. Privacy labels aim to ameliorate these issues by providing succinct summaries of salient data practices. In December 2020, Apple began requiring that app developers submit privacy labels describing their apps' data practices. Yet, research suggests that app developers often struggle to do so. In this paper, we automatically identify possible discrepancies between mobile app privacy policies and their privacy labels. Such discrepancies could be indicators of potential privacy compliance issues. We introduce the Automated Privacy Label Analysis System (ATLAS). ATLAS includes three components: a pipeline to systematically retrieve iOS App Store listings and privacy policies; an ensemble-based classifier capable of predicting privacy labels from the text of privacy policies with 91.3% accuracy using state-of-the-art NLP techniques; and a discrepancy analysis mechanism that enables a large-scale privacy analysis of the iOS App Store. Our system has enabled us to analyze 354,725 iOS apps. We find several interesting trends. For example, only 40.3% of apps in the App Store provide easily accessible privacy policies, and only 29.6% of apps provide both accessible privacy policies and privacy labels. Among apps that provide both, 88.0% have at least one possible discrepancy between the text of their privacy policy and their privacy label, which could be indicative of a potential compliance issue. We find that, on average, apps have 5.32 such potential compliance issues.
We hope that ATLAS will help app developers, researchers, regulators, and mobile app stores alike. For example, app developers could use our classifier to check for discrepancies between their privacy policies and privacy labels, and regulators could use our system to help review apps at scale for potential compliance issues.

Submitted 24 May, 2023; originally announced June 2023.

Comments: 14 pages, 13 figures
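For the ATLAS entry above, the idea of a "possible discrepancy" can be pictured as a set comparison between the data-practice categories a classifier predicts from a policy's text and those the app declares in its privacy label. This is a minimal sketch under that assumption only: the abstract does not describe how ATLAS's discrepancy analysis actually works, the function name is hypothetical, and the category strings are merely examples of Apple privacy-label data types.

```python
from typing import Dict, Set


def possible_discrepancies(policy_predicted: Set[str],
                           label_declared: Set[str]) -> Dict[str, Set[str]]:
    """Compare categories predicted from a privacy policy's text with those
    declared in the privacy label (hypothetical helper, for illustration)."""
    return {
        # practices the policy text suggests but the label omits
        "policy_only": policy_predicted - label_declared,
        # practices the label declares but the policy text does not mention
        "label_only": label_declared - policy_predicted,
    }


policy = {"Location", "Contact Info", "Identifiers"}
label = {"Identifiers", "Usage Data"}
result = possible_discrepancies(policy, label)
print(sorted(result["policy_only"]))           # ['Contact Info', 'Location']
print(sorted(result["label_only"]))            # ['Usage Data']
print(sum(len(v) for v in result.values()))   # 3 possible discrepancies
```

Counting the flagged categories per app, as in the last line, is one way a per-app figure like the reported average could be tallied, though the paper's exact counting rule is not stated in the abstract.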

arXiv:2306.04597 [pdf, other]

Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions

Authors: Himanshu Thakur, Atishay Jain, Praneetha Vaddamanu, Paul Pu Liang, Louis-Philippe Morency

Abstract: Societal biases present in pre-trained large language models are a critical issue, as these models have been shown to propagate biases in countless downstream applications, rendering them unfair towards specific groups of people. Since large-scale retraining of these models from scratch is both time- and compute-expensive, a variety of approaches have previously been proposed to de-bias a pre-trained model. While the majority of current state-of-the-art debiasing methods focus on changes to the training regime, in this paper, we propose data intervention strategies as a powerful yet simple technique to reduce gender bias in pre-trained models. Specifically, we empirically show that by fine-tuning a pre-trained model on only 10 de-biased (intervened) training examples, the tendency to favor any gender is significantly reduced. Since our proposed method only needs a few training examples, our few-shot debiasing approach is highly feasible and practical. Through extensive experimentation, we show that our debiasing technique performs better than competitive state-of-the-art baselines with minimal loss in language modeling ability.

Submitted 7 June, 2023; originally announced June 2023.

Comments: Accepted to ACL 2023 Main Conference

arXiv:2306.02617 [pdf, other]

Permutation Decision Trees

Authors: Harikrishnan N B, Arham Jain, Nithin Nagaraj

Abstract: The decision tree is a well-understood machine learning model that is built by minimizing impurity at the internal nodes. The most common impurity measures are Shannon entropy and Gini impurity. These measures are insensitive to the order of the training data, so the final tree is invariant to any permutation of the data. This is a modeling limitation when there are temporal order dependencies between data instances. In this research, we propose, for the first time, the adoption of Effort-To-Compress (ETC), a complexity measure, as an alternative impurity measure. Unlike Shannon entropy and Gini impurity, structural impurity based on ETC is able to capture order dependencies in the data, and can therefore yield different decision trees for different permutations of the same data instances, a concept we term Permutation Decision Trees (PDT). We then introduce the notion of Permutation Bagging, which uses permutation decision trees without the need for random feature selection and sub-sampling. We conduct a performance comparison between Permutation Decision Trees and classical decision trees across various real-world datasets, including Appendicitis, Breast Cancer Wisconsin, Diabetes Pima Indian, Ionosphere, Iris, Sonar, and Wine. Our findings reveal that PDT performs comparably to classical decision trees on most datasets. Remarkably, in certain instances, PDT even slightly surpasses classical decision trees.
Compared with Random Forest, Permutation Bagging attains performance comparable to Random Forest models of 50 to 1000 trees while using merely 21 trees. This highlights the efficiency and effectiveness of Permutation Bagging in achieving comparable performance with significantly fewer trees.

Submitted 31 May, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 15 pages, 8 figures
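The order sensitivity that motivates Permutation Decision Trees can be illustrated with a simplified sketch of ETC computed by non-sequential recursive pair substitution (NSRPS): repeatedly replace the most frequent adjacent pair with a fresh symbol until the sequence is constant or has a single symbol, and count the steps. The tie-breaking rule (first-seen pair) and the pair-counting details are assumptions and may differ from the reference ETC implementation; the function names are hypothetical, and the abstract does not say how ETC is turned into a node impurity, so this is not the paper's impurity computation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def _most_frequent_pair(seq: List[int]) -> Tuple[int, int]:
    # Count adjacent pairs; among ties, return the pair that appears first.
    counts = Counter(zip(seq, seq[1:]))
    best = max(counts.values())
    for pair in zip(seq, seq[1:]):
        if counts[pair] == best:
            return pair
    raise AssertionError("unreachable for len(seq) >= 2")


def etc(seq: Sequence[int]) -> int:
    """Simplified Effort-To-Compress: number of NSRPS steps until the
    sequence is constant or has length <= 1 (hypothetical helper)."""
    s = list(seq)
    steps = 0
    next_symbol = max(s, default=0) + 1
    while len(s) > 1 and len(set(s)) > 1:
        a, b = _most_frequent_pair(s)
        out, i = [], 0
        while i < len(s):  # non-overlapping, left-to-right replacement
            if i + 1 < len(s) and s[i] == a and s[i + 1] == b:
                out.append(next_symbol)
                i += 2
            else:
                out.append(s[i])
                i += 1
        s, next_symbol, steps = out, next_symbol + 1, steps + 1
    return steps


def shannon_entropy(labels: Sequence[int]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


# Two orderings of the same class labels (four 0s and four 1s).
sorted_labels = [0, 0, 0, 0, 1, 1, 1, 1]
alternating = [0, 1, 0, 1, 0, 1, 0, 1]
print(shannon_entropy(sorted_labels), shannon_entropy(alternating))  # 1.0 1.0
print(etc(sorted_labels), etc(alternating))                          # 5 1
```

Shannon entropy depends only on the label counts, so both orderings score identically, whereas ETC changes with the order; this is the property the abstract relies on to obtain different trees for different permutations of the same data.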

arXiv:2306.00942 [pdf, other]

Train Offline, Test Online: A Real Robot Learning Benchmark

Authors: Gaoyue Zhou, Victoria Dean, Mohan Kumar Srirama, Aravind Rajeswaran, Jyothish Pari, Kyle Hatch, Aryan Jain, Tianhe Yu, Pieter Abbeel, Lerrel Pinto, Chelsea Finn, Abhinav Gupta

Abstract: Three challenges limit the progress of robot learning research: robots are expensive (few labs can participate), everyone uses different robots (findings do not generalize across labs), and we lack internet-scale robotics data. We take on these challenges via a new benchmark: Train Offline, Test Online (TOTO). TOTO provides remote users with access to shared robotic hardware for evaluating methods on common tasks and an open-source dataset of these tasks for offline training. Its manipulation task suite requires challenging generalization to unseen objects, positions, and lighting. We present initial results on TOTO comparing five pretrained visual representations and four offline policy learning baselines, remotely contributed by five institutions. The real promise of TOTO, however, lies in the future: we release the benchmark for additional submissions from any user, enabling easy, direct comparison to several methods without the need to obtain hardware or collect data.

Submitted 30 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

Comments: Accepted to ICRA 2023

arXiv:2306.00272 [pdf, other]

Accelerated Fingerprint Enhancement: A GPU-Optimized Mixed Architecture Approach

Authors: André Brasil Vieira Wyzykowski, Anil K. Jain

Abstract: This document presents a preliminary approach to latent fingerprint enhancement, fundamentally designed around a mixed Unet architecture. It combines the capabilities of the ResNet-101 network and a Unet encoder, aiming to form a potentially powerful composite. This combination, enhanced with attention mechanisms and forward skip connections, is intended to optimize the enhancement of ridge and minutiae features in fingerprints. One innovative element of this approach is a novel Fingerprint Enhancement Gabor layer, specifically designed for GPU computations. This illustrates how modern computational resources might be harnessed to expedite enhancement. Given its potential functionality as either a CNN or Transformer layer, this Gabor layer could offer improved agility and processing speed to the system. However, it is important to note that this approach is still in the early stages of development and has not yet been fully validated through rigorous experiments. As such, it may require additional time and testing to establish its robustness and usability in the field of latent fingerprint enhancement. Remaining work includes improving processing speed, adapting the enhancement to distinct latent fingerprint types, and fully validating the approach in experiments such as open-set (identification 1:N) and open-set validation and fingerprint quality evaluation, among others.

Submitted 31 May, 2023; originally announced June 2023.
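For readers unfamiliar with Gabor-based enhancement mentioned in the entry above, here is a minimal sketch of the classical real-valued Gabor kernel, the standard filter that such a layer is named after. The abstract does not give the formulation of the Fingerprint Enhancement Gabor layer, so the parameterization (`theta`, `wavelength`, `sigma`, `gamma`, `psi`), the function name, and the 8-orientation bank are assumptions for illustration only.

```python
import math
from typing import List


def gabor_kernel(size: int, theta: float, wavelength: float, sigma: float,
                 gamma: float = 1.0, psi: float = 0.0) -> List[List[float]]:
    """Real-valued Gabor kernel of shape size x size (even-symmetric when psi == 0).

    theta:      direction (radians) along which the cosine carrier oscillates,
                i.e. perpendicular to the ridges the filter responds to
    wavelength: carrier period in pixels (for fingerprints, roughly the ridge spacing)
    sigma:      standard deviation of the Gaussian envelope
    gamma:      spatial aspect ratio of the envelope
    psi:        phase offset of the carrier
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so xr runs along the carrier direction.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


# A hypothetical bank of 8 orientations; a GPU layer could stack such kernels
# as (fixed or learnable) convolution weights and pick the response matching
# the local ridge orientation.
bank = [gabor_kernel(11, k * math.pi / 8, wavelength=9.0, sigma=4.0) for k in range(8)]
```

The kernel is a Gaussian-windowed cosine, so convolving a fingerprint patch with the kernel whose orientation and wavelength match the local ridges reinforces the ridge pattern while attenuating noise at other orientations and frequencies.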

arXiv:2306.00231 [pdf, other]

A Universal Latent Fingerprint Enhancer Using Transformers

Authors: Andre Brasil Vieira Wyzykowski, Anil K. Jain

Abstract: Forensic science relies heavily on analyzing latent fingerprints, which are crucial for criminal investigations. However, various challenges, such as background noise, overlapping prints, and contamination, make the identification process difficult. Moreover, limited access to real crime scene and laboratory-generated databases hinders the development of efficient recognition algorithms. This study aims to develop a fast method, which we call ULPrint, to enhance various latent fingerprint types, including those obtained from real crime scenes and laboratory-created samples, to boost fingerprint recognition system performance. In closed-set identification accuracy experiments, the enhanced images improved the performance of MSU-AFIS from 61.56% to 75.19% on the NIST SD27 database, from 67.63% to 77.02% on the MSP Latent database, and from 46.90% to 52.12% on the NIST SD302 database. Our contributions include (1) the development of a two-step latent fingerprint enhancement method that combines Ridge Segmentation with UNet and Mix Visual Transformer (MiT) SegFormer-B5 encoder architecture, (2) the implementation of multiple dilated convolutions in the UNet architecture to better capture intricate, non-local patterns and enhance ridge segmentation, and (3) the guided blending of the predicted ridge mask with the latent fingerprint. This novel approach, ULPrint, streamlines the enhancement process, addressing challenges across diverse latent fingerprint types to improve forensic investigations and criminal justice outcomes.

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2305.18341 [pdf, other]

Coarse-Tuning Models of Code with Reinforcement Learning Feedback

Authors: Abhinav Jain, Chima Adiole, Swarat Chaudhuri, Thomas Reps, Chris Jermaine

Abstract: Large Language Models (LLMs) pre-trained on code have recently emerged as the dominant approach to program synthesis. However, these models are trained using next-token prediction, which ignores the syntax and semantics of code. We propose RLCF, which further trains a pre-trained LLM via reinforcement learning, using feedback from a grounding function that scores the quality of the code. The grounding function uses (i) compiler-derived feedback on whether the generated code passes a set of correctness checks; and (ii) feedback from a different LLM that compares the generated code to a reference code. RLCF is model- and language-agnostic. We empirically evaluate it on the MBJP and MathQA tasks for Java. Our experiments show that RLCF raises the odds that an LLM-generated program compiles, is executable, and produces the right output on tests, often allowing LLMs to match the performance of 2x-8x larger LLMs.

Submitted 23 December, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Comments: 23 pages

arXiv:2305.14343 [pdf, other]

Video Prediction Models as Rewards for Reinforcement Learning

Authors: Alejandro Escontrela, Ademi Adeniji, Wilson Yan, Ajay Jain, Xue Bin Peng, Ken Goldberg, Youngwoon Lee, Danijar Hafner, Pieter Abbeel

Abstract: Specifying reward signals that allow agents to learn complex behaviors is a long-standing challenge in reinforcement learning. A promising approach is to extract preferences for behaviors from unlabeled videos, which are widely available on the internet. We present Video Prediction Rewards (VIPER), an algorithm that leverages pretrained video prediction models as action-free reward signals for reinforcement learning. Specifically, we first train an autoregressive transformer on expert videos and then use the video prediction likelihoods as reward signals for a reinforcement learning agent. VIPER enables expert-level control without programmatic task rewards across a wide range of DMC, Atari, and RLBench tasks. Moreover, generalization of the video prediction model allows us to derive rewards for an out-of-distribution environment where no expert data is available, enabling cross-embodiment generalization for tabletop manipulation. We see our work as a starting point for scalable reward specification from unlabeled videos that will benefit from the rapid advances in generative modeling. Source code and datasets are available on the project website: https://escontrela.me/viper

Submitted 30 May, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

Comments: 22 pages, 18 figures, 4 tables. Under review

arXiv:2305.07710 [pdf, other]

Zero-shot racially balanced dataset generation using an existing biased StyleGAN2

Authors: Anubhav Jain, Nasir Memon, Julian Togelius

Abstract: Facial recognition systems have made significant strides thanks to data-heavy deep learning models, but these models rely on large privacy-sensitive datasets. Further, many of these datasets lack diversity in terms of ethnicity and demographics, which can lead to biased models that can have serious societal and security implications. To address these issues, we propose a methodology that leverages the biased generative model StyleGAN2 to create demographically diverse images of synthetic individuals. The synthetic dataset is created using a novel evolutionary search algorithm that targets specific demographic groups. By training face recognition models with the resulting balanced dataset containing 50,000 identities per race (13.5 million images in total), we can improve their performance and minimize biases that might have been present in a model trained on a real dataset.

Submitted 18 September, 2023; v1 submitted 12 May, 2023; originally announced May 2023.

arXiv:2305.07602 [pdf, other]

ViT Unified: Joint Fingerprint Recognition and Presentation Attack Detection

Authors: Steven A. Grosz, Kanishka P. Wijewardena, Anil K. Jain

Abstract: A secure fingerprint recognition system must contain both a presentation attack (i.e., spoof) detection module and a recognition module in order to protect users against unwanted access by malicious users. Traditionally, these tasks would be carried out by two independent systems; however, recent studies have demonstrated the potential to have one unified system architecture in order to reduce the computational burdens on the system, while maintaining high accuracy. In this work, we leverage a vision transformer architecture for joint spoof detection and matching and report competitive results with state-of-the-art (SOTA) models for both a sequential system (two ViT models operating independently) and a unified architecture (a single ViT model for both tasks). ViT models are particularly well suited for this task, as the ViT's global embedding encodes features useful for recognition, whereas the individual, local embeddings are useful for spoof detection. We demonstrate the capability of our unified model to achieve an average integrated matching (IM) accuracy of 98.87% across LivDet 2013 and 2015 CrossMatch sensors. This is comparable to the 98.95% IM accuracy of our sequential dual-ViT system, but with ~50% of the parameters and ~58% of the latency.

Submitted 12 May, 2023; originally announced May 2023.

arXiv:2305.07552 [pdf, other]

Dish detection in food platters: A framework for automated diet logging and nutrition management

Authors: Mansi Goel, Shashank Dargar, Shounak Ghatak, Nidhi Verma, Pratik Chauhan, Anushka Gupta, Nikhila Vishnumolakala, Hareesh Amuru, Ekta Gambhir, Ronak Chhajed, Meenal Jain, Astha Jain, Samiksha Garg, Nitesh Narwade, Nikhilesh Verhwani, Abhuday Tiwari, Kirti Vashishtha, Ganesh Bagler

Abstract: Diet is central to the epidemic of lifestyle disorders. Accurate and effortless diet logging is one of the significant bottlenecks for effective diet management and calorie restriction. Dish detection from food platters is a challenging problem due to a visually complex food layout. We present an end-to-end computational framework for diet management, from data compilation, annotation, and state-of-the-art model identification to its mobile app implementation. As a case study, we implement the framework in the context of Indian food platters, which are known for a complex presentation that challenges the automated detection of dishes. Starting with the 61 most popular Indian dishes, we identify the state-of-the-art model through a comparative analysis of deep-learning-based object detection architectures. Building on a meticulous compilation of 68,005 platter images with 134,814 manual dish annotations, we first compare ten architectures for multi-label classification and identify ResNet152 (mAP=84.51%) as the best model. After a thorough performance evaluation, YOLOv8x (mAP=87.70%) emerged as the best of the eight deep-learning models implemented for dish detection. By comparing with the state-of-the-art model for the IndianFood10 dataset, we demonstrate the superior object detection performance of YOLOv8x for this subset and establish ResNet152 as the best architecture for multi-label classification. The models thus trained on richly annotated data can be extended to include dishes from across global cuisines.
The proposed framework is demonstrated through a proof-of-concept mobile application with diverse applications for diet logging, food recommendation systems, nutritional interventions, and mitigation of lifestyle disorders.

Submitted 12 May, 2023; originally announced May 2023.

Comments: 11 pages, 5 figures, 5 tables. Submitted to the 8th International Conference on Computer Vision & Image Processing (CVIP-2023)

ACM Class: I.4.9; I.5.4; J.3

arXiv:2305.05161 [pdf, other]

Child Palm-ID: Contactless Palmprint Recognition for Children

Authors: Akash Godbole, Steven A. Grosz, Anil K. Jain

Abstract: Effective distribution of nutritional and healthcare aid for children, particularly infants and toddlers, in some of the least developed and most impoverished countries of the world is a major problem due to the lack of reliable identification documents. Biometric authentication technology has been investigated to address child recognition in the absence of reliable ID documents. We present a mobile-based contactless palmprint recognition system, called Child Palm-ID, which meets the requirements of usability, hygiene, cost, and accuracy for child recognition. Using a contactless child palmprint database, Child-PalmDB1, consisting of 19,158 images from 1,020 unique palms (in the age range of 6 mos. to 48 mos.), we report a TAR=94.11% @ FAR=0.1%. The proposed Child Palm-ID system is also able to recognize adults, achieving a TAR=99.4% on the CASIA contactless palmprint database and a TAR=100% on the COEP contactless adult palmprint database, both @ FAR=0.1%. These accuracies are competitive with the SOTA provided by COTS systems. Despite these high accuracies, we show that the TAR for time-separated child palmprints is only 78.1% @ FAR=0.1%.

Submitted 9 May, 2023; originally announced May 2023.
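The Child Palm-ID entry above (like the FarSight and ViT Unified entries) reports accuracy as TAR at a fixed FAR. A minimal sketch of how such an operating point is commonly computed from similarity scores of genuine (same-identity) and impostor (different-identity) comparisons is below; the function name and the toy scores are hypothetical, and real evaluation protocols differ in tie handling and threshold interpolation.

```python
from typing import Sequence


def tar_at_far(genuine: Sequence[float], impostor: Sequence[float], far: float) -> float:
    """True accept rate at the most permissive threshold whose false accept
    rate on the impostor scores does not exceed `far` (higher score = more similar)."""
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")
    imp = sorted(impostor, reverse=True)
    n = len(imp)
    allowed = int(far * n)  # maximum number of impostor comparisons we may accept
    # Accept only scores strictly above the (allowed + 1)-th highest impostor score,
    # so at most `allowed` impostors pass and FAR <= far.
    threshold = imp[allowed] if allowed < n else float("-inf")
    accepted = sum(1 for s in genuine if s > threshold)
    return accepted / len(genuine)


genuine = [0.9, 0.8, 0.75, 0.4]
impostor = [0.1, 0.2, 0.3, 0.85]
print(tar_at_far(genuine, impostor, far=0.25))  # 1.0
print(tar_at_far(genuine, impostor, far=0.0))   # 0.25
```

Lowering the permitted FAR raises the threshold, so TAR can only stay the same or drop; at FAR=0.1% a benchmark needs at least 1,000 impostor comparisons before a single false accept is even permitted.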

arXiv:2304.14999 [pdf, other]

Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs

Authors: George Pu, Anirudh Jain, Jihan Yin, Russell Kaplan

Abstract: As foundation models continue to scale exponentially in size, efficient methods of adaptation become increasingly critical. Parameter-efficient fine-tuning (PEFT), a recent class of techniques that require modifying only a small percentage of the model parameters, is currently the most popular method for adapting large language models (LLMs). Several PEFT techniques have recently been proposed with varying tradeoffs. We provide a comprehensive and uniform benchmark of various PEFT techniques on a representative LLM, the FLAN-T5 model, and evaluate model performance across different data scales of classification and generation datasets. Based on this, we provide a framework for choosing the optimal fine-tuning techniques given the task type and data availability. Contrary to popular belief, we also empirically show that PEFT techniques converge more slowly than full tuning in low-data scenarios, and posit the amount of data required for PEFT methods to both perform well and converge efficiently. Lastly, we further optimize these PEFT techniques by selectively choosing which parts of the model to train, and find that these techniques can be applied with significantly fewer parameters while maintaining and even improving performance.

Submitted 28 April, 2023; originally announced April 2023.

Comments: Short paper, ICLR '23 Workshop on Understanding Foundation Models

arXiv:2304.14391 [pdf, other]

Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement

Authors: Nikolaos Gkanatsios, Ayush Jain, Zhou Xian, Yunchu Zhang, Christopher Atkeson, Katerina Fragkiadaki

Abstract: Language is compositional; an instruction can express multiple relation constraints to hold among objects in a scene that a robot is tasked to rearrange. Our focus in this work is an instructable scene-rearranging framework that generalizes to longer instructions and to spatial concept compositions never seen at training time. We propose to represent language-instructed spatial concepts with energy functions over relative object arrangements. A language parser maps instructions to corresponding energy functions and an open-vocabulary visual-language model grounds their arguments to relevant objects in the scene. We generate goal scene configurations by gradient descent on the sum of energy functions, one per language predicate in the instruction. Local vision-based policies then re-locate objects to the inferred goal locations. We test our model on established instruction-guided manipulation benchmarks, as well as benchmarks of compositional instructions we introduce. We show our model can execute highly compositional instructions zero-shot in simulation and in the real world. It outperforms language-to-action reactive policies and Large Language Model planners by a large margin, especially for long instructions that involve compositions of multiple spatial concepts. Simulation and real-world robot execution videos, as well as our code and datasets are publicly available on our website: https://ebmplanner.github.io.
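The core planning step, gradient descent on a sum of per-predicate energies over object positions, can be illustrated with a toy 2-D example. This is a minimal Python sketch under stated assumptions: the hand-written energies `left_of` and `near`, their margins, and the finite-difference gradients are all hypothetical stand-ins for the learned energy functions described above, and no language parser, visual grounding, or execution policy is modelled.

```python
from typing import Callable, Dict, List

Positions = Dict[str, List[float]]          # object name -> [x, y]
Energy = Callable[[Positions], float]


def left_of(a: str, b: str, margin: float = 0.2) -> Energy:
    """Hypothetical energy: zero once `a` is at least `margin` left of `b`."""
    def energy(p: Positions) -> float:
        return max(0.0, p[a][0] + margin - p[b][0]) ** 2
    return energy


def near(a: str, b: str, radius: float = 0.1) -> Energy:
    """Hypothetical energy: zero once `a` lies within `radius` of `b`."""
    def energy(p: Positions) -> float:
        d2 = (p[a][0] - p[b][0]) ** 2 + (p[a][1] - p[b][1]) ** 2
        return max(0.0, d2 - radius ** 2)
    return energy


def plan(start: Positions, energies: List[Energy], movable: List[str],
         lr: float = 0.1, steps: int = 500, h: float = 1e-5) -> Positions:
    """Minimise the summed energy over the movable objects' coordinates."""
    pos = {name: list(xy) for name, xy in start.items()}

    def total(p: Positions) -> float:
        # One energy term per predicate in the instruction, summed.
        return sum(e(p) for e in energies)

    for _ in range(steps):
        grads = {}
        for name in movable:
            g = []
            for i in range(2):
                # Central finite difference on coordinate i of this object.
                pos[name][i] += h
                up = total(pos)
                pos[name][i] -= 2 * h
                down = total(pos)
                pos[name][i] += h
                g.append((up - down) / (2 * h))
            grads[name] = g
        # Simultaneous gradient step for all movable objects.
        for name, g in grads.items():
            pos[name][0] -= lr * g[0]
            pos[name][1] -= lr * g[1]
    return pos


# Hypothetical instruction: "put the cup left of the plate and the bowl near the cup".
start = {"cup": [1.0, 0.0], "plate": [0.0, 0.0], "bowl": [0.5, 0.5]}
goal = plan(start, [left_of("cup", "plate"), near("bowl", "cup")],
            movable=["cup", "bowl"])
print(goal)
```

Because each predicate contributes an independent term, adding a clause to the instruction only adds one more energy to the sum, which is the compositional property the abstract relies on.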

Submitted 23 January, 2024; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: First two authors contributed equally | RSS 2023

arXiv:2304.13846 [pdf, other]

Extracting Structured Seed-Mediated Gold Nanorod Growth Procedures from Literature with GPT-3

Authors: Nicholas Walker, John Dagdelen, Kevin Cruse, Sanghoon Lee, Samuel Gleason, Alexander Dunn, Gerbrand Ceder, A. Paul Alivisatos, Kristin A. Persson, Anubhav Jain

Abstract: Although gold nanorods have been the subject of much research, the pathways for controlling their shape and thereby their optical properties remain largely heuristically understood. Although it is apparent that the simultaneous presence of and interaction between various reagents during synthesis control these properties, computational and experimental approaches for exploring the synthesis space can be either intractable or too time-consuming in practice. This motivates an alternative approach leveraging the wealth of synthesis information already embedded in the body of scientific literature by developing tools to extract relevant structured data in an automated, high-throughput manner. To that end, we present an approach using the powerful GPT-3 language model to extract structured multi-step seed-mediated growth procedures and outcomes for gold nanorods from unstructured scientific text. GPT-3 prompt completions are fine-tuned to predict synthesis templates in the form of JSON documents from unstructured text input with an overall accuracy of $86\%$. The performance is notable, considering the model is performing simultaneous entity recognition and relation extraction. We present a dataset of 11,644 entities extracted from 1,137 papers, resulting in 268 papers with at least one complete seed-mediated gold nanorod growth procedure and outcome for a total of 332 complete procedures.

Submitted 26 April, 2023; originally announced April 2023.

arXiv:2304.13800 [pdf, other]

Latent Fingerprint Recognition: Fusion of Local and Global Embeddings

Authors: Steven A. Grosz, Anil K. Jain

Abstract: One of the most challenging problems in fingerprint recognition continues to be establishing the identity of a suspect associated with partial and smudgy fingerprints left at a crime scene (i.e., latent prints or fingermarks). Despite the success of fixed-length embeddings for rolled and slap fingerprint recognition, the features learned for latent fingerprint matching have mostly been limited to local minutiae-based embeddings and have not directly leveraged global representations for matching. In this paper, we combine global embeddings with local embeddings for state-of-the-art latent to rolled matching accuracy with high throughput. The combination of both local and global representations leads to improved recognition accuracy across NIST SD 27, NIST SD 302, MSP, MOLF DB1/DB4, and MOLF DB2/DB4 latent fingerprint datasets for both closed-set (84.11%, 54.36%, 84.35%, 70.43%, 62.86% rank-1 retrieval rate, respectively) and open-set (0.50, 0.74, 0.44, 0.60, 0.68 FNIR at FPIR=0.02, respectively) identification scenarios on a gallery of 100K rolled fingerprints. Not only do we fuse the complementary representations, we also use the local features to guide the global representations to focus on discriminatory regions in two fingerprint images to be compared.
This leads to a multi-stage matching paradigm in which subsets of the retrieved candidate lists for each probe image are passed to subsequent stages for further processing, resulting in a considerable reduction in latency (requiring just 0.068 ms per latent to rolled comparison on an AMD EPYC 7543 32-Core Processor, roughly 15K comparisons per second). Finally, we show the generalizability of the fused representations for improving authentication accuracy across several rolled, plain, and contactless fingerprint datasets.
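The multi-stage paradigm described above, where a cheap comparison prunes the gallery and only a shortlist is rescored by a costlier matcher, can be sketched as a two-stage cascade. A minimal Python sketch, assuming cosine similarity for the global embeddings, a caller-supplied `local_score` function standing in for the minutiae-based matcher, and weighted-sum score fusion; all names and the fusion rule are hypothetical, not the authors' pipeline (which additionally uses local features to guide the global representation).

```python
import math
from typing import Any, Callable, Dict, List, Sequence

Template = Dict[str, Any]   # hypothetical: {"global": [floats], ...}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cascade_search(probe: Template, gallery: Dict[str, Template],
                   local_score: Callable[[Template, Template], float],
                   shortlist: int = 100, w_global: float = 0.5) -> List[str]:
    """Rank gallery IDs for one probe in two stages (illustrative only)."""
    # Stage 1: cheap fixed-length (global) comparison against every print.
    global_scores = {gid: cosine(probe["global"], t["global"])
                     for gid, t in gallery.items()}
    candidates = sorted(global_scores, key=global_scores.get,
                        reverse=True)[:shortlist]
    # Stage 2: costlier local matcher on the shortlist only, followed by a
    # simple weighted-sum fusion (the weighting is an assumption).
    fused = {gid: w_global * global_scores[gid]
             + (1 - w_global) * local_score(probe, gallery[gid])
             for gid in candidates}
    return sorted(fused, key=fused.get, reverse=True)


gallery = {
    "a": {"global": [1.0, 0.0], "local": 0.1},
    "b": {"global": [0.9, 0.1], "local": 0.9},
    "c": {"global": [0.0, 1.0], "local": 1.0},
}
probe = {"global": [1.0, 0.0]}
ranking = cascade_search(probe, gallery, lambda p, t: t["local"], shortlist=2)
print(ranking)
```

Only `shortlist` local comparisons are made per probe, so the cost of the expensive stage no longer grows with gallery size; the trade-off is that a true mate dropped in stage 1 (like "c" here) can never be recovered.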

Submitted 7 September, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

arXiv:2304.10799 [pdf, other]

A scalable solution for the extended multi-channel facility location problem

Authors: Etika Agarwal, Karthik S. Gurumoorthy, Ankit Ajit Jain, Shantala Manchenahally

Abstract: We study the extended version of the non-uniform, capacitated facility location problem with multiple fulfilment channels between the facilities and clients, each with their own channel capacities and service cost. Though the problem has been extensively studied in the literature, all prior works assume a single channel of fulfilment, and the existing methods based on linear programming, primal-dual relationships, local search heuristics, etc., do not scale for a large supply chain system involving millions of decision variables. Using the concepts of submodularity and optimal transport theory, we present a scalable algorithm for determining the set of facilities to be opened under a cardinality constraint. By introducing various schemes, such as (i) iterative facility selection using incremental gain, (ii) approximation of the linear program using novel multi-stage Sinkhorn iterations, and (iii) creation of facilities, one for each fulfilment channel, we develop a fast but tight approximate solution, requiring $\mathcal{O}\left(\frac{3+k}{m}\ln\left(\frac{1}{\varepsilon}\right)\right)$ instances of optimal transport problems to select $k$ facilities from $m$ options, each solvable in linear time. Our algorithm is implicitly endowed with all the theoretical guarantees enjoyed by submodular maximisation problems and the Sinkhorn distances. When compared against the state-of-the-art commercial MILP solvers, we obtain a 100-fold speedup in computation, while the difference in objective values lies within a narrow range of 3%.
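Sinkhorn iterations approximate an optimal transport (transportation LP) problem by adding an entropic regulariser and alternately rescaling the rows and columns of a Gibbs kernel. The plain single-stage version is sketched below in Python; it is not the multi-stage variant proposed in the paper, and the regularisation strength `eps`, the iteration count, and the toy supply/demand data are assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def sinkhorn(supply: List[float], demand: List[float], cost: Matrix,
             eps: float = 0.25, iters: int = 500) -> Tuple[Matrix, float]:
    """Entropy-regularised transport plan between two equal-mass marginals."""
    n, m = len(supply), len(demand)
    # Gibbs kernel K_ij = exp(-C_ij / eps); smaller eps tracks the LP more
    # closely but makes the iterations converge more slowly.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows to match supply, then columns to match demand.
        u = [supply[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [demand[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    total_cost = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return plan, total_cost


supply = [0.5, 0.5]            # hypothetical facility capacities (normalised)
demand = [0.5, 0.25, 0.25]     # hypothetical client demands
cost = [[1.0, 2.0, 2.0],       # row i: cost of serving each client from facility i
        [2.0, 1.0, 1.0]]
transport_plan, total = sinkhorn(supply, demand, cost, eps=0.25)
print([[round(x, 3) for x in row] for row in transport_plan], round(total, 3))
```

Each iteration costs O(nm) arithmetic, which is what makes repeated OT solves affordable inside a greedy facility-selection loop; the regularised cost is slightly above the exact LP optimum (1.0 for this toy instance).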

Submitted 21 April, 2023; originally announced April 2023.

arXiv:2304.09852 [pdf, ps, other]

Dipole superfluid hydrodynamics

Authors: Akash Jain, Kristan Jensen, Ruochuan Liu, Eric Mefford

Abstract: We construct a theory of hydrodynamic transport for systems with conserved dipole moment, U(1) charge, energy, and momentum. These models have been considered in the context of fractons, since their elementary and isolated charges are immobile by symmetry, and have two known translation-invariant gapless phases: a "p-wave dipole superfluid" phase where the dipole symmetry is spontaneously broken and an "s-wave dipole superfluid" phase where both the U(1) and dipole symmetries are spontaneously broken. We argue on grounds of symmetry and thermodynamics that there is no translation-invariant gapless fluid with unbroken dipole symmetry. In this work, we primarily focus on the hydrodynamic description of p-wave dipole superfluids, including leading dissipative corrections. That theory has, in a sense, a dynamical scaling exponent $z=2$, and its spectrum of fluctuations includes novel subdiffusive modes $\omega\sim -i k^4$ in the shear sector and a magnon-like sound mode $\omega\sim \pm k^2 -i k^2$. By coupling the fluid to background fields, we find response functions of the various symmetry currents. We also present a preliminary generalization of our work to s-wave dipole superfluids, which resemble $z=1$ fluids and feature sound waves and diffusive shear modes, as in an ordinary fluid. However, the spectrum also contains a magnon-like second-sound mode $\omega\sim \pm k^2 \pm k^4 -i k^4$ with subdiffusive attenuation.

Submitted 29 January, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: 53 pages plus appendices; we have included with the arXiv submission a Mathematica notebook that computes dispersion relations and response functions; v2: fixed typos

arXiv:2304.08769 [pdf, ps, other]

Cooperative Multi-Agent Reinforcement Learning for Inventory Management

Authors: Madhav Khirwar, Karthik S. Gurumoorthy, Ankit Ajit Jain, Shantala Manchenahally

Abstract: With Reinforcement Learning (RL) for inventory management (IM) being a nascent field of research, approaches tend to be limited to simple, linear environments with implementations that are minor modifications of off-the-shelf RL algorithms. Scaling these simplistic environments to a real-world supply chain comes with a few challenges, such as minimizing the computational requirements of the environment, specifying agent configurations that are representative of dynamics at real-world stores and warehouses, and specifying a reward framework that encourages desirable behavior across the whole supply chain. In this work, we present a system with a custom GPU-parallelized environment that consists of one warehouse and multiple stores, a novel architecture for agent-environment dynamics incorporating enhanced state and action spaces, and a shared reward specification that seeks to optimize for a large retailer's supply chain needs. Each vertex in the supply chain graph is an independent agent that, based on its own inventory, is able to place replenishment orders to the vertex upstream. The warehouse agent, aside from placing orders with the supplier, has the special property of also being able to constrain replenishment to stores downstream, which results in it learning an additional allocation sub-policy. We achieve a system that outperforms standard inventory control policies, such as a base-stock policy, and other RL-based specifications for a single product, and lay out a future direction of work for multiple products.
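For reference, the base-stock (order-up-to) baseline mentioned above has a simple closed-form rule: each period, order enough to bring the inventory position (on-hand plus in-transit stock) back up to a fixed level S. A minimal single-node Python sketch, assuming periodic review, a fixed integer lead time of at least one period, and lost sales; the function and parameter names are hypothetical, and the paper's multi-agent warehouse/store setting is not modelled.

```python
from collections import deque
from typing import List, Tuple


def simulate_base_stock(demands: List[int], base_stock: int,
                        lead_time: int = 1,
                        on_hand: int = 0) -> Tuple[List[int], List[int], int]:
    """Simulate an order-up-to-S policy.

    Returns (orders placed, end-of-period on-hand stock, total lost sales).
    """
    if lead_time < 1:
        raise ValueError("this sketch assumes a lead time of at least one period")
    pipeline = deque([0] * lead_time)   # orders in transit, oldest first
    orders, levels, lost = [], [], 0
    for demand in demands:
        on_hand += pipeline.popleft()   # receive the order placed lead_time periods ago
        sold = min(on_hand, demand)
        lost += demand - sold           # unmet demand is lost, not backordered
        on_hand -= sold
        position = on_hand + sum(pipeline)       # inventory position
        order = max(0, base_stock - position)    # top the position back up to S
        pipeline.append(order)
        orders.append(order)
        levels.append(on_hand)
    return orders, levels, lost


orders, levels, lost = simulate_base_stock([3, 2, 4], base_stock=5, on_hand=5)
print(orders, levels, lost)
```

Once the inventory position starts at S and no sales are lost, the policy simply reorders each period's demand, which is why it is a natural, tuning-light baseline for learned replenishment agents.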

Submitted 18 April, 2023; originally announced April 2023.

Comments: 14 pages, 5 figures

arXiv:2304.07060 [pdf, other]

DCFace: Synthetic Face Generation with Dual Condition Diffusion Model

Authors: Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu

Abstract: Generating synthetic datasets for training face recognition models is challenging because dataset generation entails more than creating high-fidelity images. It involves generating multiple images of the same subjects under different factors (e.g., variations in pose, illumination, expression, aging and occlusion) that follow the real image conditional distribution. Previous works have studied the generation of synthetic datasets using GANs or 3D models. In this work, we approach the problem from the aspect of combining subject appearance (ID) and external factor (style) conditions. These two conditions provide a direct way to control the inter-class and intra-class variations. To this end, we propose a Dual Condition Face Generator (DCFace) based on a diffusion model. Our novel Patch-wise style extractor and Time-step dependent ID loss enable DCFace to consistently produce face images of the same subject under different styles with precise control. Face recognition models trained on synthetic images from the proposed DCFace achieve verification accuracies $6.11\%$ higher on average than previous works on $4$ out of the $5$ test datasets (LFW, CFP-FP, CPLFW, AgeDB and CALFW). Code is available at https://github.com/mk-minchul/dcface

Submitted 14 April, 2023; originally announced April 2023.

Comments: To appear in CVPR 2023

arXiv:2304.06861 [pdf, other]

Evaluation of Social Biases in Recent Large Pre-Trained Models

Authors: Swapnil Sharma, Nikita Anand, Kranthi Kiran G. V., Alind Jain

Abstract: Large pre-trained language models are widely used in the community. These models are usually trained on unmoderated and unfiltered data from open sources like the Internet. As a result, biases seen on online platforms, which reflect those in society, are in turn captured and learned by these models. These models are deployed in applications that affect millions of people, and their inherent biases are harmful to the targeted social groups. In this work, we study the general trend in bias reduction as newer pre-trained models are released. Three recent models (ELECTRA, DeBERTa, and DistilBERT) are chosen and evaluated against two bias benchmarks, StereoSet and CrowS-Pairs. They are compared to the baseline of BERT using the associated metrics. We explore whether, as advancements are made and newer, faster, lighter models are released, they are being developed responsibly such that their inherent social biases have been reduced compared to their older counterparts. We find that all the models under study do exhibit biases but have generally improved compared to BERT.

Submitted 13 April, 2023; originally announced April 2023.

Comments: 7 pages, 4 Tables

arXiv:2304.06321 [pdf, other]

doi 10.1109/EMBC40787.2023.10341052

EEG Cortical Source Feature based Hand Kinematics Decoding using Residual CNN-LSTM Neural Network

Authors: Anant Jain, Lalan Kumar

Abstract: Motor kinematics decoding (MKD) using brain signals is essential to develop brain-computer interface (BCI) systems for rehabilitation or prosthetic devices. Surface electroencephalogram (EEG) signals have been widely utilized for MKD. However, kinematic decoding from cortical sources is sparsely explored. In this work, the feasibility of hand kinematics decoding using EEG cortical source signals has been explored for a grasp-and-lift task. In particular, pre-movement EEG segments are utilized. A residual convolutional neural network (CNN) - long short-term memory (LSTM) based kinematics decoding model is proposed that utilizes motor neural information present in pre-movement brain activity. Various EEG windows 50 ms prior to movement onset are utilized for hand kinematics decoding. The correlation value (CV) between actual and predicted hand kinematics is used as the performance metric in both the source and sensor domains. The performance of the proposed deep learning model is compared in the sensor and source domains. The results demonstrate the viability of hand kinematics decoding using pre-movement EEG cortical source data.
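A minimal Python sketch of the correlation value (CV) metric, assuming CV here denotes the Pearson correlation coefficient between the recorded and decoded trajectory along one axis (the abstract does not define it further); the function name and sample trajectories are hypothetical.

```python
import math
from typing import Sequence


def correlation_value(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation between actual and predicted trajectories."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    n = len(actual)
    ma = sum(actual) / n
    mp = sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    var_a = sum((a - ma) ** 2 for a in actual)
    var_p = sum((p - mp) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_p)


# Hypothetical x-axis hand position vs. a decoder's output over six samples.
actual_x = [0.0, 0.5, 1.2, 2.0, 2.6, 3.0]
decoded_x = [0.1, 0.4, 1.0, 2.2, 2.5, 3.1]
print(round(correlation_value(actual_x, decoded_x), 3))
```

Pearson correlation is invariant to an offset and a positive scaling of the prediction, so CV measures how well the shape of the trajectory is tracked rather than the absolute position error.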

Submitted 13 April, 2023; originally announced April 2023.

Journal ref: 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, 2023

arXiv:2303.18240 [pdf, other]

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

Authors: Arjun Majumdar, Karmesh Yadav, Sergio Arnaud, Yecheng Jason Ma, Claire Chen, Sneha Silwal, Aryan Jain, Vincent-Pierre Berges, Pieter Abbeel, Jitendra Malik, Dhruv Batra, Yixin Lin, Oleksandr Maksymets, Aravind Rajeswaran, Franziska Meier

Abstract: We present the largest and most comprehensive empirical study of pre-trained visual representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate CortexBench, consisting of 17 different tasks spanning locomotion, navigation, dexterous, and mobile manipulation. Next, we systematically evaluate existing PVRs and find that none are universally dominant. To study the effect of pre-training data size and diversity, we combine over 4,000 hours of egocentric videos from 7 different sources (over 4.3M images) and ImageNet to train different-sized vision transformers using Masked Auto-Encoding (MAE) on slices of this data. Contrary to inferences from prior work, we find that scaling dataset size and diversity does not improve performance universally (but does so on average). Our largest model, named VC-1, outperforms all prior PVRs on average but does not universally dominate either. Next, we show that task- or domain-specific adaptation of VC-1 leads to substantial gains, with VC-1 (adapted) achieving performance competitive with or superior to the best known results on all of the benchmarks in CortexBench. Finally, we present real-world hardware experiments, in which VC-1 and VC-1 (adapted) outperform the strongest pre-existing PVR.
Overall, this paper presents no new techniques but a rigorous systematic evaluation, a broad set of findings about PVRs (that in some cases, refute those made in narrow domains in prior work), and open-sourced code and models (that required over 10,000 GPU-hours to train) for the benefit of the research community. △ Less

Submitted 1 February, 2024; v1 submitted 31 March, 2023; originally announced March 2023.

Comments: Project website: https://eai-vc.github.io

arXiv:2303.11960 [pdf, other]

Preparing Unprepared Students For Future Learning

Authors: Mark Abdelshiheed, Mehak Maniktala, Song Ju, Ayush Jain, Tiffany Barnes, Min Chi

Abstract: Based on strategy-awareness (knowing which problem-solving strategy to use) and time-awareness (knowing when to use it), students are categorized into Rote (neither type of awareness), Dabbler (strategy-aware only) or Selective (both types of awareness). It was shown that Selective is often significantly more prepared for future learning than Rote and Dabbler (Abdelshiheed et al., 2020). In this work, we explore the impact of explicit strategy instruction on Rote and Dabbler students across two domains: logic and probability. During the logic instruction, our logic tutor handles both Forward-Chaining (FC) and Backward-Chaining (BC) strategies, with FC being the default; the Experimental condition is taught how to use BC via worked examples and when to use it via prompts. Six weeks later, all students are trained on a probability tutor that supports BC only. Our results show that Experimental significantly outperforms Control in both domains, and Experimental Rote catches up with Selective.

Submitted 18 March, 2023; originally announced March 2023.

arXiv:2303.08595 [pdf, other]

Automatic Attention Pruning: Improving and Automating Model Pruning using Attentions

Authors: Kaiqi Zhao, Animesh Jain, Ming Zhao

Abstract: Pruning is a promising approach to compress deep learning models in order to deploy them on resource-constrained edge devices. However, many existing pruning solutions are based on unstructured pruning, which yields models that cannot efficiently run on commodity hardware; and they often require users to manually explore and tune the pruning process, which is time-consuming and often leads to sub-optimal results. To address these limitations, this paper presents Automatic Attention Pruning (AAP), an adaptive, attention-based, structured pruning approach to automatically generate small, accurate, and hardware-efficient models that meet user objectives. First, it proposes iterative structured pruning using activation-based attention maps to effectively identify and prune unimportant filters. Then, it proposes adaptive pruning policies for automatically meeting the pruning objectives of accuracy-critical, memory-constrained, and latency-sensitive tasks. A comprehensive evaluation shows that AAP substantially outperforms the state-of-the-art structured pruning works for a variety of model architectures. Our code is at: https://github.com/kaiqi123/Automatic-Attention-Pruning.git.
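One common way to turn activation-based attention maps into a filter ranking is to score each output channel by the mean of its absolute activations raised to a power p, then prune the lowest-scoring channels. The Python sketch below follows that recipe under stated assumptions (p = 2, a single feature map, a fixed pruning ratio); AAP's exact scoring and its adaptive policies may differ, and the helper names are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]   # channels x height x width


def channel_attention(fmap: FeatureMap, p: int = 2) -> List[float]:
    """Score each channel by the mean of |activation|**p over its spatial map."""
    scores = []
    for channel in fmap:
        values = [abs(x) ** p for row in channel for x in row]
        scores.append(sum(values) / len(values))
    return scores


def filters_to_prune(fmap: FeatureMap, ratio: float, p: int = 2) -> List[int]:
    """Indices of the lowest-attention filters, `ratio` of the total."""
    scores = channel_attention(fmap, p)
    k = int(ratio * len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(ranked[:k])


# Toy layer output: 4 channels of constant 2x2 activations (hypothetical values).
fmap = [[[v, v], [v, v]] for v in (0.1, 2.0, -0.5, 3.0)]
print(filters_to_prune(fmap, ratio=0.5))
```

Because whole filters (channels) are removed rather than individual weights, the pruned layer stays dense and runs efficiently on commodity hardware; in practice the scores would be averaged over many input samples before ranking.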

Submitted 13 March, 2023; originally announced March 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2201.10520

arXiv:2303.06846 [pdf, other]

doi 10.1103/PhysRevResearch.5.033049

Improved quantum error correction with randomized compiling

Authors: Aditya Jain, Pavithran Iyer, Stephen D. Bartlett, Joseph Emerson

Abstract: Current hardware for quantum computing suffers from high levels of noise, and so achieving practical fault-tolerant quantum computing will require powerful and efficient methods to correct for errors in quantum circuits. Here, we explore the role and effectiveness of using noise tailoring techniques to improve the performance of error correcting codes. Noise tailoring methods such as randomized compiling (RC) convert complex coherent noise processes to effective stochastic noise. While it is known that this can be leveraged to design efficient diagnostic tools, we explore its impact on the performance of error correcting codes. Of particular interest is the important class of coherent errors arising from control errors, where RC has the maximum effect: converting these into purely stochastic errors. For these errors, we show here that RC delivers an improvement in performance of the concatenated Steane code by several orders of magnitude. We also show that below a threshold rotation angle, the gains in logical fidelity can be arbitrarily magnified by increasing the size of the codes. These results suggest that using randomized compiling can lead to a significant reduction in the resource overhead required to achieve fault tolerance.
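The key effect, randomized compiling turning a coherent error into a stochastic one, can be checked on a single qubit. Twirling a unitary error U over the Pauli group yields a Pauli channel that applies P with probability |Tr(P U)/2|^2; for an over-rotation about Z by angle θ this gives a Z error with probability sin^2(θ/2). A minimal Python sketch of that calculation, with hypothetical helper names; it illustrates the noise model only, not the concatenated Steane-code simulations in the paper.

```python
import cmath
import math
from typing import Dict, List

Matrix = List[List[complex]]

PAULIS: Dict[str, Matrix] = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def z_rotation(theta: float) -> Matrix:
    """Coherent control error: U = exp(-i * theta * Z / 2)."""
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]


def pauli_twirl(u: Matrix) -> Dict[str, float]:
    """Pauli-channel probabilities p_P = |Tr(P U) / 2|^2 after twirling U."""
    probs = {}
    for name, p in PAULIS.items():
        prod = matmul(p, u)
        probs[name] = abs((prod[0][0] + prod[1][1]) / 2) ** 2
    return probs


theta, n = 0.1, 10
twirled = pauli_twirl(z_rotation(theta))
p_z = twirled["Z"]
# Error probability after n repetitions of the same faulty gate:
coherent_error = math.sin(n * theta / 2) ** 2           # rotations add in angle
stochastic_error = (1 - (1 - 2 * p_z) ** n) / 2        # odd number of Z flips
print(twirled, coherent_error, stochastic_error)
```

The gap grows with n because coherent errors add in amplitude (roughly quadratically in n for small θ) while twirled errors add in probability (roughly linearly), which is the intuition behind the improved code performance reported above.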

Submitted 13 March, 2023; originally announced March 2023.

Comments: 7 pages + 8 page appendix, 8 figures

Journal ref: Phys. Rev. Research 5, 033049 (2023)

arXiv:2303.06274 [pdf]

CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting

Authors: Simon Graham, Quoc Dang Vu, Mostafa Jahanifar, Martin Weigert, Uwe Schmidt, Wenhua Zhang, Jun Zhang, Sen Yang, **xi Xiang, Xiyue Wang, Josef Lorenz Rumberger, Elias Baumann, Peter Hirsch, Lihao Liu, Chenyang Hong, Angelica I. Aviles-Rivero, Ayushi Jain, Heeyoung Ahn, Yiyu Hong, Hussam Azzuni, Min Xu, Mohammad Yaqub, Marie-Claire Blache, Benoît Piégu, Bertrand Vernay, et al. (64 additional authors not shown)

Abstract: Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we set up a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of reproducible algorithms for cellular recognition with real-time result inspection on public leaderboards. We conducted an extensive post-challenge analysis based on the top-performing models using 1,658 whole-slide images of colon tissue. With around 700 million detected nuclei per model, associated features were used for dysplasia grading and survival analysis, where we demonstrated that the challenge's improvement over the previous state of the art led to significant boosts in downstream performance. Our findings also suggest that eosinophils and neutrophils play an important role in the tumour microenvironment. We release challenge models and WSI-level results to foster the development of further methods for biomarker discovery.

Submitted 14 March, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

arXiv:2303.05662 [pdf]

Single-Step Synthesis of Shape-Controlled Polymeric Particles using Initiated Chemical Vapor Deposition in Liquid Crystals

Authors: Apoorva Jain, Soumyamouli Pal, Nicholas L. Abbott, Rong Yang

Abstract: The ability to synthesize shape-controlled polymer particles will benefit a wide range of applications including targeted drug delivery and metamaterials with reconfigurable structures, but existing synthesis approaches are commonly multistep and limited to a narrow size/shape range. Using a novel single-step synthesis technique, a variety of shapes including nanospheres, hemispherical micro-domes, orientation-controlled microgels, microspheres, spheroids, and micro-discs were obtained. The shape-controlled particles were synthesized by polymerizing divinylbenzene (DVB) via initiated chemical vapor deposition (iCVD) in nematic liquid crystals (LC). iCVD continuously and precisely delivered vapor-phase reactants, thus avoiding disruption of the LC structure, a critical limitation in past LC-templated polymerization. That shape controllability was further enabled by leveraging LC as a real-time display of the polymerization conditions and progression, using a custom in-situ long-focal range microscope. Detailed image analysis unraveled key mechanisms in polymer synthesis in LC. Poor solubilization by nematic LC led to the formation of pDVB nanospheres, distinct from microspheres obtained in isotropic solvents. The nanospheres precipitated to the LC-solid interface and further aggregated into microgel clusters with controlled orientation that was guided by the LC molecular alignment. On further polymerization, microgel clusters phase separated to form microspheres, spheroids, and unique disc-shaped particles.

Submitted 9 March, 2023; originally announced March 2023.

Comments: 35 pages, 5 figures

arXiv:2303.01598

A Meta-Learning Approach to Predicting Performance and Data Requirements

Authors: Achin Jain, Gurumurthy Swaminathan, Paolo Favaro, Hao Yang, Avinash Ravichandran, Hrayr Harutyunyan, Alessandro Achille, Onkar Dabeer, Bernt Schiele, Ashwin Swaminathan, Stefano Soatto

Abstract: We propose an approach to estimate the number of samples required for a model to reach a target performance. We find that the power law, the de facto principle to estimate model performance, leads to large error when using a small dataset (e.g., 5 samples per class) for extrapolation. This is because the log-performance error against the log-dataset size follows a nonlinear progression in the few-shot regime followed by a linear progression in the high-shot regime. We introduce a novel piecewise power law (PPL) that handles the two data regimes differently. To estimate the parameters of the PPL, we introduce a random forest regressor trained via meta learning that generalizes across classification/detection tasks, ResNet/ViT based architectures, and random/pre-trained initializations. The PPL improves the performance estimation on average by 37% across 16 classification and 33% across 10 detection datasets, compared to the power law. We further extend the PPL to provide a confidence bound and use it to limit the prediction horizon that reduces over-estimation of data by 76% on classification and 91% on detection datasets.

Submitted 2 March, 2023; originally announced March 2023.

Comments: CVPR 2023
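The plain power-law baseline that this abstract contrasts with the PPL can be illustrated with a minimal sketch. It assumes the common form err(n) ≈ a * n^(-b), fitted by ordinary least squares in log-log space (where a power law is a straight line) and then inverted to estimate the number of samples needed to reach a target error. The function names fit_power_law and samples_needed are hypothetical. This is only the baseline the authors criticize, not their piecewise power law or their meta-learned random forest regressor.

```python
import math


def fit_power_law(sizes, errors):
    """Fit err ~= a * n**(-b) by least squares on (log n, log err).

    Baseline power law only (hypothetical helper, not the paper's PPL).
    Assumes at least two distinct positive sizes and positive errors.
    Returns the tuple (a, b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # slope in log-log space equals -b
    intercept = mean_y - slope * mean_x   # intercept equals log(a)
    return math.exp(intercept), -slope


def samples_needed(a, b, target_error):
    """Invert err = a * n**(-b) to find n where err reaches target_error."""
    return (a / target_error) ** (1.0 / b)


if __name__ == "__main__":
    # Synthetic error curve that follows an exact power law: err = 2 * n**-0.5.
    sizes = [5, 10, 20, 40]
    errors = [2 * n ** -0.5 for n in sizes]
    a, b = fit_power_law(sizes, errors)
    print(a, b, samples_needed(a, b, 0.1))
```

Because a single straight line is fitted in log-log space, this baseline cannot capture the nonlinear few-shot regime the abstract describes, which is why extrapolating from very small datasets with it can be badly off.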

Showing 101–150 of 717 results for author: Jain, A