Search | arXiv e-print repository

Fine-grained large-scale content recommendations for MSX sellers

Authors: Manpreet Singh, Ravdeep Pasricha, Ravi Prasad Kondapalli, Kiran R, Nitish Singh, Akshita Agarwalla, Manoj R, Manish Prabhakar, Laurent Boué

Abstract: One of the most critical tasks of Microsoft sellers is to meticulously track and nurture potential business opportunities through proactive engagement and tailored solutions. Recommender systems play a central role to help sellers achieve their goals. In this paper, we present a content recommendation model which surfaces various types of content (technical documentation, comparison with competito… ▽ More One of the most critical tasks of Microsoft sellers is to meticulously track and nurture potential business opportunities through proactive engagement and tailored solutions. Recommender systems play a central role to help sellers achieve their goals. In this paper, we present a content recommendation model which surfaces various types of content (technical documentation, comparison with competitor products, customer success stories etc.) that sellers can share with their customers or use for their own self-learning. The model operates at the opportunity level which is the lowest possible granularity and the most relevant one for sellers. It is based on semantic matching between metadata from the contents and carefully selected attributes of the opportunities. Considering the volume of seller-managed opportunities in organizations such as Microsoft, we show how to perform efficient semantic matching over a very large number of opportunity-content combinations. The main challenge is to ensure that the top-5 relevant contents for each opportunity are recommended out of a total of $\approx 40,000$ published contents. We achieve this target through an extensive comparison of different model architectures and feature selection. Finally, we further examine the quality of the recommendations in a quantitative manner using a combination of human domain experts as well as by using the recently proposed "LLM as a judge" framework. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Journal ref: Microsoft Journal of Applied Research, Volume 21, 2024

arXiv:2407.04440 [pdf, ps, other]

Wavelet-based Temporal Attention Improves Traffic Forecasting

Authors: Yash Jakhmola, Nitish Kumar Mishra, Kripabandhu Ghosh, Tanujit Chakraborty

Abstract: Spatio-temporal forecasting of traffic flow data represents a typical problem in the field of machine learning, impacting urban traffic management systems. Traditional statistical and machine learning methods cannot adequately handle both the temporal and spatial dependencies in these complex traffic flow datasets. A prevalent approach in the field is to combine graph convolutional networks and mu… ▽ More Spatio-temporal forecasting of traffic flow data represents a typical problem in the field of machine learning, impacting urban traffic management systems. Traditional statistical and machine learning methods cannot adequately handle both the temporal and spatial dependencies in these complex traffic flow datasets. A prevalent approach in the field is to combine graph convolutional networks and multi-head attention mechanisms for spatio-temporal processing. This paper proposes a wavelet-based temporal attention model, namely a wavelet-based dynamic spatio-temporal aware graph neural network (W-DSTAGNN), for tackling the traffic forecasting problem. Benchmark experiments using several statistical metrics confirm that our proposal efficiently captures spatio-temporal correlations and outperforms ten state-of-the-art models on three different real-world traffic datasets. Our proposed ensemble data-driven method can handle dynamic temporal and spatial dependencies and make long-term forecasts in an efficient manner. △ Less

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2406.12158 [pdf, other]

LLMs Are Prone to Fallacies in Causal Inference

Authors: Nitish Joshi, Abulhair Saparov, Yixin Wang, He He

Abstract: Recent work shows that causal facts can be effectively extracted from LLMs through prompting, facilitating the creation of causal graphs for causal inference tasks. However, it is unclear if this success is limited to explicitly-mentioned causal facts in the pretraining data which the model can memorize. Thus, this work investigates: Can LLMs infer causal relations from other relational data in te… ▽ More Recent work shows that causal facts can be effectively extracted from LLMs through prompting, facilitating the creation of causal graphs for causal inference tasks. However, it is unclear if this success is limited to explicitly-mentioned causal facts in the pretraining data which the model can memorize. Thus, this work investigates: Can LLMs infer causal relations from other relational data in text? To disentangle the role of memorized causal facts vs inferred causal relations, we finetune LLMs on synthetic data containing temporal, spatial and counterfactual relations, and measure whether the LLM can then infer causal relations. We find that: (a) LLMs are susceptible to inferring causal relations from the order of two entity mentions in text (e.g. X mentioned before Y implies X causes Y); (b) if the order is randomized, LLMs still suffer from the post hoc fallacy, i.e. X occurs before Y (temporal relation) implies X causes Y. We also find that while LLMs can correctly deduce the absence of causal relations from temporal and spatial relations, they have difficulty inferring causal relations from counterfactuals, questioning their understanding of causality. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2405.12995 [pdf, other]

High-fidelity level-set modeling of diffusive solid-state phase transformations for polycrystalline materials

Authors: Nitish Chandrappa, Marc Bernacki

Abstract: The formation of microstructures in metallic alloys during hot metal forming involves simultaneous metallurgical complex phenomena. Traditional high-fidelity numerical frameworks used on the polycrystalline scale tend to focus on single-phase microstructures or isolate phase transformations from grain boundary migration mechanisms. The level-set method is highlighted as effective in proposing a gl… ▽ More The formation of microstructures in metallic alloys during hot metal forming involves simultaneous metallurgical complex phenomena. Traditional high-fidelity numerical frameworks used on the polycrystalline scale tend to focus on single-phase microstructures or isolate phase transformations from grain boundary migration mechanisms. The level-set method is highlighted as effective in proposing a global framework for modeling multiphase polycrystalline materials and diffusive solid-state phase transformations. This framework includes novel techniques for efficient large-scale microstructural representation, strong coupling with ThermoCalc software for real-time thermodynamic data, application for ternary alloys and beyond by taking solute drag aspects, and the use of advanced nucleation models. Numerous applications are then illustrated. △ Less

Submitted 1 May, 2024; originally announced May 2024.

arXiv:2405.09464 [pdf, other]

Scalable Scheduling Policies for Quantum Satellite Networks

Authors: Albert Williams, Nitish K. Panigrahy, Andrew McGregor, Don Towsley

Abstract: As Low Earth Orbit (LEO) satellite mega constellations continue to be deployed for satellite internet and recent successful experiments in satellite-based quantum entanglement distribution emerge, a natural question arises: How should we coordinate transmissions and design scalable scheduling policies for a quantum satellite internet? In this work, we consider the problem of transmission schedulin… ▽ More As Low Earth Orbit (LEO) satellite mega constellations continue to be deployed for satellite internet and recent successful experiments in satellite-based quantum entanglement distribution emerge, a natural question arises: How should we coordinate transmissions and design scalable scheduling policies for a quantum satellite internet? In this work, we consider the problem of transmission scheduling in quantum satellite networks subject to resource constraints at the satellites and ground stations. We show that the most general problem of assigning satellites to ground station pairs for entanglement distribution is NP-hard. We then propose four heuristic algorithms and evaluate their performance for Starlink mega constellation under various amount of resources and placements of the ground stations. We find that the maximum number of receivers necessary per ground station grows very slowly with the total number of deployed ground stations. Our proposed algorithms, leveraging optimal weighted b-matching and the global greedy heuristic, outperform others in entanglement distribution rate, entanglement fidelity, and handover cost metrics. While we develop these scheduling algorithms, we have also designed a software system to simulate, visualize, and evaluate satellite mega-constellations for entanglement distribution. △ Less

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.16816 [pdf, other]

IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages

Authors: Harman Singh, Nitish Gupta, Shikhar Bharadwaj, Dinesh Tewari, Partha Talukdar

Abstract: As large language models (LLMs) see increasing adoption across the globe, it is imperative for LLMs to be representative of the linguistic diversity of the world. India is a linguistically diverse country of 1.4 Billion people. To facilitate research on multilingual LLM evaluation, we release IndicGenBench - the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse… ▽ More As large language models (LLMs) see increasing adoption across the globe, it is imperative for LLMs to be representative of the linguistic diversity of the world. India is a linguistically diverse country of 1.4 Billion people. To facilitate research on multilingual LLM evaluation, we release IndicGenBench - the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse set 29 of Indic languages covering 13 scripts and 4 language families. IndicGenBench is composed of diverse generation tasks like cross-lingual summarization, machine translation, and cross-lingual question answering. IndicGenBench extends existing benchmarks to many Indic languages through human curation providing multi-way parallel evaluation data for many under-represented Indic languages for the first time. We evaluate a wide range of proprietary and open-source LLMs including GPT-3.5, GPT-4, PaLM-2, mT5, Gemma, BLOOM and LLaMA on IndicGenBench in a variety of settings. The largest PaLM-2 models performs the best on most tasks, however, there is a significant performance gap in all languages compared to English showing that further research is needed for the development of more inclusive multilingual language models. IndicGenBench is released at www.github.com/google-research-datasets/indic-gen-bench △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.14248 [pdf, other]

NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi **, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, **g Lin, Alan Yuille, Ben Shao, ** Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin , et al. (87 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig… ▽ More This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlighting, extreme darkness, and night scenes. A notable total of 428 participants registered for the challenge, with 22 teams ultimately making valid submissions. This paper meticulously evaluates the state-of-the-art advancements in enhancing low-light images, reflecting the significant progress and creativity in this field. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: NTIRE 2024 Challenge Report

arXiv:2403.00781 [pdf, other]

ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework

Authors: Zhongqi Yang, Elahe Khatibi, Nitish Nagesh, Mahyar Abbasian, Iman Azimi, Ramesh Jain, Amir M. Rahmani

Abstract: The profound impact of food on health necessitates advanced nutrition-oriented food recommendation services. Conventional methods often lack the crucial elements of personalization, explainability, and interactivity. While Large Language Models (LLMs) bring interpretability and explainability, their standalone use falls short of achieving true personalization. In this paper, we introduce ChatDiet,… ▽ More The profound impact of food on health necessitates advanced nutrition-oriented food recommendation services. Conventional methods often lack the crucial elements of personalization, explainability, and interactivity. While Large Language Models (LLMs) bring interpretability and explainability, their standalone use falls short of achieving true personalization. In this paper, we introduce ChatDiet, a novel LLM-powered framework designed specifically for personalized nutrition-oriented food recommendation chatbots. ChatDiet integrates personal and population models, complemented by an orchestrator, to seamlessly retrieve and process pertinent information. The personal model leverages causal discovery and inference techniques to assess personalized nutritional effects for a specific user, whereas the population model provides generalized information on food nutritional content. The orchestrator retrieves, synergizes and delivers the output of both models to the LLM, providing tailored food recommendations designed to support targeted health outcomes. The result is a dynamic delivery of personalized and explainable food recommendations, tailored to individual user preferences. Our evaluation of ChatDiet includes a compelling case study, where we establish a causal personal model to estimate individual nutrition effects. Our assessments, including a food recommendation test showcasing a 92\% effectiveness rate, coupled with illustrative dialogue examples, underscore ChatDiet's strengths in explainability, personalization, and interactivity. △ Less

Submitted 16 March, 2024; v1 submitted 18 February, 2024; originally announced March 2024.

Comments: Accepted by The IEEE/ACM international conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE) 2024

arXiv:2402.10153 [pdf, other]

Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients

Authors: Mahyar Abbasian, Zhongqi Yang, Elahe Khatibi, Pengfei Zhang, Nitish Nagesh, Iman Azimi, Ramesh Jain, Amir M. Rahmani

Abstract: Effective diabetes management is crucial for maintaining health in diabetic patients. Large Language Models (LLMs) have opened new avenues for diabetes management, facilitating their efficacy. However, current LLM-based approaches are limited by their dependence on general sources and lack of integration with domain-specific knowledge, leading to inaccurate responses. In this paper, we propose a k… ▽ More Effective diabetes management is crucial for maintaining health in diabetic patients. Large Language Models (LLMs) have opened new avenues for diabetes management, facilitating their efficacy. However, current LLM-based approaches are limited by their dependence on general sources and lack of integration with domain-specific knowledge, leading to inaccurate responses. In this paper, we propose a knowledge-infused LLM-powered conversational health agent (CHA) for diabetic patients. We customize and leverage the open-source openCHA framework, enhancing our CHA with external knowledge and analytical capabilities. This integration involves two key components: 1) incorporating the American Diabetes Association dietary guidelines and the Nutritionix information and 2) deploying analytical tools that enable nutritional intake calculation and comparison with the guidelines. We compare the proposed CHA with GPT4. Our evaluation includes 100 diabetes-related questions on daily meal choices and assessing the potential risks associated with the suggested diet. Our findings show that the proposed agent demonstrates superior performance in generating responses to manage essential nutrients. △ Less

Submitted 28 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: 4 pages, 3 figures, and 2 tables, conference paper

arXiv:2402.07411 [pdf, other]

Potential-Based Reward Sha** For Intrinsic Motivation

Authors: Grant C. Forbes, Nitish Gupta, Leonardo Villalobos-Arias, Colin M. Potts, Arnav Jhala, David L. Roberts

Abstract: Recently there has been a proliferation of intrinsic motivation (IM) reward-sha** methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward sha**, particularly through potential-based reward sha** (PBRS), has not been ap… ▽ More Recently there has been a proliferation of intrinsic motivation (IM) reward-sha** methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward sha**, particularly through potential-based reward sha** (PBRS), has not been applicable to many IM methods, as they are often complex, trainable functions themselves, and therefore dependent on a wider set of variables than the traditional reward functions that PBRS was developed for. We present an extension to PBRS that we prove preserves the set of optimal policies under a more general set of functions than has been previously proven. We also present {\em Potential-Based Intrinsic Motivation} (PBIM), a method for converting IM rewards into a potential-based form that is useable without altering the set of optimal policies. Testing in the MiniGrid DoorKey and Cliff Walking environments, we demonstrate that PBIM successfully prevents the agent from converging to a suboptimal policy and can speed up training. △ Less

Submitted 12 February, 2024; originally announced February 2024.

Comments: Extended version of paper appearing in AAMAS 2024

ACM Class: I.2.6

arXiv:2401.17217 [pdf, other]

GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear

Authors: Robert Konrad, Nitish Padmanaban, J. Gabriel Buckmaster, Kevin C. Boyle, Gordon Wetzstein

Abstract: Multimodal large language models (LMMs) excel in world knowledge and problem-solving abilities. Through the use of a world-facing camera and contextual AI, emerging smart accessories aim to provide a seamless interface between humans and LMMs. Yet, these wearable computing systems lack an understanding of the user's attention. We introduce GazeGPT as a new user interaction paradigm for contextual… ▽ More Multimodal large language models (LMMs) excel in world knowledge and problem-solving abilities. Through the use of a world-facing camera and contextual AI, emerging smart accessories aim to provide a seamless interface between humans and LMMs. Yet, these wearable computing systems lack an understanding of the user's attention. We introduce GazeGPT as a new user interaction paradigm for contextual AI. GazeGPT uses eye tracking to help the LMM understand which object in the world-facing camera view a user is paying attention to. Using extensive user evaluations, we show that this gaze-contingent mechanism is a faster and more accurate pointing mechanism than alternatives; that it augments human capabilities by significantly improving their accuracy in a dog-breed classification task; and that it is consistently ranked as more natural than head- or body-driven selection mechanisms for contextual AI. Moreover, we prototype a variety of application scenarios that suggest GazeGPT could be of significant value to users as part of future AI-driven personal assistants. △ Less

Submitted 31 January, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: Project video: https://youtu.be/AuDFHHTK_m8

arXiv:2401.10823 [pdf, ps, other]

Reconfigurable Intelligent Surface (RIS)-Assisted Entanglement Distribution in FSO Quantum Networks

Authors: Mahdi Chehimi, Mohamed Elhattab, Walid Saad, Gayane Vardoyan, Nitish K. Panigrahy, Chadi Assi, Don Towsley

Abstract: Quantum networks (QNs) relying on free-space optical (FSO) quantum channels can support quantum applications in environments wherein establishing an optical fiber infrastructure is challenging and costly. However, FSO-based QNs require a clear line-of-sight (LoS) between users, which is challenging due to blockages and natural obstacles. In this paper, a reconfigurable intelligent surface (RIS)-as… ▽ More Quantum networks (QNs) relying on free-space optical (FSO) quantum channels can support quantum applications in environments wherein establishing an optical fiber infrastructure is challenging and costly. However, FSO-based QNs require a clear line-of-sight (LoS) between users, which is challenging due to blockages and natural obstacles. In this paper, a reconfigurable intelligent surface (RIS)-assisted FSO-based QN is proposed as a cost-efficient framework providing a virtual LoS between users for entanglement distribution. A novel modeling of the quantum noise and losses experienced by quantum states over FSO channels defined by atmospheric losses, turbulence, and pointing errors is derived. Then, the joint optimization of entanglement distribution and RIS placement problem is formulated, under heterogeneous entanglement rate and fidelity constraints. This problem is solved using a simulated annealing metaheuristic algorithm. Simulation results show that the proposed framework effectively meets the minimum fidelity requirements of all users' quantum applications. This is in stark contrast to baseline algorithms that lead to a drop of at least 83% in users' end-to-end fidelities. The proposed framework also achieves a 64% enhancement in the fairness level between users compared to baseline rate maximizing frameworks. Finally, the weather conditions, e.g., rain, are observed to have a more significant effect than pointing errors and turbulence. △ Less

Submitted 19 January, 2024; originally announced January 2024.

Comments: 13 pages, 7 figures, 1 table

arXiv:2401.04732 [pdf, other]

A case study of Generative AI in MSX Sales Copilot: Improving seller productivity with a real-time question-answering system for content recommendation

Authors: Manpreet Singh, Ravdeep Pasricha, Nitish Singh, Ravi Prasad Kondapalli, Manoj R, Kiran R, Laurent Boué

Abstract: In this paper, we design a real-time question-answering system specifically targeted for hel** sellers get relevant material/documentation they can share live with their customers or refer to during a call. Taking the Seismic content repository as a relatively large scale example of a diverse dataset of sales material, we demonstrate how LLM embeddings of sellers' queries can be matched with the… ▽ More In this paper, we design a real-time question-answering system specifically targeted for hel** sellers get relevant material/documentation they can share live with their customers or refer to during a call. Taking the Seismic content repository as a relatively large scale example of a diverse dataset of sales material, we demonstrate how LLM embeddings of sellers' queries can be matched with the relevant content. We achieve this by engineering prompts in an elaborate fashion that makes use of the rich set of meta-features available for documents and sellers. Using a bi-encoder with cross-encoder re-ranker architecture, we show how the solution returns the most relevant content recommendations in just a few seconds even for large datasets. Our recommender system is deployed as an AML endpoint for real-time inferencing and has been integrated into a Copilot interface that is now deployed in the production version of the Dynamics CRM, known as MSX, used daily by Microsoft sellers. △ Less

Submitted 4 January, 2024; originally announced January 2024.

Journal ref: Microsoft Journal of Applied Research, Volume 20, 2024

arXiv:2401.02412 [pdf, other]

LLM Augmented LLMs: Expanding Capabilities through Composition

Authors: Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar

Abstract: Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domai… ▽ More Foundational models with billions of parameters which have been trained on large corpora of data have demonstrated non-trivial skills in a variety of domains. However, due to their monolithic structure, it is challenging and expensive to augment them or impart new skills. On the other hand, due to their adaptation abilities, several new instances of these models are being trained towards new domains and tasks. In this work, we study the problem of efficient and practical composition of existing foundation models with more specific models to enable newer capabilities. To this end, we propose CALM -- Composition to Augment Language Models -- which introduces cross-attention between models to compose their representations and enable new capabilities. Salient features of CALM are: (i) Scales up LLMs on new tasks by 're-using' existing LLMs along with a few additional parameters and data, (ii) Existing model weights are kept intact, and hence preserves existing capabilities, and (iii) Applies to diverse domains and settings. We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13\% on tasks like translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40\% over the base model for code generation and explanation tasks -- on-par with fully fine-tuned counterparts. △ Less

Submitted 4 January, 2024; originally announced January 2024.

Comments: 17 pages, 2 figures, 8 tables

arXiv:2311.02630 [pdf]

doi 10.1109/ICT60153.2023.10374044

The New Frontier of Cybersecurity: Emerging Threats and Innovations

Authors: Daksh Dave, Gauransh Sawhney, Pushkar Aggarwal, Nitish Silswal, Dhruv Khut

Abstract: In today's digitally interconnected world, cybersecurity threats have reached unprecedented levels, presenting a pressing concern for individuals, organizations, and governments. This study employs a qualitative research approach to comprehensively examine the diverse threats of cybersecurity and their impacts across various sectors. Four primary categories of threats are identified and analyzed,… ▽ More In today's digitally interconnected world, cybersecurity threats have reached unprecedented levels, presenting a pressing concern for individuals, organizations, and governments. This study employs a qualitative research approach to comprehensively examine the diverse threats of cybersecurity and their impacts across various sectors. Four primary categories of threats are identified and analyzed, encompassing malware attacks, social engineering attacks, network vulnerabilities, and data breaches. The research delves into the consequences of these threats on individuals, organizations, and society at large. The findings reveal a range of key emerging threats in cybersecurity, including advanced persistent threats, ransomware attacks, Internet of Things (IoT) vulnerabilities, and social engineering exploits. Consequently, it is evident that emerging cybersecurity threats pose substantial risks to both organizations and individuals. The sophistication and diversity of these emerging threats necessitate a multi-layered approach to cybersecurity. This approach should include robust security measures, comprehensive employee training, and regular security audits. The implications of these emerging threats are extensive, with potential consequences such as financial loss, reputational damage, and compromised personal information. This study emphasizes the importance of implementing effective measures to mitigate these threats. It highlights the significance of using strong passwords, encryption methods, and regularly updating software to bolster cyber defenses. △ Less

Submitted 5 November, 2023; originally announced November 2023.

Comments: 6 pages, 2 Tables

Journal ref: 2023 29th International Conference on Telecommunications (ICT), pp. 1-6, 2023

arXiv:2310.18168 [pdf, other]

Personas as a Way to Model Truthfulness in Language Models

Authors: Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, He He

Abstract: Large language models (LLMs) are trained on vast amounts of text from the internet, which contains both factual and misleading information about the world. While unintuitive from a classic view of LMs, recent work has shown that the truth value of a statement can be elicited from the model's representations. This paper presents an explanation for why LMs appear to know the truth despite not being… ▽ More Large language models (LLMs) are trained on vast amounts of text from the internet, which contains both factual and misleading information about the world. While unintuitive from a classic view of LMs, recent work has shown that the truth value of a statement can be elicited from the model's representations. This paper presents an explanation for why LMs appear to know the truth despite not being trained with truth labels. We hypothesize that the pretraining data is generated by groups of (un)truthful agents whose outputs share common features, and they form a (un)truthful persona. By training on this data, LMs can infer and represent the persona in its activation space. This allows the model to separate truth from falsehoods and controls the truthfulness of its generation. We show evidence for the persona hypothesis via two observations: (1) we can probe whether a model's answer will be truthful before it is generated; (2) finetuning a model on a set of facts improves its truthfulness on unseen topics. Next, using arithmetics as a synthetic environment, we show that structures of the pretraining data are crucial for the model to infer the truthful persona. Overall, our findings suggest that models can exploit hierarchical structures in the data to learn abstract concepts like truthfulness. △ Less

Submitted 6 February, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.11266 [pdf]

Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models

Authors: Khushboo Verma, Marina Moore, Stephanie Wottrich, Karla Robles López, Nishant Aggarwal, Zeel Bhatt, Aagamjit Singh, Bradford Unroe, Salah Basheer, Nitish Sachdeva, Prinka Arora, Harmanjeet Kaur, Tanupreet Kaur, Tevon Hood, Anahi Marquez, Tushar Varshney, Nanfu Deng, Azaan Ramani, Pawanraj Ishwara, Maimoona Saeed, Tatiana López Velarde Peña, Bryan Barksdale, Sushovan Guha, Satwant Kumar

Abstract: In response to the pressing need for advanced clinical problem-solving tools in healthcare, we introduce BooksMed, a novel framework based on a Large Language Model (LLM). BooksMed uniquely emulates human cognitive processes to deliver evidence-based and reliable responses, utilizing the GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) framework to effectively quantify… ▽ More In response to the pressing need for advanced clinical problem-solving tools in healthcare, we introduce BooksMed, a novel framework based on a Large Language Model (LLM). BooksMed uniquely emulates human cognitive processes to deliver evidence-based and reliable responses, utilizing the GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) framework to effectively quantify evidence strength. For clinical decision-making to be appropriately assessed, an evaluation metric that is clinically aligned and validated is required. As a solution, we present ExpertMedQA, a multispecialty clinical benchmark comprised of open-ended, expert-level clinical questions, and validated by a diverse group of medical professionals. By demanding an in-depth understanding and critical appraisal of up-to-date clinical literature, ExpertMedQA rigorously evaluates LLM performance. BooksMed outperforms existing state-of-the-art models Med-PaLM 2, Almanac, and ChatGPT in a variety of medical scenarios. Therefore, a framework that mimics human cognitive stages could be a useful tool for providing reliable and evidence-based responses to clinical inquiries. △ Less

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.10606 [pdf, other]

BayRnTune: Adaptive Bayesian Domain Randomization via Strategic Fine-tuning

Authors: Tianle Huang, Nitish Sontakke, K. Niranjan Kumar, Irfan Essa, Stefanos Nikolaidis, Dennis W. Hong, Sehoon Ha

Abstract: Domain randomization (DR), which entails training a policy with randomized dynamics, has proven to be a simple yet effective algorithm for reducing the gap between simulation and the real world. However, DR often requires careful tuning of randomization parameters. Methods like Bayesian Domain Randomization (Bayesian DR) and Active Domain Randomization (Adaptive DR) address this issue by automatin… ▽ More Domain randomization (DR), which entails training a policy with randomized dynamics, has proven to be a simple yet effective algorithm for reducing the gap between simulation and the real world. However, DR often requires careful tuning of randomization parameters. Methods like Bayesian Domain Randomization (Bayesian DR) and Active Domain Randomization (Adaptive DR) address this issue by automating parameter range selection using real-world experience. While effective, these algorithms often require long computation time, as a new policy is trained from scratch every iteration. In this work, we propose Adaptive Bayesian Domain Randomization via Strategic Fine-tuning (BayRnTune), which inherits the spirit of BayRn but aims to significantly accelerate the learning processes by fine-tuning from previously learned policy. This idea leads to a critical question: which previous policy should we use as a prior during fine-tuning? We investigated four different fine-tuning strategies and compared them against baseline algorithms in five simulated environments, ranging from simple benchmark tasks to more complex legged robot environments. Our analysis demonstrates that our method yields better rewards in the same amount of timesteps compared to vanilla domain randomization or Bayesian DR. △ Less

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.09723 [pdf, other]

A generalization of the achievable rate of a MISO system using Bode-Fano wideband matching theory

Authors: Nitish Deshpande, Miguel R. Castellanos, Saeed R. Khosravirad, **feng Du, Harish Viswanathan, Robert W. Heath Jr

Abstract: Impedance-matching networks affect power transfer from the radio frequency (RF) chains to the antennas. Their design impacts the signal to noise ratio (SNR) and the achievable rate. In this paper, we maximize the information-theoretic achievable rate of a multiple-input-single-output (MISO) system with wideband matching constraints. Using a multiport circuit theory approach with frequency-selectiv… ▽ More Impedance-matching networks affect power transfer from the radio frequency (RF) chains to the antennas. Their design impacts the signal to noise ratio (SNR) and the achievable rate. In this paper, we maximize the information-theoretic achievable rate of a multiple-input-single-output (MISO) system with wideband matching constraints. Using a multiport circuit theory approach with frequency-selective scattering parameters, we propose a general framework for optimizing the MISO achievable rate that incorporates Bode-Fano wideband matching theory. We express the solution to the achievable rate optimization problem in terms of the optimized transmission coefficient and the Lagrangian parameters corresponding to the Bode-Fano inequality constraints. We apply this framework to a single electric Chu's antenna and an array of two electric Chu's antennas. We compare the optimized achievable rate obtained numerically with other benchmarks like the ideal achievable rate computed by disregarding matching constraints and the achievable rate obtained by using sub-optimal matching strategies like conjugate matching and frequency-flat transmission. We also propose a practical methodology to approximate the achievable rate bound by using the optimal transmission coefficient to derive a physically realizable matching network through the ADS software. △ Less

Submitted 14 October, 2023; originally announced October 2023.

arXiv:2308.11673 [pdf, other]

WEARS: Wearable Emotion AI with Real-time Sensor data

Authors: Dhruv Limbani, Daketi Yatin, Nitish Chaturvedi, Vaishnavi Moorthy, Pushpalatha M, Harichandana BSS, Sumit Kumar

Abstract: Emotion prediction is the field of study to understand human emotions. Existing methods focus on modalities like text, audio, facial expressions, etc., which could be private to the user. Emotion can be derived from the subject's psychological data as well. Various approaches that employ combinations of physiological sensors for emotion recognition have been proposed. Yet, not all sensors are simp… ▽ More Emotion prediction is the field of study to understand human emotions. Existing methods focus on modalities like text, audio, facial expressions, etc., which could be private to the user. Emotion can be derived from the subject's psychological data as well. Various approaches that employ combinations of physiological sensors for emotion recognition have been proposed. Yet, not all sensors are simple to use and handy for individuals in their daily lives. Thus, we propose a system to predict user emotion using smartwatch sensors. We design a framework to collect ground truth in real-time utilizing a mix of English and regional language-based videos to invoke emotions in participants and collect the data. Further, we modeled the problem as binary classification due to the limited dataset size and experimented with multiple machine-learning models. We also did an ablation study to understand the impact of features including Heart Rate, Accelerometer, and Gyroscope sensor data on mood. From the experimental results, Multi-Layer Perceptron has shown a maximum accuracy of 93.75 percent for pleasant-unpleasant (high/low valence classification) moods. △ Less

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.11442 [pdf, other]

SDeMorph: Towards Better Facial De-morphing from Single Morph

Authors: Nitish Shukla

Abstract: Face Recognition Systems (FRS) are vulnerable to morph attacks. A face morph is created by combining multiple identities with the intention to fool FRS and making it match the morph with multiple identities. Current Morph Attack Detection (MAD) can detect the morph but are unable to recover the identities used to create the morph with satisfactory outcomes. Existing work in de-morphing is mostly r… ▽ More Face Recognition Systems (FRS) are vulnerable to morph attacks. A face morph is created by combining multiple identities with the intention to fool FRS and making it match the morph with multiple identities. Current Morph Attack Detection (MAD) can detect the morph but are unable to recover the identities used to create the morph with satisfactory outcomes. Existing work in de-morphing is mostly reference-based, i.e. they require the availability of one identity to recover the other. Sudipta et al. \cite{ref9} proposed a reference-free de-morphing technique but the visual realism of outputs produced were feeble. In this work, we propose SDeMorph (Stably Diffused De-morpher), a novel de-morphing method that is reference-free and recovers the identities of bona fides. Our method produces feature-rich outputs that are of significantly high quality in terms of definition and facial fidelity. Our method utilizes Denoising Diffusion Probabilistic Models (DDPM) by destroying the input morphed signal and then reconstructing it back using a branched-UNet. Experiments on ASML, FRLL-FaceMorph, FRLL-MorDIFF, and SMDD datasets support the effectiveness of the proposed method. △ Less

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2307.06492 [pdf, other]

Universal Quantum Walk Control Plane for Quantum Networks

Authors: Matheus Guedes de Andrade, Nitish K. Panigrahy, Wenhan Dai, Saikat Guha, Don Towsley

Abstract: Quantum networks are complex systems formed by the interaction among quantum processors through quantum channels. Analogous to classical computer networks, quantum networks allow for the distribution of quantum operations among quantum processors. In this work, we describe a Quantum Walk Control Protocol (QWCP) to perform distributed quantum operations in a quantum network. We consider a generaliz… ▽ More Quantum networks are complex systems formed by the interaction among quantum processors through quantum channels. Analogous to classical computer networks, quantum networks allow for the distribution of quantum operations among quantum processors. In this work, we describe a Quantum Walk Control Protocol (QWCP) to perform distributed quantum operations in a quantum network. We consider a generalization of the discrete-time coined quantum walk model that accounts for the interaction between quantum walks in the network graph with quantum registers inside the network nodes. QWCP allows for the implementation of networked quantum services, such as distributed quantum computing and entanglement distribution, abstracting hardware implementation and the transmission of quantum information through channels. Multiple interacting quantum walks can be used to propagate entangled control signals across the network in parallel. We demonstrate how to use QWCP to perform distributed multi-qubit controlled gates, which shows the universality of the protocol for distributed quantum computing. Furthermore, we apply the QWCP to the task of entanglement distribution in a quantum network. △ Less

Submitted 12 July, 2023; originally announced July 2023.

Comments: 27 pages; 2 figures. A preliminary version of this work was presented at IEEE International Conference on Quantum Computing and Engineering 2021 (QCE21). arXiv admin note: text overlap with arXiv:2106.09839

arXiv:2306.14846 [pdf, other]

ViNT: A Foundation Model for Visual Navigation

Authors: Dhruv Shah, Ajay Sridhar, Nitish Dashora, Kyle Stachowicz, Kevin Black, Noriaki Hirose, Sergey Levine

Abstract: General-purpose pre-trained models ("foundation models") have enabled practitioners to produce generalizable solutions for individual machine learning problems with datasets that are significantly smaller than those required for learning from scratch. Such models are typically trained on large and diverse datasets with weak supervision, consuming much more training data than is available for any i… ▽ More General-purpose pre-trained models ("foundation models") have enabled practitioners to produce generalizable solutions for individual machine learning problems with datasets that are significantly smaller than those required for learning from scratch. Such models are typically trained on large and diverse datasets with weak supervision, consuming much more training data than is available for any individual downstream application. In this paper, we describe the Visual Navigation Transformer (ViNT), a foundation model that aims to bring the success of general-purpose pre-trained models to vision-based robotic navigation. ViNT is trained with a general goal-reaching objective that can be used with any navigation dataset, and employs a flexible Transformer-based architecture to learn navigational affordances and enable efficient adaptation to a variety of downstream navigational tasks. ViNT is trained on a number of existing navigation datasets, comprising hundreds of hours of robotic navigation from a variety of different robotic platforms, and exhibits positive transfer, outperforming specialist models trained on singular datasets. ViNT can be augmented with diffusion-based subgoal proposals to explore novel environments, and can solve kilometer-scale navigation problems when equipped with long-range heuristics. ViNT can also be adapted to novel task specifications with a technique inspired by prompt-tuning, where the goal encoder is replaced by an encoding of another task modality (e.g., GPS waypoints or routing commands) embedded into the same space of goal tokens. This flexibility and ability to accommodate a variety of downstream problem domains establishes ViNT as an effective foundation model for mobile robotics. For videos, code, and model checkpoints, see our project page at https://visualnav-transformer.github.io. △ Less

Submitted 24 October, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

Comments: Accepted for oral presentation at CoRL 2023

arXiv:2305.15269 [pdf, other]

Testing the General Deductive Reasoning Capacity of Large Language Models Using OOD Examples

Authors: Abulhair Saparov, Richard Yuanzhe Pang, Vishakh Padmakumar, Nitish Joshi, Seyed Mehran Kazemi, Najoung Kim, He He

Abstract: Given the intractably large size of the space of proofs, any model that is capable of general deductive reasoning must generalize to proofs of greater complexity. Recent studies have shown that large language models (LLMs) possess some abstract deductive reasoning ability given chain-of-thought prompts. However, they have primarily been tested on proofs using modus ponens or of a specific size, an… ▽ More Given the intractably large size of the space of proofs, any model that is capable of general deductive reasoning must generalize to proofs of greater complexity. Recent studies have shown that large language models (LLMs) possess some abstract deductive reasoning ability given chain-of-thought prompts. However, they have primarily been tested on proofs using modus ponens or of a specific size, and from the same distribution as the in-context examples. To measure the general deductive reasoning ability of LLMs, we test on a broad set of deduction rules and measure their ability to generalize to more complex proofs from simpler demonstrations from multiple angles: depth-, width-, and compositional generalization. To facilitate systematic exploration, we construct a new synthetic and programmable reasoning dataset that enables control over deduction rules and proof complexity. Our experiments on four LLMs of various sizes and training objectives show that they are able to generalize to compositional proofs. However, they have difficulty generalizing to longer proofs, and they require explicit demonstrations to produce hypothetical subproofs, specifically in proof by cases and proof by contradiction. △ Less

Submitted 3 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Published as a conference paper at NeurIPS 2023

arXiv:2305.13299 [pdf, other]

Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations

Authors: Chenglei Si, Dan Friedman, Nitish Joshi, Shi Feng, Danqi Chen, He He

Abstract: In-context learning (ICL) is an important paradigm for adapting large language models (LLMs) to new tasks, but the generalization behavior of ICL remains poorly understood. We investigate the inductive biases of ICL from the perspective of feature bias: which feature ICL is more likely to use given a set of underspecified demonstrations in which two features are equally predictive of the labels. F… ▽ More In-context learning (ICL) is an important paradigm for adapting large language models (LLMs) to new tasks, but the generalization behavior of ICL remains poorly understood. We investigate the inductive biases of ICL from the perspective of feature bias: which feature ICL is more likely to use given a set of underspecified demonstrations in which two features are equally predictive of the labels. First, we characterize the feature biases of GPT-3 models by constructing underspecified demonstrations from a range of NLP datasets and feature combinations. We find that LLMs exhibit clear feature biases - for example, demonstrating a strong bias to predict labels according to sentiment rather than shallow lexical features, like punctuation. Second, we evaluate the effect of different interventions that are designed to impose an inductive bias in favor of a particular feature, such as adding a natural language instruction or using semantically relevant label words. We find that, while many interventions can influence the learner to prefer a particular feature, it can be difficult to overcome strong prior biases. Overall, our results provide a broader picture of the types of features that ICL may be more likely to exploit and how to impose inductive biases that are better aligned with the intended task. △ Less

Submitted 22 May, 2023; originally announced May 2023.

Comments: ACL 2023

arXiv:2305.11938 [pdf, other]

doi 10.18653/v1/2023.findings-emnlp.125

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

Authors: Sebastian Ruder, Jonathan H. Clark, Alexander Gutkin, Mihir Kale, Min Ma, Massimo Nicosia, Shruti Rijhwani, Parker Riley, Jean-Michel A. Sarr, Xinyi Wang, John Wieting, Nitish Gupta, Anna Katanova, Christo Kirov, Dana L. Dickinson, Brian Roark, Bidisha Samanta, Connie Tao, David I. Adelani, Vera Axelrod, Isaac Caswell, Colin Cherry, Dan Garrette, Reeve Ingle, Melvin Johnson , et al. (2 additional authors not shown)

Abstract: Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP re-search is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot;… ▽ More Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP re-search is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks -- tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides methodology for evaluating many modeling scenarios including text-only, multi-modal (vision, audio, and text),supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models △ Less

Submitted 24 May, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

arXiv:2305.08696 [pdf, other]

Scaling Limits of Quantum Repeater Networks

Authors: Mahdi Chehimi, Shahrooz Pouryousef, Nitish K. Panigrahy, Don Towsley, Walid Saad

Abstract: Quantum networks (QNs) are a promising platform for secure communications, enhanced sensing, and efficient distributed quantum computing. However, due to the fragile nature of quantum states, these networks face significant challenges in terms of scalability. In this paper, the scaling limits of quantum repeater networks (QRNs) are analyzed. The goal of this work is to maximize the overall length,… ▽ More Quantum networks (QNs) are a promising platform for secure communications, enhanced sensing, and efficient distributed quantum computing. However, due to the fragile nature of quantum states, these networks face significant challenges in terms of scalability. In this paper, the scaling limits of quantum repeater networks (QRNs) are analyzed. The goal of this work is to maximize the overall length, or scalability of QRNs such that long-distance quantum communications is achieved while application-specific quality-of-service (QoS) requirements are satisfied. In particular, a novel joint optimization framework that aims at maximizing QRN scalability, while satisfying QoS constraints on the end-to-end fidelity and rate is proposed. The proposed approach optimizes the number of QRN repeater nodes, their separation distance, and the number of distillation rounds to be performed at both link and end-to-end levels. Extensive simulations are conducted to analyze the tradeoffs between QRN scalability, rate, and fidelity under gate and measurement errors. The obtained results characterize the QRN scaling limits for a given QoS requirement. The proposed approach offers a promising solution and design guidelines for future QRN deployments. △ Less

Submitted 26 July, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: 6 pages, 10 figures

arXiv:2305.03317 [pdf, other]

StarPlat: A Versatile DSL for Graph Analytics

Authors: Nibedita Behera, Ashwina Kumar, Ebenezer Rajadurai T, Sai Nitish, Rajesh Pandian M, Rupesh Nasre

Abstract: Graphs model several real-world phenomena. With the growth of unstructured and semi-structured data, parallelization of graph algorithms is inevitable. Unfortunately, due to inherent irregularity of computation, memory access, and communication, graph algorithms are traditionally challenging to parallelize. To tame this challenge, several libraries, frameworks, and domain-specific languages (DSLs)… ▽ More Graphs model several real-world phenomena. With the growth of unstructured and semi-structured data, parallelization of graph algorithms is inevitable. Unfortunately, due to inherent irregularity of computation, memory access, and communication, graph algorithms are traditionally challenging to parallelize. To tame this challenge, several libraries, frameworks, and domain-specific languages (DSLs) have been proposed to reduce the parallel programming burden of the users, who are often domain experts. However, existing frameworks to model graph algorithms typically target a single architecture. In this paper, we present a graph DSL, named StarPlat, that allows programmers to specify graph algorithms in a high-level format, but generates code for three different backends from the same algorithmic specification. In particular, the DSL compiler generates OpenMP for multi-core, MPI for distributed, and CUDA for many-core GPUs. Since these three are completely different parallel programming paradigms, binding them together under the same language is challenging. We share our experience with the language design. Central to our compiler is an intermediate representation which allows a common representation of the high-level program, from which individual backend code generations begin. We demonstrate the expressiveness of StarPlat by specifying four graph algorithms: betweenness centrality computation, page rank computation, single-source shortest paths, and triangle counting. We illustrate the effectiveness of our approach by comparing the performance of the generated codes with that obtained with hand-crafted library codes. We find that the generated code is competitive to library-based codes in many cases. More importantly, we show the feasibility to generate efficient codes for different target architectures from the same algorithmic specification of graph algorithms. △ Less

Submitted 5 May, 2023; originally announced May 2023.

Comments: 30 pages, 21 figures

arXiv:2305.03231 [pdf, other]

Resource Management in Quantum Virtual Private Networks

Authors: Shahrooz Pouryousef, Nitish K. Panigrahy, Monimoy Deb Purkayastha, Sabyasachi Mukhopadhyay, Gert Grammel, Domenico Di Mola, Don Towsley

Abstract: In this study, we develop a resource management framework for a quantum virtual private network (qVPN), which involves the sharing of an underlying public quantum network by multiple organizations for quantum entanglement distribution. Our approach involves resolving the issue of link entanglement resource allocation in a qVPN by utilizing a centralized optimization framework. We provide insights… ▽ More In this study, we develop a resource management framework for a quantum virtual private network (qVPN), which involves the sharing of an underlying public quantum network by multiple organizations for quantum entanglement distribution. Our approach involves resolving the issue of link entanglement resource allocation in a qVPN by utilizing a centralized optimization framework. We provide insights into the potential of genetic and learning-based algorithms for optimizing qVPNs, and emphasize the significance of path selection and distillation in enabling efficient and reliable quantum communication in multi-organizational settings. Our findings demonstrate that compared to traditional greedy based heuristics, genetic and learning-based algorithms can identify better paths. Furthermore, these algorithms can effectively identify good distillation strategies to mitigate potential noises in gates and quantum channels, while ensuring the necessary quality of service for end users. △ Less

Submitted 7 July, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

arXiv:2304.04386 [pdf, ps, other]

Generating Adversarial Attacks in the Latent Space

Authors: Nitish Shukla, Sudipta Banerjee

Abstract: Adversarial attacks in the input (pixel) space typically incorporate noise margins such as $L_1$ or $L_{\infty}$-norm to produce imperceptibly perturbed data that confound deep learning networks. Such noise margins confine the magnitude of permissible noise. In this work, we propose injecting adversarial perturbations in the latent (feature) space using a generative adversarial network, removing t… ▽ More Adversarial attacks in the input (pixel) space typically incorporate noise margins such as $L_1$ or $L_{\infty}$-norm to produce imperceptibly perturbed data that confound deep learning networks. Such noise margins confine the magnitude of permissible noise. In this work, we propose injecting adversarial perturbations in the latent (feature) space using a generative adversarial network, removing the need for margin-based priors. Experiments on MNIST, CIFAR10, Fashion-MNIST, CIFAR100 and Stanford Dogs datasets support the effectiveness of the proposed method in generating adversarial attacks in the latent space while ensuring a high degree of visual realism with respect to pixel-based adversarial attack methods. △ Less

Submitted 10 April, 2023; originally announced April 2023.

arXiv:2303.15331 [pdf, other]

Learning a Single Policy for Diverse Behaviors on a Quadrupedal Robot using Scalable Motion Imitation

Authors: Arnaud Klipfel, Nitish Sontakke, Ren Liu, Sehoon Ha

Abstract: Learning various motor skills for quadrupedal robots is a challenging problem that requires careful design of task-specific mathematical models or reward descriptions. In this work, we propose to learn a single capable policy using deep reinforcement learning by imitating a large number of reference motions, including walking, turning, pacing, jum**, sitting, and lying. On top of the existing mo… ▽ More Learning various motor skills for quadrupedal robots is a challenging problem that requires careful design of task-specific mathematical models or reward descriptions. In this work, we propose to learn a single capable policy using deep reinforcement learning by imitating a large number of reference motions, including walking, turning, pacing, jum**, sitting, and lying. On top of the existing motion imitation framework, we first carefully design the observation space, the action space, and the reward function to improve the scalability of the learning as well as the robustness of the final policy. In addition, we adopt a novel adaptive motion sampling (AMS) method, which maintains a balance between successful and unsuccessful behaviors. This technique allows the learning algorithm to focus on challenging motor skills and avoid catastrophic forgetting. We demonstrate that the learned policy can exhibit diverse behaviors in simulation by successfully tracking both the training dataset and out-of-distribution trajectories. We also validate the importance of the proposed learning formulation and the adaptive motion sampling scheme by conducting experiments. △ Less

Submitted 27 March, 2023; originally announced March 2023.

arXiv:2303.13974

Mixed-Type Wafer Classification For Low Memory Devices Using Knowledge Distillation

Authors: Nitish Shukla, Anurima Dey, Srivatsan K

Abstract: Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for determining the root cause of production defects, which may further provide insight for yield improvement in wafer foundry. During manufacturing, various defects may appear standalone in the wafer or may appear as different combinations. Identifying multiple defects… ▽ More Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial for determining the root cause of production defects, which may further provide insight for yield improvement in wafer foundry. During manufacturing, various defects may appear standalone in the wafer or may appear as different combinations. Identifying multiple defects in a wafer is generally harder compared to identifying a single defect. Recently, deep learning methods have gained significant traction in mixed-type DPR. However, the complexity of defects requires complex and large models making them very difficult to operate on low-memory embedded devices typically used in fabrication labs. Another common issue is the unavailability of labeled data to train complex networks. In this work, we propose an unsupervised training routine to distill the knowledge of complex pre-trained models to lightweight deployment-ready models. We empirically show that this type of training compresses the model without sacrificing accuracy despite being up to 10 times smaller than the teacher model. The compressed model also manages to outperform contemporary state-of-the-art models. △ Less

Submitted 18 October, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: Study is not relevant

arXiv:2303.13827

Efficient Mixed-Type Wafer Defect Pattern Recognition Using Compact Deformable Convolutional Transformers

Authors: Nitish Shukla

Abstract: Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial to find the root cause of the issue and further improving the yield in the wafer foundry. Mixed-type DPR is much more complicated compared to single-type DPR due to varied spatial features, the uncertainty of defects, and the number of defects present. To accurately pre… ▽ More Manufacturing wafers is an intricate task involving thousands of steps. Defect Pattern Recognition (DPR) of wafer maps is crucial to find the root cause of the issue and further improving the yield in the wafer foundry. Mixed-type DPR is much more complicated compared to single-type DPR due to varied spatial features, the uncertainty of defects, and the number of defects present. To accurately predict the number of defects as well as the types of defects, we propose a novel compact deformable convolutional transformer (DC Transformer). Specifically, DC Transformer focuses on the global features present in the wafer map by virtue of learnable deformable kernels and multi-head attention to the global features. The proposed method succinctly models the internal relationship between the wafer maps and the defects. DC Transformer is evaluated on a real dataset containing 38 defect patterns. Experimental results show that DC Transformer performs exceptionally well in recognizing both single and mixed-type defects. The proposed method outperforms the current state of the models by a considerable margin △ Less

Submitted 16 October, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: Study is not relevant

arXiv:2303.11632

An Embarrassingly Simple Approach for Wafer Feature Extraction and Defect Pattern Recognition

Authors: Nitish Shukla

Abstract: Identifying defect patterns in a wafer map during manufacturing is crucial to find the root cause of the underlying issue and provides valuable insights on improving yield in the foundry. Currently used methods use deep neural networks to identify the defects. These methods are generally very huge and have significant inference time. They also require GPU support to efficiently operate. All these… ▽ More Identifying defect patterns in a wafer map during manufacturing is crucial to find the root cause of the underlying issue and provides valuable insights on improving yield in the foundry. Currently used methods use deep neural networks to identify the defects. These methods are generally very huge and have significant inference time. They also require GPU support to efficiently operate. All these issues make these models not fit for on-line prediction in the manufacturing foundry. In this paper, we propose an extremely simple yet effective technique to extract features from wafer images. The proposed method is extremely fast, intuitive, and non-parametric while being explainable. The experiment results show that the proposed pipeline outperforms conventional deep learning models. Our feature extraction requires no training or fine-tuning while preserving the relative shape and location of data points as revealed by our interpretability analysis. △ Less

Submitted 16 October, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: study is not relevant

arXiv:2303.09597 [pdf, other]

Residual Physics Learning and System Identification for Sim-to-real Transfer of Policies on Buoyancy Assisted Legged Robots

Authors: Nitish Sontakke, Hosik Chae, Sangjoon Lee, Tianle Huang, Dennis W. Hong, Sehoon Ha

Abstract: The light and soft characteristics of Buoyancy Assisted Lightweight Legged Unit (BALLU) robots have a great potential to provide intrinsically safe interactions in environments involving humans, unlike many heavy and rigid robots. However, their unique and sensitive dynamics impose challenges to obtaining robust control policies in the real world. In this work, we demonstrate robust sim-to-real tr… ▽ More The light and soft characteristics of Buoyancy Assisted Lightweight Legged Unit (BALLU) robots have a great potential to provide intrinsically safe interactions in environments involving humans, unlike many heavy and rigid robots. However, their unique and sensitive dynamics impose challenges to obtaining robust control policies in the real world. In this work, we demonstrate robust sim-to-real transfer of control policies on the BALLU robots via system identification and our novel residual physics learning method, Environment Mimic (EnvMimic). First, we model the nonlinear dynamics of the actuators by collecting hardware data and optimizing the simulation parameters. Rather than relying on standard supervised learning formulations, we utilize deep reinforcement learning to train an external force policy to match real-world trajectories, which enables us to model residual physics with greater fidelity. We analyze the improved simulation fidelity by comparing the simulation trajectories against the real-world ones. We finally demonstrate that the improved simulator allows us to learn better walking and turning policies that can be successfully deployed on the hardware of BALLU. △ Less

Submitted 16 March, 2023; originally announced March 2023.

arXiv:2303.08774 [pdf, other]

GPT-4 Technical Report

Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko , et al. (256 additional authors not shown)

Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo… ▽ More We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was develo** infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4. △ Less

Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: 100 pages; updated authors list; fixed author names and added citation

arXiv:2301.03996 [pdf, other]

Collaborative Semantic Communication for Edge Inference

Authors: Wing Fei Lo, Nitish Mital, Haotian Wu, Deniz Gündüz

Abstract: We study the collaborative image retrieval problem at the wireless edge, where multiple edge devices capture images of the same object from different angles and locations, which are then used jointly to retrieve similar images at the edge server over a shared multiple access channel (MAC). We propose two novel deep learning-based joint source and channel coding (JSCC) schemes for the task over bot… ▽ More We study the collaborative image retrieval problem at the wireless edge, where multiple edge devices capture images of the same object from different angles and locations, which are then used jointly to retrieve similar images at the edge server over a shared multiple access channel (MAC). We propose two novel deep learning-based joint source and channel coding (JSCC) schemes for the task over both additive white Gaussian noise (AWGN) and Rayleigh slow fading channels, with the aim of maximizing the retrieval accuracy under a total bandwidth constraint. The proposed schemes are evaluated on a wide range of channel signal-to-noise ratios (SNRs), and shown to outperform the single-device JSCC and the separation-based multiple-access benchmarks. We also propose two novel SNR-aware JSCC schemes with attention modules to improve the performance in the case of channel mismatch between training and test instances. △ Less

Submitted 12 February, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

MSC Class: 94A24 ACM Class: E.4

arXiv:2212.04549 [pdf, other]

Optimizing Real-Time Performances for Timed-Loop Racing under F1TENTH

Authors: Nitish Gupta, Kurt Wilson, Zhishan Guo

Abstract: Motion planning and control in autonomous car racing are one of the most challenging and safety-critical tasks due to high speed and dynamism. The lower-level control nodes are expected to be highly optimized due to resource constraints of onboard embedded processing units, although there are strict latency requirements. Some of these guarantees can be provided at the application level, such as us… ▽ More Motion planning and control in autonomous car racing are one of the most challenging and safety-critical tasks due to high speed and dynamism. The lower-level control nodes are expected to be highly optimized due to resource constraints of onboard embedded processing units, although there are strict latency requirements. Some of these guarantees can be provided at the application level, such as using ROS2's Real-Time executors. However, the performance can be far from satisfactory as many modern control algorithms (such as Model Predictive Control) rely on solving complicated online optimization problems at each iteration. In this paper, we present a simple yet effective multi-threading technique to optimize the throughput of online-control algorithms for resource-constrained autonomous racing platforms. We achieve this by maintaining a systematic pool of worker threads solving the optimization problem in parallel which can improve the system performance by reducing latency between control input commands. We further demonstrate the effectiveness of our method using the Model Predictive Contouring Control (MPCC) algorithm running on Nvidia's Xavier AGX platform. △ Less

Submitted 8 December, 2022; originally announced December 2022.

Journal ref: Proceedings of the 43rd IEEE Real-Time Systems Symposium (RTSS), Industry Challenge, Houston, US, Dec. 2022

arXiv:2212.01694 [pdf, other]

A Quantum Overlay Network for Efficient Entanglement Distribution

Authors: Shahrooz Pouryousef, Nitish K. Panigrahy, Don Towsley

Abstract: Distributing quantum entanglements over long distances is essential for the realization of a global scale quantum Internet. Most of the prior work and proposals assume an on-demand distribution of entanglements which may result in significant network resource under-utilization. In this work, we introduce Quantum Overlay Networks (QONs) for efficient entanglement distribution in quantum networks. W… ▽ More Distributing quantum entanglements over long distances is essential for the realization of a global scale quantum Internet. Most of the prior work and proposals assume an on-demand distribution of entanglements which may result in significant network resource under-utilization. In this work, we introduce Quantum Overlay Networks (QONs) for efficient entanglement distribution in quantum networks. When the demand to create end-to-end user entanglements is low, QONs can generate and store maximally entangled Bell pairs (EPR pairs) at specific overlay storage nodes of the network. Later, during peak demands, requests can be served by performing entanglement swaps either over a direct path from the network or over a path using the storage nodes. We solve the link entanglement and storage resource allocation problem in such a QON using a centralized optimization framework. We evaluate the performance of our proposed QON architecture over a wide number of network topologies under various settings using extensive simulation experiments. Our results demonstrate that QONs fare well by a factor of 40% with respect to meeting surge and changing demands compared to traditional non-overlay proposals. QONs also show significant improvement in terms of average entanglement request service delay over non-overlay approaches. △ Less

Submitted 3 December, 2022; originally announced December 2022.

arXiv:2212.01463 [pdf, other]

On the Capacity Region of a Quantum Switch with Entanglement Purification

Authors: Nitish K. Panigrahy, Thirupathaiah Vasantam, Don Towsley, Leandros Tassiulas

Abstract: Quantum switches are envisioned to be an integral component of future entanglement distribution networks. They can provide high quality entanglement distribution service to end-users by performing quantum operations such as entanglement swap** and entanglement purification. In this work, we characterize the capacity region of such a quantum switch under noisy channel transmissions and imperfect… ▽ More Quantum switches are envisioned to be an integral component of future entanglement distribution networks. They can provide high quality entanglement distribution service to end-users by performing quantum operations such as entanglement swap** and entanglement purification. In this work, we characterize the capacity region of such a quantum switch under noisy channel transmissions and imperfect quantum operations. We express the capacity region as a function of the channel and network parameters (link and entanglement swap success probability), entanglement purification yield and application level parameters (target fidelity threshold). In particular, we provide necessary conditions to verify if a set of request rates belong to the capacity region of the switch. We use these conditions to find the maximum achievable end-to-end user entanglement generation throughput by solving a set of linear optimization problems. We develop a max-weight scheduling policy and prove that the policy stabilizes the switch for all feasible request arrival rates. As we develop scheduling policies, we also generate new results for computing the conditional yield distribution of different classes of purification protocols. From numerical experiments, we discover that performing link-level entanglement purification followed by entanglement swaps provides a larger capacity region than doing entanglement swaps followed by end-to-end entanglement purification. The conclusions obtained in this work can yield useful guidelines for subsequent quantum switch designs. △ Less

Submitted 2 December, 2022; originally announced December 2022.

Comments: 10 pages, 4 figures, accepted for a talk at the IEEE International Conference on Computer Communications (INFOCOM), 2023

arXiv:2210.14011 [pdf, other]

Are All Spurious Features in Natural Language Alike? An Analysis through a Causal Lens

Authors: Nitish Joshi, Xiang Pan, He He

Abstract: The term `spurious correlations' has been used in NLP to informally denote any undesirable feature-label correlations. However, a correlation can be undesirable because (i) the feature is irrelevant to the label (e.g. punctuation in a review), or (ii) the feature's effect on the label depends on the context (e.g. negation words in a review), which is ubiquitous in language tasks. In case (i), we w… ▽ More The term `spurious correlations' has been used in NLP to informally denote any undesirable feature-label correlations. However, a correlation can be undesirable because (i) the feature is irrelevant to the label (e.g. punctuation in a review), or (ii) the feature's effect on the label depends on the context (e.g. negation words in a review), which is ubiquitous in language tasks. In case (i), we want the model to be invariant to the feature, which is neither necessary nor sufficient for prediction. But in case (ii), even an ideal model (e.g. humans) must rely on the feature, since it is necessary (but not sufficient) for prediction. Therefore, a more fine-grained treatment of spurious features is needed to specify the desired model behavior. We formalize this distinction using a causal model and probabilities of necessity and sufficiency, which delineates the causal relations between a feature and a label. We then show that this distinction helps explain results of existing debiasing methods on different spurious features, and demystifies surprising results such as the encoding of spurious features in model representations after debiasing. △ Less

Submitted 25 October, 2022; originally announced October 2022.

Comments: EMNLP 2022

arXiv:2210.07313 [pdf, other]

Bootstrap** Multilingual Semantic Parsers using Large Language Models

Authors: Abhijeet Awasthi, Nitish Gupta, Bidisha Samanta, Shachi Dave, Sunita Sarawagi, Partha Talukdar

Abstract: Despite cross-lingual generalization demonstrated by pre-trained multilingual models, the translate-train paradigm of transferring English datasets across multiple languages remains to be a key mechanism for training task-specific multilingual models. However, for many low-resource languages, the availability of a reliable translation service entails significant amounts of costly human-annotated t… ▽ More Despite cross-lingual generalization demonstrated by pre-trained multilingual models, the translate-train paradigm of transferring English datasets across multiple languages remains to be a key mechanism for training task-specific multilingual models. However, for many low-resource languages, the availability of a reliable translation service entails significant amounts of costly human-annotated translation pairs. Further, translation services may continue to be brittle due to domain mismatch between task-specific input text and general-purpose text used for training translation models. For multilingual semantic parsing, we demonstrate the effectiveness and flexibility offered by large language models (LLMs) for translating English datasets into several languages via few-shot prompting. Through extensive comparisons on two public datasets, MTOP and MASSIVE, spanning 50 languages and several domains, we show that our method of translating data using LLMs outperforms a strong translate-train baseline on 41 out of 50 languages. We study the key design choices that enable more effective multilingual data translation via prompted LLMs. △ Less

Submitted 11 February, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

Comments: EACL-23

arXiv:2210.01302 [pdf, other]

Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation

Authors: Aahlad Puli, Nitish Joshi, Yoav Wald, He He, Rajesh Ranganath

Abstract: In prediction tasks, there exist features that are related to the label in the same way across different settings for that task; these are semantic features or semantics. Features with varying relationships to the label are nuisances. For example, in detecting cows from natural images, the shape of the head is semantic but because images of cows often have grass backgrounds but not always, the bac… ▽ More In prediction tasks, there exist features that are related to the label in the same way across different settings for that task; these are semantic features or semantics. Features with varying relationships to the label are nuisances. For example, in detecting cows from natural images, the shape of the head is semantic but because images of cows often have grass backgrounds but not always, the background is a nuisance. Models that exploit nuisance-label relationships face performance degradation when these relationships change. Building models robust to such changes requires additional knowledge beyond samples of the features and labels. For example, existing work uses annotations of nuisances or assumes ERM-trained models depend on nuisances. Approaches to integrate new kinds of additional knowledge enlarge the settings where robust models can be built. We develop an approach to use knowledge about the semantics by corrupting them in data, and then using the corrupted data to produce models which identify correlations between nuisances and the label. Once these correlations are identified, they can be used to adjust for where nuisances drive predictions. We study semantic corruptions in powering different spurious-correlation avoiding methods on multiple out-of-distribution (OOD) tasks like classifying waterbirds, natural language inference (NLI), and detecting cardiomegaly in chest X-rays. △ Less

Submitted 3 July, 2024; v1 submitted 3 October, 2022; originally announced October 2022.

arXiv:2210.00443 [pdf, other]

doi 10.1007/978-3-030-91560-5_24

ReAct: A Review Comment Dataset for Actionability (and more)

Authors: Gautam Choudhary, Natwar Modani, Nitish Maurya

Abstract: Review comments play an important role in the evolution of documents. For a large document, the number of review comments may become large, making it difficult for the authors to quickly grasp what the comments are about. It is important to identify the nature of the comments to identify which comments require some action on the part of document authors, along with identifying the types of these c… ▽ More Review comments play an important role in the evolution of documents. For a large document, the number of review comments may become large, making it difficult for the authors to quickly grasp what the comments are about. It is important to identify the nature of the comments to identify which comments require some action on the part of document authors, along with identifying the types of these comments. In this paper, we introduce an annotated review comment dataset ReAct. The review comments are sourced from OpenReview site. We crowd-source annotations for these reviews for actionability and type of comments. We analyze the properties of the dataset and validate the quality of annotations. We release the dataset (https://github.com/gtmdotme/ReAct) to the research community as a major contribution. We also benchmark our data with standard baselines for classification tasks and analyze their performance. △ Less

Submitted 2 October, 2022; originally announced October 2022.

Comments: Published at WISE 2021

arXiv:2209.02600 [pdf]

Domain Engineering for Applied Monocular Reconstruction of Parametric Faces

Authors: Igor Borovikov, Karine Levonyan, Jon Rein, Pawel Wrotek, Nitish Victor

Abstract: Many modern online 3D applications and video games rely on parametric models of human faces for creating believable avatars. However, manually reproducing someone's facial likeness with a parametric model is difficult and time-consuming. Machine Learning solution for that task is highly desirable but is also challenging. The paper proposes a novel approach to the so-called Face-to-Parameters probl… ▽ More Many modern online 3D applications and video games rely on parametric models of human faces for creating believable avatars. However, manually reproducing someone's facial likeness with a parametric model is difficult and time-consuming. Machine Learning solution for that task is highly desirable but is also challenging. The paper proposes a novel approach to the so-called Face-to-Parameters problem (F2P for short), aiming to reconstruct a parametric face from a single image. The proposed method utilizes synthetic data, domain decomposition, and domain adaptation to address multifaceted challenges in solving the F2P. The open-sourced codebase illustrates our key observations and provides means for quantitative evaluation. The presented approach proves practical in an industrial application; it improves accuracy and allows for more efficient models training. The techniques have the potential to extend to other types of parametric models. △ Less

Submitted 6 September, 2022; originally announced September 2022.

Comments: An extended SIPP 2022 conference paper. arXiv admin note: substantial text overlap with arXiv:2208.02935

Journal ref: Signal and Image Processing: An International Journal, August 2022, Volume 13, No 2/3/4, pages 33-51

arXiv:2208.09434 [pdf, other]

A level-set formulation to simulate diffusive solid/solid phase transformation in polycrystalline metallic materials -- Application to austenite decomposition in steels

Authors: Nitish Chandrappa, Marc Bernacki

Abstract: Numerous full-field numerical methods exist concerning the digital description of polycrystalline materials and the modeling of their evolution during thermomechanical treatments. However, these strategies are globally dedicated to the modeling of recrystallization and grain growth for single-phase materials, or to the modeling of phase transformations without considering recrystallization and rel… ▽ More Numerous full-field numerical methods exist concerning the digital description of polycrystalline materials and the modeling of their evolution during thermomechanical treatments. However, these strategies are globally dedicated to the modeling of recrystallization and grain growth for single-phase materials, or to the modeling of phase transformations without considering recrystallization and related phenomena. A generalized numerical framework capable of making predictions in a multi-phase polycrystalline context while respecting the concomitance of the different microstructural mechanisms is thus of prime interest. A novel finite element level-set based full-field numerical formulation is proposed to principally simulate diffusive solid-solid phase transformation at the mesoscopic scale in the context of two-phase metallic alloys. A global kinetic framework, capable of accounting for other concomitant mechanisms such as recrystallization and grain growth is considered in this numerical model. The proposed numerical framework is shown to be promising through a couple of illustrative 1D and 2D test cases in the context of austenite decomposition in steels and compared with ThermoCalc estimations. △ Less

Submitted 19 August, 2022; originally announced August 2022.

arXiv:2208.03645 [pdf, other]

Generating Negative Samples for Sequential Recommendation

Authors: Yongjun Chen, Jia Li, Zhiwei Liu, Nitish Shirish Keskar, Huan Wang, Julian McAuley, Caiming Xiong

Abstract: To make Sequential Recommendation (SR) successful, recent works focus on designing effective sequential encoders, fusing side information, and mining extra positive self-supervision signals. The strategy of sampling negative items at each time step is less explored. Due to the dynamics of users' interests and model updates during training, considering randomly sampled items from a user's non-inter… ▽ More To make Sequential Recommendation (SR) successful, recent works focus on designing effective sequential encoders, fusing side information, and mining extra positive self-supervision signals. The strategy of sampling negative items at each time step is less explored. Due to the dynamics of users' interests and model updates during training, considering randomly sampled items from a user's non-interacted item set as negatives can be uninformative. As a result, the model will inaccurately learn user preferences toward items. Identifying informative negatives is challenging because informative negative items are tied with both dynamically changed interests and model parameters (and sampling process should also be efficient). To this end, we propose to Generate Negative Samples (items) for SR (GenNi). A negative item is sampled at each time step based on the current SR model's learned user preferences toward items. An efficient implementation is proposed to further accelerate the generation process, making it scalable to large-scale recommendation tasks. Extensive experiments on four public datasets verify the importance of providing high-quality negative samples for SR and demonstrate the effectiveness and efficiency of GenNi. △ Less

Submitted 7 August, 2022; originally announced August 2022.

arXiv:2208.02935 [pdf]

Applied monocular reconstruction of parametric faces with domain engineering

Authors: Igor Borovikov, Karine Levonyan, Jon Rein, Pawel Wrotek, Nitish Victor

Abstract: Many modern online 3D applications and videogames rely on parametric models of human faces for creating believable avatars. However, manual reproduction of someone's facial likeness with a parametric model is difficult and time-consuming. Machine Learning solution for that task is highly desirable but is also challenging. The paper proposes a novel approach to the so-called Face-to-Parameters prob… ▽ More Many modern online 3D applications and videogames rely on parametric models of human faces for creating believable avatars. However, manual reproduction of someone's facial likeness with a parametric model is difficult and time-consuming. Machine Learning solution for that task is highly desirable but is also challenging. The paper proposes a novel approach to the so-called Face-to-Parameters problem (F2P for short), aiming to reconstruct a parametric face from a single image. The proposed method utilizes synthetic data, domain decomposition, and domain adaptation for addressing multifaceted challenges in solving the F2P. The open-sourced codebase illustrates our key observations and provides means for quantitative evaluation. The presented approach proves practical in an industrial application; it improves accuracy and allows for more efficient models training. The techniques have the potential to extend to other types of parametric models. △ Less

Submitted 4 August, 2022; originally announced August 2022.

Comments: 16 pages; SIPP 2022, 10th International Conference on Signal, Image Processing and Pattern Recognition (London, United Kingdom)

arXiv:2207.08489 [pdf, other]

Neural Distributed Image Compression with Cross-Attention Feature Alignment

Authors: Nitish Mital, Ezgi Ozyilkan, Ali Garjani, Deniz Gunduz

Abstract: We consider the problem of compressing an information source when a correlated one is available as side information only at the decoder side, which is a special case of the distributed source coding problem in information theory. In particular, we consider a pair of stereo images, which have overlap** fields of view, and are captured by a synchronized and calibrated pair of cameras as correlated… ▽ More We consider the problem of compressing an information source when a correlated one is available as side information only at the decoder side, which is a special case of the distributed source coding problem in information theory. In particular, we consider a pair of stereo images, which have overlap** fields of view, and are captured by a synchronized and calibrated pair of cameras as correlated image sources. In previously proposed methods, the encoder transforms the input image to a latent representation using a deep neural network, and compresses the quantized latent representation losslessly using entropy coding. The decoder decodes the entropy-coded quantized latent representation, and reconstructs the input image using this representation and the available side information. In the proposed method, the decoder employs a cross-attention module to align the feature maps obtained from the received latent representation of the input image and a latent representation of the side information. We argue that aligning the correlated patches in the feature maps allows better utilization of the side information. We empirically demonstrate the competitiveness of the proposed algorithm on KITTI and Cityscape datasets of stereo image pairs. Our experimental results show that the proposed architecture is able to exploit the decoder-only side information in a more efficient manner compared to previous works. △ Less

Submitted 5 January, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

Comments: 16 pages, 15 figures, presented in WACV 2023

arXiv:2207.00630 [pdf, other]

QA Is the New KR: Question-Answer Pairs as Knowledge Bases

Authors: Wenhu Chen, William W. Cohen, Michiel De Jong, Nitish Gupta, Alessandro Presta, Pat Verga, John Wieting

Abstract: In this position paper, we propose a new approach to generating a type of knowledge base (KB) from text, based on question generation and entity linking. We argue that the proposed type of KB has many of the key advantages of a traditional symbolic KB: in particular, it consists of small modular components, which can be combined compositionally to answer complex queries, including relational queri… ▽ More In this position paper, we propose a new approach to generating a type of knowledge base (KB) from text, based on question generation and entity linking. We argue that the proposed type of KB has many of the key advantages of a traditional symbolic KB: in particular, it consists of small modular components, which can be combined compositionally to answer complex queries, including relational queries and queries involving "multi-hop" inferences. However, unlike a traditional KB, this information store is well-aligned with common user information needs. △ Less

Submitted 1 July, 2022; originally announced July 2022.

Showing 1–50 of 201 results for author: Nitish