-
Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory
Authors:
Gordon Dai,
Weijia Zhang,
**han Li,
Siqi Yang,
Chidera Onochie lbe,
Srihas Rao,
Arthur Caetano,
Misha Sra
Abstract:
The emergence of Large Language Models (LLMs) and advancements in Artificial Intelligence (AI) offer an opportunity for computational social science research at scale. Building upon prior explorations of LLM agent design, our work introduces a simulated agent society where complex social relationships dynamically form and evolve over time. Agents are imbued with psychological drives and placed in…
▽ More
The emergence of Large Language Models (LLMs) and advancements in Artificial Intelligence (AI) offer an opportunity for computational social science research at scale. Building upon prior explorations of LLM agent design, our work introduces a simulated agent society where complex social relationships dynamically form and evolve over time. Agents are imbued with psychological drives and placed in a sandbox survival environment. We conduct an evaluation of the agent society through the lens of Thomas Hobbes's seminal Social Contract Theory (SCT). We analyze whether, as the theory postulates, agents seek to escape a brutish "state of nature" by surrendering rights to an absolute sovereign in exchange for order and security. Our experiments unveil an alignment: Initially, agents engage in unrestrained conflict, mirroring Hobbes's depiction of the state of nature. However, as the simulation progresses, social contracts emerge, leading to the authorization of an absolute sovereign and the establishment of a peaceful commonwealth founded on mutual cooperation. This congruence between our LLM agent society's evolutionary trajectory and Hobbes's theoretical account indicates LLMs' capability to model intricate social dynamics and potentially replicate forces that shape human societies. By enabling such insights into group behavior and emergent societal phenomena, LLM-driven multi-agent simulations, while unable to simulate all the nuances of human behavior, may hold potential for advancing our understanding of social structures, group dynamics, and complex human systems.
△ Less
Submitted 1 July, 2024; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Straight Through Gumbel Softmax Estimator based Bimodal Neural Architecture Search for Audio-Visual Deepfake Detection
Authors:
Aravinda Reddy PN,
Raghavendra Ramachandra,
Krothapalli Sreenivasa Rao,
Pabitra Mitra,
Vinod Rathod
Abstract:
Deepfakes are a major security risk for biometric authentication. This technology creates realistic fake videos that can impersonate real people, fooling systems that rely on facial features and voice patterns for identification. Existing multimodal deepfake detectors rely on conventional fusion methods, such as majority rule and ensemble voting, which often struggle to adapt to changing data char…
▽ More
Deepfakes are a major security risk for biometric authentication. This technology creates realistic fake videos that can impersonate real people, fooling systems that rely on facial features and voice patterns for identification. Existing multimodal deepfake detectors rely on conventional fusion methods, such as majority rule and ensemble voting, which often struggle to adapt to changing data characteristics and complex patterns. In this paper, we introduce the Straight-through Gumbel-Softmax (STGS) framework, offering a comprehensive approach to search multimodal fusion model architectures. Using a two-level search approach, the framework optimizes the network architecture, parameters, and performance. Initially, crucial features were efficiently identified from backbone networks, whereas within the cell structure, a weighted fusion operation integrated information from various sources. An architecture that maximizes the classification performance is derived by varying parameters such as temperature and sampling time. The experimental results on the FakeAVCeleb and SWAN-DF datasets demonstrated an impressive AUC value 94.4\% achieved with minimal model parameters.
△ Less
Submitted 19 June, 2024;
originally announced June 2024.
-
After the Breach: Incident Response within Enterprises
Authors:
Sumanth Rao
Abstract:
Enterprises are constantly under attack from sophisticated adversaries. These adversaries use a variety of techniques to first gain access to the enterprise, then spread laterally inside its networks, establish persistence, and finally exfiltrate sensitive data, or hold it for ransom. While historically, enterprises have used different Incident Response systems that monitor hosts, servers, or netw…
▽ More
Enterprises are constantly under attack from sophisticated adversaries. These adversaries use a variety of techniques to first gain access to the enterprise, then spread laterally inside its networks, establish persistence, and finally exfiltrate sensitive data, or hold it for ransom. While historically, enterprises have used different Incident Response systems that monitor hosts, servers, or network devices to detect and report threats, these systems often need many analysts to triage and respond to alerts. However, the immense quantity of alerts to sift through, combined with the potential risk of missing a valid threat makes the task of the analyst challenging. To ease this manual and laborious process, researchers have proposed a variety of systems that perform automated attack investigations. These systems collect data, track causally related events, and present the analyst with an interpretable summary of the attack. In this paper, we present a survey of systems that perform automated attack investigation, and compare them based on their designs, goals, and heuristics. We discuss the challenges faced by these systems, and present a comparison in terms of their effectiveness, practicality, and ability to address these challenges. We conclude by discussing the future of these systems, and the open problems in this area.
△ Less
Submitted 13 June, 2024; v1 submitted 30 April, 2024;
originally announced June 2024.
-
Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs
Authors:
Claire **,
Sudha Rao,
Xiangyu Peng,
Portia Botchway,
Jessica Quaye,
Chris Brockett,
Bill Dolan
Abstract:
Advancements in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretations of prompts, causing logical inconsistencies and unexpected deviations from intended designs. Automated techniques for detec…
▽ More
Advancements in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretations of prompts, causing logical inconsistencies and unexpected deviations from intended designs. Automated techniques for detecting such game bugs are still lacking. To address this, we propose a systematic LLM-based method for automatically identifying such bugs from player game logs, eliminating the need for collecting additional data such as post-play surveys. Applied to a text-based game DejaBoom!, our approach effectively identifies bugs inherent in LLM-powered interactive games, surpassing unstructured LLM-powered bug-catching methods and filling the gap in automated detection of logical and design flaws.
△ Less
Submitted 6 June, 2024;
originally announced June 2024.
-
Jill Watson: A Virtual Teaching Assistant powered by ChatGPT
Authors:
Karan Taneja,
Pratyusha Maiti,
Sandeep Kakar,
Pranav Guruprasad,
Sanjeev Rao,
Ashok K. Goel
Abstract:
Conversational AI agents often require extensive datasets for training that are not publicly released, are limited to social chit-chat or handling a specific domain, and may not be easily extended to accommodate the latest advances in AI technologies. This paper introduces Jill Watson, a conversational Virtual Teaching Assistant (VTA) leveraging the capabilities of ChatGPT. Jill Watson based on Ch…
▽ More
Conversational AI agents often require extensive datasets for training that are not publicly released, are limited to social chit-chat or handling a specific domain, and may not be easily extended to accommodate the latest advances in AI technologies. This paper introduces Jill Watson, a conversational Virtual Teaching Assistant (VTA) leveraging the capabilities of ChatGPT. Jill Watson based on ChatGPT requires no prior training and uses a modular design to allow the integration of new APIs using a skill-based architecture inspired by XiaoIce. Jill Watson is also well-suited for intelligent textbooks as it can process and converse using multiple large documents. We exclusively utilize publicly available resources for reproducibility and extensibility. Comparative analysis shows that our system outperforms the legacy knowledge-based Jill Watson as well as the OpenAI Assistants service. We employ many safety measures that reduce instances of hallucinations and toxicity. The paper also includes real-world examples from a classroom setting that demonstrate different features of Jill Watson and its effectiveness.
△ Less
Submitted 17 May, 2024;
originally announced May 2024.
-
Expanderizing Higher Order Random Walks
Authors:
Vedat Levi Alev,
Shravas Rao
Abstract:
We study a variant of the down-up and up-down walks over an $n$-partite simplicial complex, which we call expanderized higher order random walks -- where the sequence of updated coordinates correspond to the sequence of vertices visited by a random walk over an auxiliary expander graph $H$. When $H$ is the clique, this random walk reduces to the usual down-up walk and when $H$ is the directed cycl…
▽ More
We study a variant of the down-up and up-down walks over an $n$-partite simplicial complex, which we call expanderized higher order random walks -- where the sequence of updated coordinates correspond to the sequence of vertices visited by a random walk over an auxiliary expander graph $H$. When $H$ is the clique, this random walk reduces to the usual down-up walk and when $H$ is the directed cycle, this random walk reduces to the well-known systematic scan Glauber dynamics. We show that whenever the usual higher order random walks satisfy a log-Sobolev inequality or a Poincaré inequality, the expanderized walks satisfy the same inequalities with a loss of quality related to the two-sided expansion of the auxillary graph $H$. Our construction can be thought as a higher order random walk generalization of the derandomized squaring algorithm of Rozenman and Vadhan. We show that when initiated with an expander graph our expanderized random walks have mixing time $O(n \log n)$ for sampling a uniformly random list colorings of a graph $G$ of maximum degree $Δ= O(1)$ where each vertex has at least $(11/6 - ε) Δ$ and at most $O(Δ)$ colors and $O\left( \frac{n \log n}{(1 - \| J\|)^2}\right)$ for sampling the Ising model with a PSD interaction matrix $J \in R^{n \times n}$ satisfying $\| J \| \le 1$ and the external field $h \in R^n$-- here the $O(\bullet)$ notation hides a constant that depends linearly on the largest entry of $h$. As expander graphs can be very sparse, this decreases the amount of randomness required to simulate the down-up walks by a logarithmic factor. We also prove some simple results which enable us to argue about log-Sobolev constants of higher order random walks and provide a simple and self-contained analysis of local-to-global $Φ$-entropy contraction in simplicial complexes -- giving simpler proofs for many pre-existing results.
△ Less
Submitted 3 June, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
Player-Driven Emergence in LLM-Driven Game Narrative
Authors:
Xiangyu Peng,
Jessica Quaye,
Sudha Rao,
Weijia Xu,
Portia Botchway,
Chris Brockett,
Nebojsa Jojic,
Gabriel DesGarennes,
Ken Lobb,
Michael Xu,
Jorge Leandro,
Claire **,
Bill Dolan
Abstract:
We explore how interaction with large language models (LLMs) can give rise to emergent behaviors, empowering players to participate in the evolution of game narratives. Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise, but can freely interact with non-player characters generated by GPT-4, a large language model. We recruit 28 gamers t…
▽ More
We explore how interaction with large language models (LLMs) can give rise to emergent behaviors, empowering players to participate in the evolution of game narratives. Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise, but can freely interact with non-player characters generated by GPT-4, a large language model. We recruit 28 gamers to play the game and use GPT-4 to automatically convert the game logs into a node-graph representing the narrative in the player's gameplay. We find that through their interactions with the non-deterministic behavior of the LLM, players are able to discover interesting new emergent nodes that were not a part of the original narrative but have potential for being fun and engaging. Players that created the most emergent nodes tended to be those that often enjoy games that facilitate discovery, exploration and experimentation.
△ Less
Submitted 3 June, 2024; v1 submitted 25 April, 2024;
originally announced April 2024.
-
MLSD-GAN -- Generating Strong High Quality Face Morphing Attacks using Latent Semantic Disentanglement
Authors:
Aravinda Reddy PN,
Raghavendra Ramachandra,
Krothapalli Sreenivasa Rao,
Pabitra Mitra
Abstract:
Face-morphing attacks are a growing concern for biometric researchers, as they can be used to fool face recognition systems (FRS). These attacks can be generated at the image level (supervised) or representation level (unsupervised). Previous unsupervised morphing attacks have relied on generative adversarial networks (GANs). More recently, researchers have used linear interpolation of StyleGAN-en…
▽ More
Face-morphing attacks are a growing concern for biometric researchers, as they can be used to fool face recognition systems (FRS). These attacks can be generated at the image level (supervised) or representation level (unsupervised). Previous unsupervised morphing attacks have relied on generative adversarial networks (GANs). More recently, researchers have used linear interpolation of StyleGAN-encoded images to generate morphing attacks. In this paper, we propose a new method for generating high-quality morphing attacks using StyleGAN disentanglement. Our approach, called MLSD-GAN, spherically interpolates the disentangled latents to produce realistic and diverse morphing attacks. We evaluate the vulnerability of MLSD-GAN on two deep-learning-based FRS techniques. The results show that MLSD-GAN poses a significant threat to FRS, as it can generate morphing attacks that are highly effective at fooling these systems.
△ Less
Submitted 19 April, 2024;
originally announced April 2024.
-
Map Optical Properties to Subwavelength Structures Directly via a Diffusion Model
Authors:
Shijie Rao,
Kaiyu Cui,
Yidong Huang,
Jiawei Yang,
Yali Li,
Sheng** Wang,
Xue Feng,
Fang Liu,
Wei Zhang
Abstract:
Subwavelength photonic structures and metamaterials provide revolutionary approaches for controlling light. The inverse design methods proposed for these subwavelength structures are vital to the development of new photonic devices. However, most of the existing inverse design methods cannot realize direct map** from optical properties to photonic structures but instead rely on forward simulatio…
▽ More
Subwavelength photonic structures and metamaterials provide revolutionary approaches for controlling light. The inverse design methods proposed for these subwavelength structures are vital to the development of new photonic devices. However, most of the existing inverse design methods cannot realize direct map** from optical properties to photonic structures but instead rely on forward simulation methods to perform iterative optimization. In this work, we exploit the powerful generative abilities of artificial intelligence (AI) and propose a practical inverse design method based on latent diffusion models. Our method maps directly the optical properties to structures without the requirement of forward simulation and iterative optimization. Here, the given optical properties can work as "prompts" and guide the constructed model to correctly "draw" the required photonic structures. Experiments show that our direct map**-based inverse design method can generate subwavelength photonic structures at high fidelity while following the given optical properties. This may change the method used for optical design and greatly accelerate the research on new photonic devices.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Learning Social Fairness Preferences from Non-Expert Stakeholder Opinions in Kidney Placement
Authors:
Mukund Telukunta,
Sukruth Rao,
Gabriella Stickney,
Venkata Sriram Siddardh Nadendla,
Casey Canfield
Abstract:
Modern kidney placement incorporates several intelligent recommendation systems which exhibit social discrimination due to biases inherited from training data. Although initial attempts were made in the literature to study algorithmic fairness in kidney placement, these methods replace true outcomes with surgeons' decisions due to the long delays involved in recording such outcomes reliably. Howev…
▽ More
Modern kidney placement incorporates several intelligent recommendation systems which exhibit social discrimination due to biases inherited from training data. Although initial attempts were made in the literature to study algorithmic fairness in kidney placement, these methods replace true outcomes with surgeons' decisions due to the long delays involved in recording such outcomes reliably. However, the replacement of true outcomes with surgeons' decisions disregards expert stakeholders' biases as well as social opinions of other stakeholders who do not possess medical expertise. This paper alleviates the latter concern and designs a novel fairness feedback survey to evaluate an acceptance rate predictor (ARP) that predicts a kidney's acceptance rate in a given kidney-match pair. The survey is launched on Prolific, a crowdsourcing platform, and public opinions are collected from 85 anonymous crowd participants. A novel social fairness preference learning algorithm is proposed based on minimizing social feedback regret computed using a novel logit-based fairness feedback model. The proposed model and learning algorithm are both validated using simulation experiments as well as Prolific data. Public preferences towards group fairness notions in the context of kidney placement have been estimated and discussed in detail. The specific ARP tested in the Prolific survey has been deemed fair by the participants.
△ Less
Submitted 4 April, 2024;
originally announced April 2024.
-
DiffSTOCK: Probabilistic relational Stock Market Predictions using Diffusion Models
Authors:
Divyanshu Daiya,
Monika Yadav,
Harshit Singh Rao
Abstract:
In this work, we propose an approach to generalize denoising diffusion probabilistic models for stock market predictions and portfolio management. Present works have demonstrated the efficacy of modeling interstock relations for market time-series forecasting and utilized Graph-based learning models for value prediction and portfolio management. Though convincing, these deterministic approaches st…
▽ More
In this work, we propose an approach to generalize denoising diffusion probabilistic models for stock market predictions and portfolio management. Present works have demonstrated the efficacy of modeling interstock relations for market time-series forecasting and utilized Graph-based learning models for value prediction and portfolio management. Though convincing, these deterministic approaches still fall short of handling uncertainties i.e., due to the low signal-to-noise ratio of the financial data, it is quite challenging to learn effective deterministic models. Since the probabilistic methods have shown to effectively emulate higher uncertainties for time-series predictions. To this end, we showcase effective utilisation of Denoising Diffusion Probabilistic Models (DDPM), to develop an architecture for providing better market predictions conditioned on the historical financial indicators and inter-stock relations. Additionally, we also provide a novel deterministic architecture MaTCHS which uses Masked Relational Transformer(MRT) to exploit inter-stock relations along with historical stock features. We demonstrate that our model achieves SOTA performance for movement predication and Portfolio management.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
Cooperative Task Execution in Multi-Agent Systems
Authors:
Karishma,
Shrisha Rao
Abstract:
We propose a multi-agent system that enables groups of agents to collaborate and work autonomously to execute tasks. Groups can work in a decentralized manner and can adapt to dynamic changes in the environment. Groups of agents solve assigned tasks by exploring the solution space cooperatively based on the highest reward first. The tasks have a dependency structure associated with them. We rigoro…
▽ More
We propose a multi-agent system that enables groups of agents to collaborate and work autonomously to execute tasks. Groups can work in a decentralized manner and can adapt to dynamic changes in the environment. Groups of agents solve assigned tasks by exploring the solution space cooperatively based on the highest reward first. The tasks have a dependency structure associated with them. We rigorously evaluated the performance of the system and the individual group performance using centralized and decentralized control approaches for task distribution. Based on the results, the centralized approach is more efficient for systems with a less-dependent system $G_{18}$ (a well-known program graph that contains $18$ nodes with few links), while the decentralized approach performs better for systems with a highly-dependent system $G_{40}$ (a program graph that contains $40$ highly interlinked nodes). We also evaluated task allocation to groups that do not have interdependence. Our findings reveal that there was significantly less difference in the number of tasks allocated to each group in a less-dependent system than in a highly-dependent one. The experimental results showed that a large number of small-size cooperative groups of agents unequivocally improved the system's performance compared to a small number of large-size cooperative groups of agents. Therefore, it is essential to identify the optimal group size for a system to enhance its performance.
△ Less
Submitted 20 May, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
BOXREC: Recommending a Box of Preferred Outfits in Online Shop**
Authors:
Debopriyo Banerjee,
Krothapalli Sreenivasa Rao,
Shamik Sural,
Niloy Ganguly
Abstract:
Over the past few years, automation of outfit composition has gained much attention from the research community. Most of the existing outfit recommendation systems focus on pairwise item compatibility prediction (using visual and text features) to score an outfit combination having several items, followed by recommendation of top-n outfits or a capsule wardrobe having a collection of outfits based…
▽ More
Over the past few years, automation of outfit composition has gained much attention from the research community. Most of the existing outfit recommendation systems focus on pairwise item compatibility prediction (using visual and text features) to score an outfit combination having several items, followed by recommendation of top-n outfits or a capsule wardrobe having a collection of outfits based on user's fashion taste. However, none of these consider user's preference of price-range for individual clothing types or an overall shop** budget for a set of items. In this paper, we propose a box recommendation framework - BOXREC - which at first, collects user preferences across different item types (namely, top-wear, bottom-wear and foot-wear) including price-range of each type and a maximum shop** budget for a particular shop** session. It then generates a set of preferred outfits by retrieving all types of preferred items from the database (according to user specified preferences including price-ranges), creates all possible combinations of three preferred items (belonging to distinct item types) and verifies each combination using an outfit scoring framework - BOXREC-OSF. Finally, it provides a box full of fashion items, such that different combinations of the items maximize the number of outfits suitable for an occasion while satisfying maximum shop** budget. Empirical results show superior performance of BOXREC-OSF over the baseline methods.
△ Less
Submitted 26 February, 2024;
originally announced February 2024.
-
Continuous Pushdown VASS in One Dimension are Easy
Authors:
Guillermo A. Perez,
Shrisha Rao
Abstract:
A pushdown vector addition system with states (PVASS) extends the model of vector addition systems with a pushdown stack. The algorithmic analysis of PVASS has applications such as static analysis of recursive programs manipulating integer variables. Unfortunately, reachability analysis, even for one-dimensional PVASS is not known to be decidable. We relax the model of one-dimensional PVASS to mak…
▽ More
A pushdown vector addition system with states (PVASS) extends the model of vector addition systems with a pushdown stack. The algorithmic analysis of PVASS has applications such as static analysis of recursive programs manipulating integer variables. Unfortunately, reachability analysis, even for one-dimensional PVASS is not known to be decidable. We relax the model of one-dimensional PVASS to make the counter updates continuous and show that in this case reachability, coverability, and boundedness are decidable in polynomial time. In addition, for the extension of the model with lower-bound guards on the states, we show that coverability and reachability are in NP, and boundedness is in coNP.
△ Less
Submitted 20 February, 2024;
originally announced February 2024.
-
Good Teachers Explain: Explanation-Enhanced Knowledge Distillation
Authors:
Amin Parchami-Araghi,
Moritz Böhle,
Sukrut Rao,
Bernt Schiele
Abstract:
Knowledge Distillation (KD) has proven effective for compressing large teacher models into smaller student models. While it is well known that student models can achieve similar accuracies as the teachers, it has also been shown that they nonetheless often do not learn the same function. It is, however, often highly desirable that the student's and teacher's functions share similar properties such…
▽ More
Knowledge Distillation (KD) has proven effective for compressing large teacher models into smaller student models. While it is well known that student models can achieve similar accuracies as the teachers, it has also been shown that they nonetheless often do not learn the same function. It is, however, often highly desirable that the student's and teacher's functions share similar properties such as basing the prediction on the same input features, as this ensures that students learn the 'right features' from the teachers. In this work, we explore whether this can be achieved by not only optimizing the classic KD loss but also the similarity of the explanations generated by the teacher and the student. Despite the idea being simple and intuitive, we find that our proposed 'explanation-enhanced' KD (e$^2$KD) (1) consistently provides large gains in terms of accuracy and student-teacher agreement, (2) ensures that the student learns from the teacher to be right for the right reasons and to give similar explanations, and (3) is robust with respect to the model architectures, the amount of training data, and even works with 'approximate', pre-computed explanations.
△ Less
Submitted 5 February, 2024;
originally announced February 2024.
-
KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
Authors:
Debjyoti Mondal,
Suraj Modi,
Subhadarshi Panda,
Rituraj Singh,
Godawari Sudhakar Rao
Abstract:
Large Language Models (LLMs) have demonstrated impressive performance in natural language processing tasks by leveraging chain of thought (CoT) that enables step-by-step thinking. Extending LLMs with multimodal capabilities is the recent interest, but incurs computational cost and requires substantial hardware resources. To address these challenges, we propose KAM-CoT a framework that integrates C…
▽ More
Large Language Models (LLMs) have demonstrated impressive performance in natural language processing tasks by leveraging chain of thought (CoT) that enables step-by-step thinking. Extending LLMs with multimodal capabilities is the recent interest, but incurs computational cost and requires substantial hardware resources. To address these challenges, we propose KAM-CoT a framework that integrates CoT reasoning, Knowledge Graphs (KGs), and multiple modalities for a comprehensive understanding of multimodal tasks. KAM-CoT adopts a two-stage training process with KG grounding to generate effective rationales and answers. By incorporating external knowledge from KGs during reasoning, the model gains a deeper contextual understanding reducing hallucinations and enhancing the quality of answers. This knowledge-augmented CoT reasoning empowers the model to handle questions requiring external context, providing more informed answers. Experimental findings show KAM-CoT outperforms the state-of-the-art methods. On the ScienceQA dataset, we achieve an average accuracy of 93.87%, surpassing GPT-3.5 (75.17%) by 18% and GPT-4 (83.99%) by 10%. Remarkably, KAM-CoT achieves these results with only 280M trainable parameters at a time, demonstrating its cost-efficiency and effectiveness.
△ Less
Submitted 23 January, 2024;
originally announced January 2024.
-
Deep Learning Meets Mechanism Design: Key Results and Some Novel Applications
Authors:
V. Udaya Sankar,
Vishisht Srihari Rao,
Y. Narahari
Abstract:
Mechanism design is essentially reverse engineering of games and involves inducing a game among strategic agents in a way that the induced game satisfies a set of desired properties in an equilibrium of the game. Desirable properties for a mechanism include incentive compatibility, individual rationality, welfare maximisation, revenue maximisation (or cost minimisation), fairness of allocation, et…
▽ More
Mechanism design is essentially reverse engineering of games and involves inducing a game among strategic agents in a way that the induced game satisfies a set of desired properties in an equilibrium of the game. Desirable properties for a mechanism include incentive compatibility, individual rationality, welfare maximisation, revenue maximisation (or cost minimisation), fairness of allocation, etc. It is known from mechanism design theory that only certain strict subsets of these properties can be simultaneously satisfied exactly by any given mechanism. Often, the mechanisms required by real-world applications may need a subset of these properties that are theoretically impossible to be simultaneously satisfied. In such cases, a prominent recent approach is to use a deep learning based approach to learn a mechanism that approximately satisfies the required properties by minimizing a suitably defined loss function. In this paper, we present, from relevant literature, technical details of using a deep learning approach for mechanism design and provide an overview of key results in this topic. We demonstrate the power of this approach for three illustrative case studies: (a) efficient energy management in a vehicular network (b) resource allocation in a mobile network (c) designing a volume discount procurement auction for agricultural inputs. Section 6 concludes the paper.
△ Less
Submitted 11 January, 2024;
originally announced January 2024.
-
Deterministic Near-Linear Time Minimum Cut in Weighted Graphs
Authors:
Monika Henzinger,
Jason Li,
Satish Rao,
Di Wang
Abstract:
In 1996, Karger [Kar96] gave a startling randomized algorithm that finds a minimum-cut in a (weighted) graph in time $O(m\log^3n)$ which he termed near-linear time meaning linear (in the size of the input) times a polylogarthmic factor. In this paper, we give the first deterministic algorithm which runs in near-linear time for weighted graphs.
Previously, the breakthrough results of Kawarabayash…
▽ More
In 1996, Karger [Kar96] gave a startling randomized algorithm that finds a minimum-cut in a (weighted) graph in time $O(m\log^3n)$ which he termed near-linear time meaning linear (in the size of the input) times a polylogarthmic factor. In this paper, we give the first deterministic algorithm which runs in near-linear time for weighted graphs.
Previously, the breakthrough results of Kawarabayashi and Thorup [KT19] gave a near-linear time algorithm for simple graphs. The main technique here is a clustering procedure that perfectly preserves minimum cuts. Recently, Li [Li21] gave an $m^{1+o(1)}$ deterministic minimum-cut algorithm for weighted graphs; this form of running time has been termed "almost-linear''. Li uses almost-linear time deterministic expander decompositions which do not perfectly preserve minimum cuts, but he can use these clusterings to, in a sense, "derandomize'' the methods of Karger.
In terms of techniques, we provide a structural theorem that says there exists a sparse clustering that preserves minimum cuts in a weighted graph with $o(1)$ error. In addition, we construct it deterministically in near linear time. This was done exactly for simple graphs in [KT19, HRW20] and with polylogarithmic error for weighted graphs in [Li21]. Extending the techniques in [KT19, HRW20] to weighted graphs presents significant challenges, and moreover, the algorithm can only polylogarithmically approximately preserve minimum cuts. A remaining challenge is to reduce the polylogarithmic-approximate clusterings to $1+o(1/\log n)$-approximate so that they can be applied recursively as in [Li21] over $O(\log n)$ many levels. This is an additional challenge that requires building on properties of tree-packings in the presence of a wide range of edge weights to, for example, find sources for local flow computations which identify minimum cuts that cross clusters.
△ Less
Submitted 10 January, 2024;
originally announced January 2024.
-
Photonics for Sustainable Computing
Authors:
Farbin Fayza,
Satyavolu Papa Rao,
Darius Bunandar,
Udit Gupta,
Ajay Joshi
Abstract:
Photonic integrated circuits are finding use in a variety of applications including optical transceivers, LIDAR, bio-sensing, photonic quantum computing, and Machine Learning (ML). In particular, with the exponentially increasing sizes of ML models, photonics-based accelerators are getting special attention as a sustainable solution because they can perform ML inferences with multiple orders of ma…
▽ More
Photonic integrated circuits are finding use in a variety of applications including optical transceivers, LIDAR, bio-sensing, photonic quantum computing, and Machine Learning (ML). In particular, with the exponentially increasing sizes of ML models, photonics-based accelerators are getting special attention as a sustainable solution because they can perform ML inferences with multiple orders of magnitude higher energy efficiency than CMOS-based accelerators. However, recent studies have shown that hardware manufacturing and infrastructure contribute significantly to the carbon footprint of computing devices, even surpassing the emissions generated during their use. For example, the manufacturing process accounts for 74% of the total carbon emissions from Apple in 2019. This prompts us to ask -- if we consider both the embodied (manufacturing) and operational carbon cost of photonics, is it indeed a viable avenue for a sustainable future? So, in this paper, we build a carbon footprint model for photonic chips and investigate the sustainability of photonics-based accelerators by conducting a case study on ADEPT, a photonics-based accelerator for deep neural network inference. Our analysis shows that photonics can reduce both operational and embodied carbon footprints with its high energy efficiency and at least 4$\times$ less fabrication carbon cost per unit area than 28 nm CMOS.
△ Less
Submitted 10 January, 2024;
originally announced January 2024.
-
Efficient Indexing of Meta-Data (Extracted from Educational Videos)
Authors:
Shalika Kumbham,
Abhijit Debnath,
Krothapalli Sreenivasa Rao
Abstract:
Video lectures are becoming more popular and in demand as online classroom teaching is becoming more prevalent. Massive Open Online Courses (MOOCs), such as NPTEL, have been creating high-quality educational content that is freely accessible to students online. A large number of colleges across the country are now using NPTEL videos in their classrooms. So more video lectures are being recorded, m…
▽ More
Video lectures are becoming more popular and in demand as online classroom teaching is becoming more prevalent. Massive Open Online Courses (MOOCs), such as NPTEL, have been creating high-quality educational content that is freely accessible to students online. A large number of colleges across the country are now using NPTEL videos in their classrooms. So more video lectures are being recorded, maintained, and uploaded. These videos generally contain information about that video before the lecture begins. We generally observe that these educational videos have metadata containing five to six attributes: Institute Name, Publisher Name, Department Name, Professor Name, Subject Name, and Topic Name. It would be easy to maintain these videos if we could organize them according to their categories. The indexing of these videos based on this information is beneficial for students all around the world to efficiently utilise these videos. In this project, we are trying to get the metadata information mentioned above from the video lectures.
△ Less
Submitted 11 December, 2023;
originally announced January 2024.
-
Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA
Authors:
Kaiyuan Yang,
Fabio Musio,
Yihui Ma,
Norman Juchler,
Johannes C. Paetzold,
Rami Al-Maskari,
Luciano Höher,
Hongwei Bran Li,
Ibrahim Ethem Hamamci,
Anjany Sekuboyina,
Suprosanna Shit,
Hou**g Huang,
Chinmay Prabhakar,
Ezequiel de la Rosa,
Diana Waldmannstetter,
Florian Kofler,
Fernando Navarro,
Martin Menten,
Ivan Ezhov,
Daniel Rueckert,
Iris Vos,
Ynte Ruigrok,
Birgitta Velthuis,
Hugo Kuijf,
Julien Hämmerli
, et al. (59 additional authors not shown)
Abstract:
The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neuro-vascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two angiographic imaging modaliti…
▽ More
The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neuro-vascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two angiographic imaging modalities, magnetic resonance angiography (MRA) and computed tomography angiography (CTA), but there exist limited public datasets with annotations on CoW anatomy, especially for CTA. Therefore we organized the TopCoW Challenge in 2023 with the release of an annotated CoW dataset. The TopCoW dataset was the first public dataset with voxel-level annotations for thirteen possible CoW vessel components, enabled by virtual-reality (VR) technology. It was also the first large dataset with paired MRA and CTA from the same patients. TopCoW challenge formalized the CoW characterization problem as a multiclass anatomical segmentation task with an emphasis on topological metrics. We invited submissions worldwide for the CoW segmentation task, which attracted over 140 registered participants from four continents. The top performing teams managed to segment many CoW components to Dice scores around 90%, but with lower scores for communicating arteries and rare variants. There were also topological mistakes for predictions with high Dice scores. Additional topological analysis revealed further areas for improvement in detecting certain CoW components and matching CoW variant topology accurately. TopCoW represented a first attempt at benchmarking the CoW anatomical segmentation task for MRA and CTA, both morphologically and topologically.
△ Less
Submitted 29 April, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
-
COPD-FlowNet: Elevating Non-invasive COPD Diagnosis with CFD Simulations
Authors:
Aryan Tyagi,
Aryaman Rao,
Shubhanshu Rao,
Raj Kumar Singh
Abstract:
Chronic Obstructive Pulmonary Disorder (COPD) is a prevalent respiratory disease that significantly impacts the quality of life of affected individuals. This paper presents COPDFlowNet, a novel deep-learning framework that leverages a custom Generative Adversarial Network (GAN) to generate synthetic Computational Fluid Dynamics (CFD) velocity flow field images specific to the trachea of COPD patie…
▽ More
Chronic Obstructive Pulmonary Disorder (COPD) is a prevalent respiratory disease that significantly impacts the quality of life of affected individuals. This paper presents COPDFlowNet, a novel deep-learning framework that leverages a custom Generative Adversarial Network (GAN) to generate synthetic Computational Fluid Dynamics (CFD) velocity flow field images specific to the trachea of COPD patients. These synthetic images serve as a valuable resource for data augmentation and model training. Additionally, COPDFlowNet incorporates a custom Convolutional Neural Network (CNN) architecture to predict the location of the obstruction site.
△ Less
Submitted 17 December, 2023;
originally announced December 2023.
-
Learning Generalizable Perceptual Representations for Data-Efficient No-Reference Image Quality Assessment
Authors:
Suhas Srinath,
Shankhanil Mitra,
Shika Rao,
Rajiv Soundararajan
Abstract:
No-reference (NR) image quality assessment (IQA) is an important tool in enhancing the user experience in diverse visual applications. A major drawback of state-of-the-art NR-IQA techniques is their reliance on a large number of human annotations to train models for a target IQA application. To mitigate this requirement, there is a need for unsupervised learning of generalizable quality representa…
▽ More
No-reference (NR) image quality assessment (IQA) is an important tool in enhancing the user experience in diverse visual applications. A major drawback of state-of-the-art NR-IQA techniques is their reliance on a large number of human annotations to train models for a target IQA application. To mitigate this requirement, there is a need for unsupervised learning of generalizable quality representations that capture diverse distortions. We enable the learning of low-level quality features agnostic to distortion types by introducing a novel quality-aware contrastive loss. Further, we leverage the generalizability of vision-language models by fine-tuning one such model to extract high-level image quality information through relevant text prompts. The two sets of features are combined to effectively predict quality by training a simple regressor with very few samples on a target dataset. Additionally, we design zero-shot quality predictions from both pathways in a completely blind setting. Our experiments on diverse datasets encompassing various distortions show the generalizability of the features and their superior performance in the data-efficient and zero-shot settings. Code will be made available at https://github.com/suhas-srinath/GRepQ.
△ Less
Submitted 8 December, 2023;
originally announced December 2023.
-
Design of Planar Collision-free Trochoidal Paths for a Multi-robot Swarm
Authors:
Adil Shiyas,
Sachit Rao
Abstract:
In the literature, a distributed consensus protocol by which a connected swarm of agents can generate artistic patterns in 2-dimensional space is proposed. Motivated by this protocol, in this paper, we design the parameters of this protocol for a 3-agent swarm of non-holonomic robots of finite size that results in the generation of periodic trochoidal trajectories that satisfy a set of geometric a…
▽ More
In the literature, a distributed consensus protocol by which a connected swarm of agents can generate artistic patterns in 2-dimensional space is proposed. Motivated by this protocol, in this paper, we design the parameters of this protocol for a 3-agent swarm of non-holonomic robots of finite size that results in the generation of periodic trochoidal trajectories that satisfy a set of geometric and speed constraints; this design also includes selecting the initial positions of the robots. This problem finds applications in persistent surveillance and coverage, guarding a region of interest, and target detection. While the trajectories may be self-intersecting, imposing geometric constraints i. eliminates collisions between robots; ii. ensures minimum and maximum separation distance between any robot and a fixed point, thus ensuring the robots are in communication range. Imposing speed constraints ensure that tracking these trajectories becomes feasible. It is also shown that robots can be injected to these paths at specific locations, in order to increase the refresh rate, without violating any of the geometric constraints. The designs are implemented in an indoor mobile robot platform.
△ Less
Submitted 20 November, 2023;
originally announced November 2023.
-
GENEVA: GENErating and Visualizing branching narratives using LLMs
Authors:
Jorge Leandro,
Sudha Rao,
Michael Xu,
Weijia Xu,
Nebosja Jojic,
Chris Brockett,
Bill Dolan
Abstract:
Dialogue-based Role Playing Games (RPGs) require powerful storytelling. The narratives of these may take years to write and typically involve a large creative team. In this work, we demonstrate the potential of large generative text models to assist this process. \textbf{GENEVA}, a prototype tool, generates a rich narrative graph with branching and reconverging storylines that match a high-level n…
▽ More
Dialogue-based Role Playing Games (RPGs) require powerful storytelling. The narratives of these may take years to write and typically involve a large creative team. In this work, we demonstrate the potential of large generative text models to assist this process. \textbf{GENEVA}, a prototype tool, generates a rich narrative graph with branching and reconverging storylines that match a high-level narrative description and constraints provided by the designer. A large language model (LLM), GPT-4, is used to generate the branching narrative and to render it in a graph format in a two-step process. We illustrate the use of GENEVA in generating new branching narratives for four well-known stories under different contextual constraints. This tool has the potential to assist in game development, simulations, and other applications with game-like properties.
△ Less
Submitted 5 June, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Analysis of sum-of-squares relaxations for the quantum rotor model
Authors:
Sujit Rao
Abstract:
The noncommutative sum-of-squares (ncSoS) hierarchy was introduced by Navascués-Pironio-Acín as a sequence of semidefinite programming relaxations for approximating values of noncommutative polynomial optimization problems, which were originally intended to generalize quantum values of nonlocal games. Recent work has started to analyze the hierarchy for approximating ground energies of local Hamil…
▽ More
The noncommutative sum-of-squares (ncSoS) hierarchy was introduced by Navascués-Pironio-Acín as a sequence of semidefinite programming relaxations for approximating values of noncommutative polynomial optimization problems, which were originally intended to generalize quantum values of nonlocal games. Recent work has started to analyze the hierarchy for approximating ground energies of local Hamiltonians, initially through rounding algorithms which output product states for degree-2 ncSoS applied to Quantum Max-Cut. Some rounding methods are known which output entangled states, but they use degree-4 ncSoS. Based on this, Hwang-Neeman-Parekh-Thompson-Wright conjectured that degree-2 ncSoS cannot beat product state approximations for Quantum Max-Cut and gave a partial proof relying on a conjectural generalization of Borrell's inequality. In this work we consider a family of Hamiltonians (called the quantum rotor model in condensed matter literature or lattice $O(k)$ vector model in quantum field theory) with infinite-dimensional local Hilbert space $L^{2}(S^{k - 1})$, and show that a degree-2 ncSoS relaxation approximates the ground state energy better than any product state.
△ Less
Submitted 29 February, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
Optimal RIP Matrices with Slightly Less Randomness
Authors:
Shravas Rao
Abstract:
A matrix $Φ\in \mathbb{R}^{Q \times N}$ satisfies the restricted isometry property if $\|Φx\|_2^2$ is approximately equal to $\|x\|_2^2$ for all $k$-sparse vectors $x$. We give a construction of RIP matrices with the optimal $Q = O(k \log(N/k))$ rows using $O(k\log(N/k)\log(k))$ bits of randomness. The main technical ingredient is an extension of the Hanson-Wright inequality to $ε$-biased distribu…
▽ More
A matrix $Φ\in \mathbb{R}^{Q \times N}$ satisfies the restricted isometry property if $\|Φx\|_2^2$ is approximately equal to $\|x\|_2^2$ for all $k$-sparse vectors $x$. We give a construction of RIP matrices with the optimal $Q = O(k \log(N/k))$ rows using $O(k\log(N/k)\log(k))$ bits of randomness. The main technical ingredient is an extension of the Hanson-Wright inequality to $ε$-biased distributions.
△ Less
Submitted 13 November, 2023;
originally announced November 2023.
-
ExtSwap: Leveraging Extended Latent Mapper for Generating High Quality Face Swap**
Authors:
Aravinda Reddy PN,
K. Sreenivasa Rao,
Raghavendra Ramachandra,
Pabitra mitra
Abstract:
We present a novel face swap** method using the progressively growing structure of a pre-trained StyleGAN. Previous methods use different encoder decoder structures, embedding integration networks to produce high-quality results, but their quality suffers from entangled representation. We disentangle semantics by deriving identity and attribute features separately. By learning to map the concate…
▽ More
We present a novel face swap** method using the progressively growing structure of a pre-trained StyleGAN. Previous methods use different encoder decoder structures, embedding integration networks to produce high-quality results, but their quality suffers from entangled representation. We disentangle semantics by deriving identity and attribute features separately. By learning to map the concatenated features into the extended latent space, we leverage the state-of-the-art quality and its rich semantic extended latent space. Extensive experiments suggest that the proposed method successfully disentangles identity and attribute features and outperforms many state-of-the-art face swap** methods, both qualitatively and quantitatively.
△ Less
Submitted 19 October, 2023;
originally announced October 2023.
-
Near-optimal Differentially Private Client Selection in Federated Settings
Authors:
Syed Eqbal Alam,
Dhirendra Shukla,
Shrisha Rao
Abstract:
We develop an iterative differentially private algorithm for client selection in federated settings. We consider a federated network wherein clients coordinate with a central server to complete a task; however, the clients decide whether to participate or not at a time step based on their preferences -- local computation and probabilistic intent. The algorithm does not require client-to-client inf…
▽ More
We develop an iterative differentially private algorithm for client selection in federated settings. We consider a federated network wherein clients coordinate with a central server to complete a task; however, the clients decide whether to participate or not at a time step based on their preferences -- local computation and probabilistic intent. The algorithm does not require client-to-client information exchange. The developed algorithm provides near-optimal values to the clients over long-term average participation with a certain differential privacy guarantee. Finally, we present the experimental results to check the algorithm's efficacy.
△ Less
Submitted 13 October, 2023;
originally announced October 2023.
-
Normality of I-V Measurements Using ML
Authors:
Anees Al-Najjar,
Nageswara S. V. Rao,
Craig A. Bridges,
Sheng Dai
Abstract:
Electrochemistry ecosystems are promising for accelerating the design and discovery of electrochemical systems for energy storage and conversion, by automating significant parts of workflows that combine synthesis and characterization experiments with computations. They require the integration of flow controllers, solvent containers, pumps, fraction collectors, and potentiostats, all connected to…
▽ More
Electrochemistry ecosystems are promising for accelerating the design and discovery of electrochemical systems for energy storage and conversion, by automating significant parts of workflows that combine synthesis and characterization experiments with computations. They require the integration of flow controllers, solvent containers, pumps, fraction collectors, and potentiostats, all connected to an electrochemical cell. These are specialized instruments with custom software that is not originally designed for network integration. We developed network and software solutions for electrochemical workflows that adapt system and instrument settings in real-time for multiple rounds of experiments. We demonstrate this automated workflow by remotely operating the instruments and collecting their measurements to generate a voltammogram (I-V profile) of an electrolyte solution in an electrochemical cell. These measurements are made available at the remote computing system and used for subsequent analysis. In this paper, we focus on a novel, analytically validated machine learning (ML) method for an electrochemistry ecosystem to ensure that I-V measurements are consistent with the normal experimental conditions, and to detect abnormal conditions, such as disconnected electrodes or low cell content volume.
△ Less
Submitted 28 September, 2023;
originally announced October 2023.
-
Two Timin': Repairing Smart Contracts With A Two-Layered Approach
Authors:
Abhinav Jain,
Ehan Masud,
Michelle Han,
Rohan Dhillon,
Sumukh Rao,
Arya Joshi,
Salar Cheema,
Saurav Kumar
Abstract:
Due to the modern relevance of blockchain technology, smart contracts present both substantial risks and benefits. Vulnerabilities within them can trigger a cascade of consequences, resulting in significant losses. Many current papers primarily focus on classifying smart contracts for malicious intent, often relying on limited contract characteristics, such as bytecode or opcode. This paper propos…
▽ More
Due to the modern relevance of blockchain technology, smart contracts present both substantial risks and benefits. Vulnerabilities within them can trigger a cascade of consequences, resulting in significant losses. Many current papers primarily focus on classifying smart contracts for malicious intent, often relying on limited contract characteristics, such as bytecode or opcode. This paper proposes a novel, two-layered framework: 1) classifying and 2) directly repairing malicious contracts. Slither's vulnerability report is combined with source code and passed through a pre-trained RandomForestClassifier (RFC) and Large Language Models (LLMs), classifying and repairing each suggested vulnerability. Experiments demonstrate the effectiveness of fine-tuned and prompt-engineered LLMs. The smart contract repair models, built from pre-trained GPT-3.5-Turbo and fine-tuned Llama-2-7B models, reduced the overall vulnerability count by 97.5% and 96.7% respectively. A manual inspection of repaired contracts shows that all retain functionality, indicating that the proposed method is appropriate for automatic batch classification and repair of vulnerabilities in smart contracts.
△ Less
Submitted 14 September, 2023;
originally announced September 2023.
-
Variations and Relaxations of Normalizing Flows
Authors:
Keegan Kelly,
Lorena Piedras,
Sukrit Rao,
David Roth
Abstract:
Normalizing Flows (NFs) describe a class of models that express a complex target distribution as the composition of a series of bijective transformations over a simpler base distribution. By limiting the space of candidate transformations to diffeomorphisms, NFs enjoy efficient, exact sampling and density evaluation, enabling NFs to flexibly behave as both discriminative and generative models. The…
▽ More
Normalizing Flows (NFs) describe a class of models that express a complex target distribution as the composition of a series of bijective transformations over a simpler base distribution. By limiting the space of candidate transformations to diffeomorphisms, NFs enjoy efficient, exact sampling and density evaluation, enabling NFs to flexibly behave as both discriminative and generative models. Their restriction to diffeomorphisms, however, enforces that input, output and all intermediary spaces share the same dimension, limiting their ability to effectively represent target distributions with complex topologies. Additionally, in cases where the prior and target distributions are not homeomorphic, Normalizing Flows can leak mass outside of the support of the target. This survey covers a selection of recent works that combine aspects of other generative model classes, such as VAEs and score-based diffusion, and in doing so loosen the strict bijectivity constraints of NFs to achieve a balance of expressivity, training speed, sample efficiency and likelihood tractability.
△ Less
Submitted 8 September, 2023;
originally announced September 2023.
-
Blind Biological Sequence Denoising with Self-Supervised Set Learning
Authors:
Nathan Ng,
Ji Won Park,
Jae Hyeon Lee,
Ryan Lewis Kelly,
Stephen Ra,
Kyunghyun Cho
Abstract:
Biological sequence analysis relies on the ability to denoise the imprecise output of sequencing platforms. We consider a common setting where a short sequence is read out repeatedly using a high-throughput long-read platform to generate multiple subreads, or noisy observations of the same sequence. Denoising these subreads with alignment-based approaches often fails when too few subreads are avai…
▽ More
Biological sequence analysis relies on the ability to denoise the imprecise output of sequencing platforms. We consider a common setting where a short sequence is read out repeatedly using a high-throughput long-read platform to generate multiple subreads, or noisy observations of the same sequence. Denoising these subreads with alignment-based approaches often fails when too few subreads are available or error rates are too high. In this paper, we propose a novel method for blindly denoising sets of sequences without directly observing clean source sequence labels. Our method, Self-Supervised Set Learning (SSSL), gathers subreads together in an embedding space and estimates a single set embedding as the midpoint of the subreads in both the latent and sequence spaces. This set embedding represents the "average" of the subreads and can be decoded into a prediction of the clean sequence. In experiments on simulated long-read DNA data, SSSL methods denoise small reads of $\leq 6$ subreads with 17% fewer errors and large reads of $>6$ subreads with 8% fewer errors compared to the best baseline. On a real dataset of antibody sequences, SSSL improves over baselines on two self-supervised metrics, with a significant improvement on difficult small reads that comprise over 60% of the test set. By accurately denoising these reads, SSSL promises to better realize the potential of high-throughput DNA sequencing data for downstream scientific applications.
△ Less
Submitted 4 September, 2023;
originally announced September 2023.
-
BenchTemp: A General Benchmark for Evaluating Temporal Graph Neural Networks
Authors:
Qiang Huang,
Jiawei Jiang,
Xi Susie Rao,
Ce Zhang,
Zhichao Han,
Zitao Zhang,
Xin Wang,
Yongjun He,
Quanqing Xu,
Yang Zhao,
Chuang Hu,
Shuo Shang,
Bo Du
Abstract:
To handle graphs in which features or connectivities are evolving over time, a series of temporal graph neural networks (TGNNs) have been proposed. Despite the success of these TGNNs, the previous TGNN evaluations reveal several limitations regarding four critical issues: 1) inconsistent datasets, 2) inconsistent evaluation pipelines, 3) lacking workload diversity, and 4) lacking efficient compari…
▽ More
To handle graphs in which features or connectivities are evolving over time, a series of temporal graph neural networks (TGNNs) have been proposed. Despite the success of these TGNNs, the previous TGNN evaluations reveal several limitations regarding four critical issues: 1) inconsistent datasets, 2) inconsistent evaluation pipelines, 3) lacking workload diversity, and 4) lacking efficient comparison. Overall, there lacks an empirical study that puts TGNN models onto the same ground and compares them comprehensively. To this end, we propose BenchTemp, a general benchmark for evaluating TGNN models on various workloads. BenchTemp provides a set of benchmark datasets so that different TGNN models can be fairly compared. Further, BenchTemp engineers a standard pipeline that unifies the TGNN evaluation. With BenchTemp, we extensively compare the representative TGNN models on different tasks (e.g., link prediction and node classification) and settings (transductive and inductive), w.r.t. both effectiveness and efficiency metrics. We have made BenchTemp publicly available at https://github.com/qianghuangwhu/benchtemp.
△ Less
Submitted 30 August, 2023;
originally announced August 2023.
-
OpenProteinSet: Training data for structural biology at scale
Authors:
Gustaf Ahdritz,
Nazim Bouatta,
Sachin Kadyan,
Lukas Jarosch,
Daniel Berenberg,
Ian Fisk,
Andrew M. Watkins,
Stephen Ra,
Richard Bonneau,
Mohammed AlQuraishi
Abstract:
Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally…
▽ More
Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine learning for proteins. To remedy this problem, we introduce OpenProteinSet, an open-source corpus of more than 16 million MSAs, associated structural homologs from the Protein Data Bank, and AlphaFold2 protein structure predictions. We have previously demonstrated the utility of OpenProteinSet by successfully retraining AlphaFold2 on it. We expect OpenProteinSet to be broadly useful as training and validation data for 1) diverse tasks focused on protein structure, function, and design and 2) large-scale multimodal machine learning research.
△ Less
Submitted 10 August, 2023;
originally announced August 2023.
-
ProtoX: A First Look
Authors:
Het Mankad,
Sanil Rao,
Brian Van Straalen,
Phillip Colella,
Franz Franchetti
Abstract:
We present a first look at ProtoX, a code generation framework for stencil and pointwise operations that occur frequently in the numerical solution of partial differential equations. ProtoX has Proto as its library frontend and SPIRAL as the backend. Proto is a C++ based domain specific library which optimizes the algorithms used to compute the numerical solution of partial differential equations.…
▽ More
We present a first look at ProtoX, a code generation framework for stencil and pointwise operations that occur frequently in the numerical solution of partial differential equations. ProtoX has Proto as its library frontend and SPIRAL as the backend. Proto is a C++ based domain specific library which optimizes the algorithms used to compute the numerical solution of partial differential equations. Meanwhile, SPIRAL is a code generation system that focuses on generating highly optimized target code. Although the current design layout of Proto and its high level of abstractions provide a user friendly set up, there is still a room for improving it's performance by applying various techniques either at a compiler level or at an algorithmic level. Hence, in this paper we propose adding SPIRAL as the library backend for Proto enabling abstraction fusion, which is usually difficult to perform by any compiler. We demonstrate the construction of ProtoX by considering the 2D Poisson equation as a model problem from Proto. We provide the final generated code for CPU, Multi-core CPU, and GPU as well as some performance numbers for CPU.
△ Less
Submitted 15 July, 2023;
originally announced July 2023.
-
Cyber Framework for Steering and Measurements Collection Over Instrument-Computing Ecosystems
Authors:
Anees Al-Najjar,
Nageswara S. V. Rao,
Ramanan Sankaran,
Helia Zandi,
Debangshu Mukherjee,
Maxim Ziatdinov,
Craig Bridges
Abstract:
We propose a framework to develop cyber solutions to support the remote steering of science instruments and measurements collection over instrument-computing ecosystems. It is based on provisioning separate data and control connections at the network level, and develo** software modules consisting of Python wrappers for instrument commands and Pyro server-client codes that make them available ac…
▽ More
We propose a framework to develop cyber solutions to support the remote steering of science instruments and measurements collection over instrument-computing ecosystems. It is based on provisioning separate data and control connections at the network level, and develo** software modules consisting of Python wrappers for instrument commands and Pyro server-client codes that make them available across the ecosystem network. We demonstrate automated measurement transfers and remote steering operations in a microscopy use case for materials research over an ecosystem of Nion microscopes and computing platforms connected over site networks. The proposed framework is currently under further refinement and being adopted to science workflows with automated remote experiments steering for autonomous chemistry laboratories and smart energy grid simulations.
△ Less
Submitted 12 July, 2023;
originally announced July 2023.
-
Multi-Level Power Series Solution for Large Surface and Volume Electric Field Integral Equation
Authors:
Y. K. Negi,
N. Balakrishnan,
S. M. Rao
Abstract:
In this paper, we propose a new multilevel power series solution method for solving a large surface and volume electric field integral equation based H-Matrix. The proposed solution method converges in a fixed number of iterations and is solved at each level of the H-Matrix computation.The solution method avoids the computation of a full matrix, as it can be solved independently at each level, sta…
▽ More
In this paper, we propose a new multilevel power series solution method for solving a large surface and volume electric field integral equation based H-Matrix. The proposed solution method converges in a fixed number of iterations and is solved at each level of the H-Matrix computation.The solution method avoids the computation of a full matrix, as it can be solved independently at each level, starting from the leaf level. Solution at each level can be used as the final solution, thus saving the matrix computation time for full H-Matrix. The paper shows that the leaf level matrix computation and solution with power series gives an accurate results as full H-Matrix iterative solver method. The method results in considerable time and memory savings compared to the H-Matrix iterative solver. Further, the proposed method retains the O(NlogN) solution complexity
△ Less
Submitted 8 July, 2023;
originally announced July 2023.
-
A Diversity Analysis of Safety Metrics Comparing Vehicle Performance in the Lead-Vehicle Interaction Regime
Authors:
Harnarayan Singh,
Bowen Weng,
Sughosh J. Rao,
Devin Elsasser
Abstract:
Vehicle performance metrics analyze data sets consisting of subject vehicle's interactions with other road users in a nominal driving environment and provide certain performance measures as outputs. To the best of the authors' knowledge, the vehicle safety performance metrics research dates back to at least 1967. To date, there still does not exist a community-wide accepted metric or a set of metr…
▽ More
Vehicle performance metrics analyze data sets consisting of subject vehicle's interactions with other road users in a nominal driving environment and provide certain performance measures as outputs. To the best of the authors' knowledge, the vehicle safety performance metrics research dates back to at least 1967. To date, there still does not exist a community-wide accepted metric or a set of metrics for vehicle safety performance assessment and justification. This issue gets further amplified with the evolving interest in Advanced Driver Assistance Systems and Automated Driving Systems. In this paper, the authors seek to perform a unified study that facilitates an improved community-wide understanding of vehicle performance metrics using the lead-vehicle interaction operational design domain as a common means of performance comparison. In particular, the authors study the diversity (including constructive formulation discrepancies and empirical performance differences) among 33 base metrics with up to 51 metric variants (with different choices of hyper-parameters) in the existing literature, published between 1967 and 2022. Two data sets are adopted for the empirical performance diversity analysis, including vehicle trajectories from normal highway driving environment and relatively high-risk incidents with collisions and near-miss cases. The analysis further implies that (i) the conceptual acceptance of a safety metric proposal can be problematic if the assumptions, conditions, and types of outcome assurance are not justified properly, and (ii) the empirical performance justification of an acceptable metric can also be problematic as a dominant consensus is not observed among metrics empirically.
△ Less
Submitted 26 June, 2023;
originally announced June 2023.
-
Protein Discovery with Discrete Walk-Jump Sampling
Authors:
Nathan C. Frey,
Daniel Berenberg,
Karina Zadorozhny,
Joseph Kleinhenz,
Julien Lafrance-Vanasse,
Isidro Hotzel,
Yan Wu,
Stephen Ra,
Richard Bonneau,
Kyunghyun Cho,
Andreas Loukas,
Vladimir Gligorijevic,
Saeed Saremi
Abstract:
We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our Discrete Walk-Jump Sampling formalism combines the contrastive divergence training of an energy-based model and imp…
▽ More
We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our Discrete Walk-Jump Sampling formalism combines the contrastive divergence training of an energy-based model and improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the distributional conformity score to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, 97-100% of generated samples are successfully expressed and purified and 70% of functional designs show equal or improved binding affinity compared to known functional antibodies on the first attempt in a single round of laboratory experiments. We also report the first demonstration of long-run fast-mixing MCMC chains where diverse antibody protein classes are visited in a single MCMC chain.
△ Less
Submitted 15 March, 2024; v1 submitted 8 June, 2023;
originally announced June 2023.
-
3HAN: A Deep Neural Network for Fake News Detection
Authors:
Sneha Singhania,
Nigel Fernandez,
Shrisha Rao
Abstract:
The rapid spread of fake news is a serious problem calling for AI solutions. We employ a deep learning based automated detector through a three level hierarchical attention network (3HAN) for fast, accurate detection of fake news. 3HAN has three levels, one each for words, sentences, and the headline, and constructs a news vector: an effective representation of an input news article, by processing…
▽ More
The rapid spread of fake news is a serious problem calling for AI solutions. We employ a deep learning based automated detector through a three level hierarchical attention network (3HAN) for fast, accurate detection of fake news. 3HAN has three levels, one each for words, sentences, and the headline, and constructs a news vector: an effective representation of an input news article, by processing an article in an hierarchical bottom-up manner. The headline is known to be a distinguishing feature of fake news, and furthermore, relatively few words and sentences in an article are more important than the rest. 3HAN gives a differential importance to parts of an article, on account of its three layers of attention. By experiments on a large real-world data set, we observe the effectiveness of 3HAN with an accuracy of 96.77%. Unlike some other deep learning models, 3HAN provides an understandable output through the attention weights given to different parts of an article, which can be visualized through a heatmap to enable further manual fact checking.
△ Less
Submitted 21 June, 2023;
originally announced June 2023.
-
Metasurface-based Spectral Convolutional Neural Network for Matter Meta-imaging
Authors:
Kaiyu Cui,
Shijie Rao,
Sheng Xu,
Yidong Huang,
Jiawei Yang,
Jian Xiong,
Chenxuan Wang,
Xue Feng,
Fang Liu,
Wei Zhang,
Yali Li,
Sheng** Wang
Abstract:
Convolutional neural networks (CNNs) are representative models of artificial neural networks (ANNs), that form the backbone of modern computer vision. However, the considerable power consumption and limited computing speed of electrical computing platforms restrict further development of CNNs. Optical neural networks are considered the next-generation physical implementations of ANNs to break the…
▽ More
Convolutional neural networks (CNNs) are representative models of artificial neural networks (ANNs), that form the backbone of modern computer vision. However, the considerable power consumption and limited computing speed of electrical computing platforms restrict further development of CNNs. Optical neural networks are considered the next-generation physical implementations of ANNs to break the bottleneck. This study proposes a spectral convolutional neural network (SCNN) with the function of matter meta-imaging, namely identifying the composition of matter and map** its distribution in space. This SCNN includes an optical convolutional layer (OCL) and a reconfigurable electrical backend. The OCL is implemented by integrating very large-scale, pixel-aligned metasurfaces on a CMOS image sensor, which accepts 3D raw datacubes of natural images, containing two-spatial and one-spectral dimensions, at megapixels directly as input to realize the matter meta-imaging. This unique optoelectronic framework empowers in-sensor optical analog computing at extremely high energy efficiency eliminating the need for coherent light sources and greatly reducing the computing load of the electrical backend. We employed the SCNN framework on several real-world complex tasks. It achieved accuracies of 96.4% and 100% for pathological diagnosis and real-time face anti-spoofing at video rate, respectively. The SCNN framework, with an unprecedented new function of substance identification, provides a feasible optoelectronic and integrated optical CNN implementation for edge devices or cellphones with limited computing capabilities, facilitating diverse applications, such as intelligent robotics, industrial automation, medical diagnosis, and astronomy.
△ Less
Submitted 27 June, 2023; v1 submitted 19 June, 2023;
originally announced June 2023.
-
3D molecule generation by denoising voxel grids
Authors:
Pedro O. Pinheiro,
Joshua Rackers,
Joseph Kleinhenz,
Michael Maser,
Omar Mahmood,
Andrew Martin Watkins,
Stephen Ra,
Vishnu Sresht,
Saeed Saremi
Abstract:
We propose a new score-based approach to generate 3D molecules represented as atomic densities on regular grids. First, we train a denoising neural network that learns to map from a smooth distribution of noisy molecules to the distribution of real molecules. Then, we follow the neural empirical Bayes framework (Saremi and Hyvarinen, 19) and generate molecules in two steps: (i) sample noisy densit…
▽ More
We propose a new score-based approach to generate 3D molecules represented as atomic densities on regular grids. First, we train a denoising neural network that learns to map from a smooth distribution of noisy molecules to the distribution of real molecules. Then, we follow the neural empirical Bayes framework (Saremi and Hyvarinen, 19) and generate molecules in two steps: (i) sample noisy density grids from a smooth distribution via underdamped Langevin Markov chain Monte Carlo, and (ii) recover the "clean" molecule by denoising the noisy grid with a single step. Our method, VoxMol, generates molecules in a fundamentally different way than the current state of the art (ie, diffusion models applied to atom point clouds). It differs in terms of the data representation, the noise model, the network architecture and the generative modeling algorithm. Our experiments show that VoxMol captures the distribution of drug-like molecules better than state of the art, while being faster to generate samples.
△ Less
Submitted 8 March, 2024; v1 submitted 12 June, 2023;
originally announced June 2023.
-
BOtied: Multi-objective Bayesian optimization with tied multivariate ranks
Authors:
Ji Won Park,
Nataša Tagasovska,
Michael Maser,
Stephen Ra,
Kyunghyun Cho
Abstract:
Many scientific and industrial applications require the joint optimization of multiple, potentially competing objectives. Multi-objective Bayesian optimization (MOBO) is a sample-efficient framework for identifying Pareto-optimal solutions. At the heart of MOBO is the acquisition function, which determines the next candidate to evaluate by navigating the best compromises among the objectives. In t…
▽ More
Many scientific and industrial applications require the joint optimization of multiple, potentially competing objectives. Multi-objective Bayesian optimization (MOBO) is a sample-efficient framework for identifying Pareto-optimal solutions. At the heart of MOBO is the acquisition function, which determines the next candidate to evaluate by navigating the best compromises among the objectives. In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function (CDF). Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied. BOtied inherits desirable invariance properties of the CDF, and an efficient implementation with copulas allows it to scale to many objectives. Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions while being computationally efficient for many objectives.
△ Less
Submitted 7 June, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Investigating Agency of LLMs in Human-AI Collaboration Tasks
Authors:
Ashish Sharma,
Sudha Rao,
Chris Brockett,
Akanksha Malhotra,
Nebojsa Jojic,
Bill Dolan
Abstract:
Agency, the capacity to proactively shape events, is central to how humans interact and collaborate. While LLMs are being developed to simulate human behavior and serve as human-like agents, little attention has been given to the Agency that these models should possess in order to proactively manage the direction of interaction and collaboration. In this paper, we investigate Agency as a desirable…
▽ More
Agency, the capacity to proactively shape events, is central to how humans interact and collaborate. While LLMs are being developed to simulate human behavior and serve as human-like agents, little attention has been given to the Agency that these models should possess in order to proactively manage the direction of interaction and collaboration. In this paper, we investigate Agency as a desirable function of LLMs, and how it can be measured and managed. We build on social-cognitive theory to develop a framework of features through which Agency is expressed in dialogue - indicating what you intend to do (Intentionality), motivating your intentions (Motivation), having self-belief in intentions (Self-Efficacy), and being able to self-adjust (Self-Regulation). We collect a new dataset of 83 human-human collaborative interior design conversations containing 908 conversational snippets annotated for Agency features. Using this dataset, we develop methods for measuring Agency of LLMs. Automatic and human evaluations show that models that manifest features associated with high Intentionality, Motivation, Self-Efficacy, and Self-Regulation are more likely to be perceived as strongly agentive.
△ Less
Submitted 7 February, 2024; v1 submitted 22 May, 2023;
originally announced May 2023.
-
Graph-Based Reductions for Parametric and Weighted MDPs
Authors:
Kasper Engelen,
Guillermo A. Pérez,
Shrisha Rao
Abstract:
We study the complexity of reductions for weighted reachability in parametric Markov decision processes. That is, we say a state p is never worse than q if for all valuations of the polynomial indeterminates it is the case that the maximal expected weight that can be reached from p is greater than the same value from q. In terms of computational complexity, we establish that determining whether p…
▽ More
We study the complexity of reductions for weighted reachability in parametric Markov decision processes. That is, we say a state p is never worse than q if for all valuations of the polynomial indeterminates it is the case that the maximal expected weight that can be reached from p is greater than the same value from q. In terms of computational complexity, we establish that determining whether p is never worse than q is coETR-complete. On the positive side, we give a polynomial-time algorithm to compute the equivalence classes of the order we study for Markov chains. Additionally, we describe and implement two inference rules to under-approximate the never-worse relation and empirically show that it can be used as an efficient preprocessing step for the analysis of large Markov decision processes.
△ Less
Submitted 9 May, 2023;
originally announced May 2023.
-
Deep Learning for Automated Experimentation in Scanning Transmission Electron Microscopy
Authors:
Sergei V. Kalinin,
Debangshu Mukherjee,
Kevin M. Roccapriore,
Ben Blaiszik,
Ayana Ghosh,
Maxim A. Ziatdinov,
A. Al-Najjar,
Christina Doty,
Sarah Akers,
Nageswara S. Rao,
Joshua C. Agar,
Steven R. Spurgeon
Abstract:
Machine learning (ML) has become critical for post-acquisition data analysis in (scanning) transmission electron microscopy, (S)TEM, imaging and spectroscopy. An emerging trend is the transition to real-time analysis and closed-loop microscope operation. The effective use of ML in electron microscopy now requires the development of strategies for microscopy-centered experiment workflow design and…
▽ More
Machine learning (ML) has become critical for post-acquisition data analysis in (scanning) transmission electron microscopy, (S)TEM, imaging and spectroscopy. An emerging trend is the transition to real-time analysis and closed-loop microscope operation. The effective use of ML in electron microscopy now requires the development of strategies for microscopy-centered experiment workflow design and optimization. Here, we discuss the associated challenges with the transition to active ML, including sequential data analysis and out-of-distribution drift effects, the requirements for the edge operation, local and cloud data storage, and theory in the loop operations. Specifically, we discuss the relative contributions of human scientists and ML agents in the ideation, orchestration, and execution of experimental workflows and the need to develop universal hyper languages that can apply across multiple platforms. These considerations will collectively inform the operationalization of ML in next-generation experimentation.
△ Less
Submitted 4 April, 2023;
originally announced April 2023.
-
A Communication-efficient Local Differentially Private Algorithm in Federated Optimization
Authors:
Syed Eqbal Alam,
Dhirendra Shukla,
Shrisha Rao
Abstract:
Federated optimization, wherein several agents in a network collaborate with a central server to achieve optimal social cost over the network with no requirement for exchanging information among agents, has attracted significant interest from the research community. In this context, agents demand resources based on their local computation. Due to the exchange of optimization parameters such as sta…
▽ More
Federated optimization, wherein several agents in a network collaborate with a central server to achieve optimal social cost over the network with no requirement for exchanging information among agents, has attracted significant interest from the research community. In this context, agents demand resources based on their local computation. Due to the exchange of optimization parameters such as states, constraints, or objective functions with a central server, an adversary may infer sensitive information of agents. We develop a differentially-private additive-increase and multiplicative-decrease algorithm to allocate multiple divisible shared heterogeneous resources to agents in a network. The developed algorithm provides a differential privacy guarantee to each agent in the network. The algorithm does not require inter-agent communication, and the agents do not need to share their cost function or their derivatives with other agents or a central server; however, they share their allocation states with a central server that keeps track of the aggregate consumption of resources. The algorithm incurs very little communication overhead; for m heterogeneous resources in the system, the asymptotic upper bound on the communication complexity is O(m) bits at a time step. Furthermore, if the algorithm converges in K time steps, then the upper bound communication complexity will be O(mK) bits. The algorithm can find applications in several areas, including smart cities, smart energy systems, resource management in the sixth generation (6G) wireless networks with privacy guarantees, etc. We present experimental results to check the efficacy of the algorithm. Furthermore, we present empirical analyses for the trade-off between privacy and algorithm efficiency.
△ Less
Submitted 19 October, 2023; v1 submitted 3 April, 2023;
originally announced April 2023.
-
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
Authors:
Kyle Lo,
Joseph Chee Chang,
Andrew Head,
Jonathan Bragg,
Amy X. Zhang,
Cassidy Trier,
Chloe Anastasiades,
Tal August,
Russell Authur,
Danielle Bragg,
Erin Bransom,
Isabel Cachola,
Stefan Candra,
Yoganand Chandrasekhar,
Yen-Sung Chen,
Evie Yu-Yen Cheng,
Yvonne Chou,
Doug Downey,
Rob Evans,
Raymond Fok,
Fangzhou Hu,
Regan Huff,
Dongyeop Kang,
Tae Soo Kim,
Rodney Kinney
, et al. (30 additional authors not shown)
Abstract:
Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has chan…
▽ More
Scholarly publications are key to the transfer of knowledge from scholars to others. However, research papers are information-dense, and as the volume of the scientific literature grows, the need for new technology to support the reading process grows. In contrast to the process of finding papers, which has been transformed by Internet technology, the experience of reading research papers has changed little in decades. The PDF format for sharing research papers is widely used due to its portability, but it has significant downsides including: static content, poor accessibility for low-vision readers, and difficulty reading on mobile devices. This paper explores the question "Can recent advances in AI and HCI power intelligent, interactive, and accessible reading interfaces -- even for legacy PDFs?" We describe the Semantic Reader Project, a collaborative effort across multiple institutions to explore automatic creation of dynamic reading interfaces for research papers. Through this project, we've developed ten research prototype interfaces and conducted usability studies with more than 300 participants and real-world users showing improved reading experiences for scholars. We've also released a production reading interface for research papers that will incorporate the best features as they mature. We structure this paper around challenges scholars and the public face when reading research papers -- Discovery, Efficiency, Comprehension, Synthesis, and Accessibility -- and present an overview of our progress and remaining open challenges.
△ Less
Submitted 23 April, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Using Explanations to Guide Models
Authors:
Sukrut Rao,
Moritz Böhle,
Amin Parchami-Araghi,
Bernt Schiele
Abstract:
Deep neural networks are highly performant, but might base their decision on spurious or background features that co-occur with certain classes, which can hurt generalization. To mitigate this issue, the usage of 'model guidance' has gained popularity recently: for this, models are guided to be "right for the right reasons" by regularizing the models' explanations to highlight the right features.…
▽ More
Deep neural networks are highly performant, but might base their decision on spurious or background features that co-occur with certain classes, which can hurt generalization. To mitigate this issue, the usage of 'model guidance' has gained popularity recently: for this, models are guided to be "right for the right reasons" by regularizing the models' explanations to highlight the right features. Experimental validation of these approaches has thus far however been limited to relatively simple and / or synthetic datasets. To gain a better understanding of which model-guiding approaches actually transfer to more challenging real-world datasets, in this work we conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets, and show that model guidance can sometimes even improve model performance. In this context, we further propose a novel energy loss, show its effectiveness in directing the model to focus on object features. We also show that these gains can be achieved even with a small fraction (e.g. 1%) of bounding box annotations, highlighting the cost effectiveness of this approach. Lastly, we show that this approach can also improve generalization under distribution shifts. Code will be made available.
△ Less
Submitted 21 March, 2023;
originally announced March 2023.