-
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Authors:
Shengbang Tong,
Ellis Brown,
Penghao Wu,
Sanghyun Woo,
Manoj Middepogu,
Sai Charitha Akula,
Jihan Yang,
Shusheng Yang,
Adithya Iyer,
Xichen Pan,
Austin Wang,
Rob Fergus,
Yann LeCun,
Saining Xie
Abstract:
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research. This gap hinders accurate sensory grounding in real-world scenarios. Our study uses LLMs and visual instruction tuning as an interface to evaluate various visual representations, offering new insights into different models and architectures -- self-supervised, strongly supervised, or combinations thereof -- based on experiments with over 20 vision encoders. We critically examine existing MLLM benchmarks, addressing the difficulties involved in consolidating and interpreting results from various tasks, and introduce a new vision-centric benchmark, CV-Bench. To further improve visual grounding, we propose the Spatial Vision Aggregator (SVA), a dynamic and spatially-aware connector that integrates high-resolution vision features with LLMs while reducing the number of tokens. Additionally, we discuss the curation of high-quality visual instruction-tuning data from publicly available sources, emphasizing the importance of data source balancing and distribution ratio. Collectively, Cambrian-1 not only achieves state-of-the-art performance but also serves as a comprehensive, open cookbook for instruction-tuned MLLMs. We provide model weights, code, supporting tools, datasets, and detailed instruction-tuning and evaluation recipes. We hope our release will inspire and accelerate advancements in multimodal systems and visual representation learning.
Submitted 24 June, 2024;
originally announced June 2024.
-
Towards Exact Computation of Inductive Bias
Authors:
Akhilan Boopathy,
William Yue,
Jaedong Hwang,
Abhiram Iyer,
Ila Fiete
Abstract:
Much research in machine learning involves finding appropriate inductive biases (e.g. convolutional neural networks, momentum-based optimizers, transformers) to promote generalization on tasks. However, quantification of the amount of inductive bias associated with these architectures and hyperparameters has been limited. We propose a novel method for efficiently computing the inductive bias required for generalization on a task with a fixed training data budget; formally, this corresponds to the amount of information required to specify well-generalizing models within a specific hypothesis space of models. Our approach involves modeling the loss distribution of random hypotheses drawn from a hypothesis space to estimate the required inductive bias for a task relative to these hypotheses. Unlike prior work, our method provides a direct estimate of inductive bias without using bounds and is applicable to diverse hypothesis spaces. Moreover, we derive approximation error bounds for our estimation approach in terms of the number of sampled hypotheses. Consistent with prior results, our empirical results demonstrate that higher dimensional tasks require greater inductive bias. We show that relative to other expressive model classes, neural networks as a model class encode large amounts of inductive bias. Furthermore, our measure quantifies the relative difference in inductive bias between different neural network architectures. Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures for certain tasks and provides a quantitative guide to developing tasks requiring greater inductive bias, thereby encouraging the development of more powerful inductive biases.
Submitted 22 June, 2024;
originally announced June 2024.
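The information-theoretic reading in the abstract above (the information needed to specify well-generalizing models within a hypothesis space) can be illustrated with a naive Monte Carlo sketch. This is not the authors' estimator, which models the loss distribution of sampled hypotheses. The function name `required_bits`, the toy hypothesis space, and the loss threshold are all assumptions made for illustration.

```python
import math
import random


def required_bits(sample_hypothesis, loss, threshold, n_samples=10_000, seed=0):
    """Naive estimate of -log2 P(loss <= threshold) under random hypotheses.

    sample_hypothesis: hypothetical callable taking a random.Random and
        returning one hypothesis drawn from the hypothesis space.
    loss: hypothetical callable mapping a hypothesis to a scalar loss.
    """
    rng = random.Random(seed)
    # Count sampled hypotheses that generalize "well enough".
    hits = sum(loss(sample_hypothesis(rng)) <= threshold for _ in range(n_samples))
    if hits == 0:
        # No good hypothesis was sampled; the naive estimate is unbounded.
        return math.inf
    return -math.log2(hits / n_samples)


# Toy example (assumed): hypotheses are scalars drawn uniformly from [-1, 1],
# and a hypothesis is "good" when it lies within 0.125 of the target 0.5.
bits = required_bits(
    sample_hypothesis=lambda rng: rng.uniform(-1.0, 1.0),
    loss=lambda w: abs(w - 0.5),
    threshold=0.125,
)
print(f"estimated required bits: {bits:.2f}")  # near 3, since the true fraction is 0.125
```

The naive count breaks down when good hypotheses are too rare to be sampled at all, which is one reason a modeled loss distribution, as the abstract describes, may be preferable.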
-
Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models
Authors:
Sunny Duan,
Mikail Khona,
Abhiram Iyer,
Rylan Schaeffer,
Ila R Fiete
Abstract:
The proliferation of large language models has revolutionized natural language processing tasks, yet it raises profound concerns regarding data privacy and security. Language models are trained on extensive corpora including potentially sensitive or proprietary information, and the risk of data leakage -- where the model response reveals pieces of such information -- remains inadequately understood. This study examines susceptibility to data leakage by quantifying the phenomenon of memorization in machine learning models, focusing on the evolution of memorization patterns over training. We investigate how the statistical characteristics of training data influence the memories encoded within the model by evaluating how repetition influences memorization. We reproduce findings that the probability of memorizing a sequence scales logarithmically with the number of times it is present in the data. Furthermore, we find that sequences which are not apparently memorized after the first encounter can be uncovered throughout the course of training even without subsequent encounters. The presence of these latent memorized sequences presents a challenge for data privacy since they may be hidden at the final checkpoint of the model. To this end, we develop a diagnostic test for uncovering these latent memorized sequences by considering their cross entropy loss.
Submitted 20 June, 2024;
originally announced June 2024.
-
Expressive Symbolic Regression for Interpretable Models of Discrete-Time Dynamical Systems
Authors:
Adarsh Iyer,
Nibodh Boddupalli,
Jeff Moehlis
Abstract:
Interpretable mathematical expressions defining discrete-time dynamical systems (iterated maps) can model many phenomena of scientific interest, enabling a deeper understanding of system behaviors. Since formulating governing expressions from first principles can be difficult, it is of particular interest to identify expressions for iterated maps given only their data streams. In this work, we consider a modified Symbolic Artificial Neural Network-Trained Expressions (SymANNTEx) architecture for this task, an architecture more expressive than others in the literature. We make a modification to the model pipeline to optimize the regression, then characterize the behavior of the adjusted model in identifying several classical chaotic maps. With the goal of parsimony, sparsity-inducing weight regularization and information theory-informed simplification are implemented. We show that our modified SymANNTEx model properly identifies single-state maps and achieves moderate success in approximating a dual-state attractor. These performances offer significant promise for data-driven scientific discovery and interpretation.
Submitted 5 June, 2024;
originally announced June 2024.
-
Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository
Authors:
Ajinkya Deshpande,
Anmol Agarwal,
Shashank Shet,
Arun Iyer,
Aditya Kanade,
Ramakrishna Bairi,
Suresh Parthasarathy
Abstract:
LLMs have demonstrated significant potential in code generation tasks, achieving promising results at the function or statement level across various benchmarks. However, the complexities associated with creating code artifacts like classes, particularly within the context of real-world software repositories, remain underexplored. Prior research treats class-level generation as an isolated task, neglecting the intricate dependencies & interactions that characterize real-world software environments. To address this gap, we introduce RepoClassBench, a comprehensive benchmark designed to rigorously evaluate LLMs in generating complex, class-level code within real-world repositories. RepoClassBench includes "Natural Language to Class generation" tasks across Java, Python & C# from a selection of repositories. We ensure that each class in our dataset not only has cross-file dependencies within the repository but also includes corresponding test cases to verify its functionality. We find that current models struggle with the realistic challenges posed by our benchmark, primarily due to their limited exposure to relevant repository contexts. To address this shortcoming, we introduce Retrieve-Repotools-Reflect (RRR), a novel approach that equips LLMs with static analysis tools to iteratively navigate & reason about repository-level context in an agent-based framework. Our experiments demonstrate that RRR significantly outperforms existing baselines on RepoClassBench, showcasing its effectiveness across programming languages & under various settings. Our findings emphasize the critical need for code-generation benchmarks to incorporate repo-level dependencies to more accurately reflect the complexities of software development. Our work shows the benefits of leveraging specialized tools to enhance LLMs' understanding of repository context. We plan to make our dataset & evaluation harness public.
Submitted 5 June, 2024; v1 submitted 21 April, 2024;
originally announced May 2024.
-
Resampling-free Particle Filters in High-dimensions
Authors:
Akhilan Boopathy,
Aneesh Muppidi,
Peggy Yang,
Abhiram Iyer,
William Yue,
Ila Fiete
Abstract:
State estimation is crucial for the performance and safety of numerous robotic applications. Among the suite of estimation techniques, particle filters have been identified as a powerful solution due to their non-parametric nature. Yet, in high-dimensional state spaces, these filters face challenges such as 'particle deprivation' which hinders accurate representation of the true posterior distribution. This paper introduces a novel resampling-free particle filter designed to mitigate particle deprivation by forgoing the traditional resampling step. This ensures a broader and more diverse particle set, especially vital in high-dimensional scenarios. Theoretically, our proposed filter is shown to offer a near-accurate representation of the desired posterior distribution in high-dimensional contexts. Empirically, the effectiveness of our approach is underscored through a high-dimensional synthetic state estimation task and a 6D pose estimation derived from videos. We posit that as robotic systems evolve with greater degrees of freedom, particle filters tailored for high-dimensional state spaces will be indispensable.
Submitted 21 April, 2024;
originally announced April 2024.
-
Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls
Authors:
Amin Hosseiny Marani,
Ulie Schnaithmann,
Youngseo Son,
Akil Iyer,
Manas Paldhe,
Arushi Raghuvanshi
Abstract:
Current Conversational AI systems employ different machine learning pipelines, as well as external knowledge sources and business logic to predict the next action. Maintaining various components in dialogue managers' pipeline adds complexity in expansion and updates, increases processing time, and causes additive noise through the pipeline that can lead to incorrect next action prediction. This paper investigates graph integration into language transformers to improve understanding the relationships between humans' utterances, previous, and next actions without the dependency on external sources or components. Experimental analyses on real calls indicate that the proposed Graph Integrated Language Transformer models can achieve higher performance compared to other production level conversational AI systems in driving interactive calls with human users in real-world settings.
Submitted 11 April, 2024;
originally announced April 2024.
-
Describe-and-Dissect: Interpreting Neurons in Vision Networks with Language Models
Authors:
Nicholas Bai,
Rahul A. Iyer,
Tuomas Oikarinen,
Tsui-Wei Weng
Abstract:
In this paper, we propose Describe-and-Dissect (DnD), a novel method to describe the roles of hidden neurons in vision networks. DnD utilizes recent advancements in multimodal deep learning to produce complex natural language descriptions, without the need for labeled training data or a predefined set of concepts to choose from. Additionally, DnD is training-free, meaning we don't train any new models and can easily leverage more capable general purpose models in the future. We have conducted extensive qualitative and quantitative analysis to show that DnD outperforms prior work by providing higher quality neuron descriptions. Specifically, our method on average provides the highest quality labels and is more than 2 times as likely to be selected as the best explanation for a neuron than the best baseline.
Submitted 20 March, 2024;
originally announced March 2024.
-
OPEN TEACH: A Versatile Teleoperation System for Robotic Manipulation
Authors:
Aadhithya Iyer,
Zhuoran Peng,
Yinlong Dai,
Irmak Guzey,
Siddhant Haldar,
Soumith Chintala,
Lerrel Pinto
Abstract:
Open-sourced, user-friendly tools form the bedrock of scientific advancement across disciplines. The widespread adoption of data-driven learning has led to remarkable progress in multi-fingered dexterity, bimanual manipulation, and applications ranging from logistics to home robotics. However, existing data collection platforms are often proprietary, costly, or tailored to specific robotic morphologies. We present OPEN TEACH, a new teleoperation system leveraging VR headsets to immerse users in mixed reality for intuitive robot control. Built on the affordable Meta Quest 3, which costs $500, OPEN TEACH enables real-time control of various robots, including multi-fingered hands and bimanual arms, through an easy-to-use app. Using natural hand gestures and movements, users can manipulate robots at up to 90Hz with smooth visual feedback and interface widgets offering closeup environment views. We demonstrate the versatility of OPEN TEACH across 38 tasks on different robots. A comprehensive user study indicates significant improvement in teleoperation capability over the AnyTeleop framework. Further experiments exhibit that the collected data is compatible with policy learning on 10 dexterous and contact-rich manipulation tasks. Currently supporting Franka, xArm, Jaco, and Allegro platforms, OPEN TEACH is fully open-sourced to promote broader adoption. Videos are available at https://open-teach.github.io/.
Submitted 12 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1092 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 14 June, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
STENCIL: Submodular Mutual Information Based Weak Supervision for Cold-Start Active Learning
Authors:
Nathan Beck,
Adithya Iyer,
Rishabh Iyer
Abstract:
As supervised fine-tuning of pre-trained models within NLP applications increases in popularity, larger corpora of annotated data are required, especially with increasing parameter counts in large language models. Active learning, which attempts to mine and annotate unlabeled instances to improve model performance maximally fast, is a common choice for reducing the annotation cost; however, most methods typically ignore class imbalance and either assume access to initial annotated data or require multiple rounds of active learning selection before improving rare classes. We present STENCIL, which utilizes a set of text exemplars and the recently proposed submodular mutual information to select a set of weakly labeled rare-class instances that are then strongly labeled by an annotator. We show that STENCIL improves overall accuracy by $10\%-24\%$ and rare-class F-1 score by $17\%-40\%$ on multiple text classification datasets over common active learning methods within the class-imbalanced cold-start setting.
Submitted 20 February, 2024;
originally announced February 2024.
-
Making Short-Form Videos Accessible with Hierarchical Video Summaries
Authors:
Tess Van Daele,
Akhil Iyer,
Yuning Zhang,
Jalyn C. Derry,
Mina Huh,
Amy Pavel
Abstract:
Short videos on platforms such as TikTok, Instagram Reels, and YouTube Shorts (i.e. short-form videos) have become a primary source of information and entertainment. Many short-form videos are inaccessible to blind and low vision (BLV) viewers due to their rapid visual changes, on-screen text, and music or meme-audio overlays. In our formative study, 7 BLV viewers who regularly watched short-form videos reported frequently skipping such inaccessible content. We present ShortScribe, a system that provides hierarchical visual summaries of short-form videos at three levels of detail to support BLV viewers in selecting and understanding short-form videos. ShortScribe allows BLV users to navigate between video descriptions based on their level of interest. To evaluate ShortScribe, we assessed description accuracy and conducted a user study with 10 BLV participants comparing ShortScribe to a baseline interface. When using ShortScribe, participants reported higher comprehension and provided more accurate summaries of video content.
Submitted 15 February, 2024;
originally announced February 2024.
-
Continuous Treatment Effect Estimation Using Gradient Interpolation and Kernel Smoothing
Authors:
Lokesh Nagalapatti,
Akshay Iyer,
Abir De,
Sunita Sarawagi
Abstract:
We address the Individualized continuous treatment effect (ICTE) estimation problem where we predict the effect of any continuous-valued treatment on an individual using observational data. The main challenge in this estimation task is the potential confounding of treatment assignment with an individual's covariates in the training data, whereas during inference ICTE requires prediction on independently sampled treatments. In contrast to prior work that relied on regularizers or unstable GAN training, we advocate the direct approach of augmenting training individuals with independently sampled treatments and inferred counterfactual outcomes. We infer counterfactual outcomes using a two-pronged strategy: a Gradient Interpolation for close-to-observed treatments, and a Gaussian Process based Kernel Smoothing which allows us to downweigh high variance inferences. We evaluate our method on five benchmarks and show that our method outperforms six state-of-the-art methods on the counterfactual estimation error. We analyze the superior performance of our method by showing that (1) our inferred counterfactual responses are more accurate, and (2) adding them to the training data reduces the distributional distance between the confounded training distribution and test distribution where treatment is independent of covariates. Our proposed method is model-agnostic and we show that it improves ICTE accuracy of several existing models.
Submitted 27 January, 2024;
originally announced January 2024.
-
Gemini: A Family of Highly Capable Multimodal Models
Authors:
Gemini Team,
Rohan Anil,
Sebastian Borgeaud,
Jean-Baptiste Alayrac,
Jiahui Yu,
Radu Soricut,
Johan Schalkwyk,
Andrew M. Dai,
Anja Hauth,
Katie Millican,
David Silver,
Melvin Johnson,
Ioannis Antonoglou,
Julian Schrittwieser,
Amelia Glaese,
Jilin Chen,
Emily Pitler,
Timothy Lillicrap,
Angeliki Lazaridou,
Orhan Firat,
James Molloy,
Michael Isard,
Paul R. Barham,
Tom Hennigan,
Benjamin Lee
, et al. (1325 additional authors not shown)
Abstract:
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
Submitted 17 June, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Apparate: Rethinking Early Exits to Tame Latency-Throughput Tensions in ML Serving
Authors:
Yinwei Dai,
Rui Pan,
Anand Iyer,
Kai Li,
Ravi Netravali
Abstract:
Machine learning (ML) inference platforms are tasked with balancing two competing goals: ensuring high throughput given many requests, and delivering low-latency responses to support interactive applications. Unfortunately, existing platform knobs (e.g., batch sizes) fail to ease this fundamental tension, and instead only enable users to harshly trade off one property for the other. This paper explores an alternate strategy to taming throughput-latency tradeoffs by changing the granularity at which inference is performed. We present Apparate, a system that automatically applies and manages early exits (EEs) in ML models, whereby certain inputs can exit with results at intermediate layers. To cope with the time-varying overhead and accuracy challenges that EEs bring, Apparate repurposes exits to provide continual feedback that powers several novel runtime monitoring and adaptation strategies. Apparate lowers median response latencies by 40.5-91.5% and 10.0-24.2% for diverse CV and NLP workloads, respectively, without affecting throughputs or violating tight accuracy constraints.
Submitted 8 December, 2023;
originally announced December 2023.
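The early-exit mechanism named in the Apparate abstract above, where certain inputs leave with results at intermediate layers, can be sketched as follows. This shows only the basic idea with a fixed confidence threshold; it is not Apparate's system, which per the abstract manages exits automatically and adapts them at runtime. The names `predict_with_early_exit`, `exit_heads`, and `final_head`, and the toy layers, are assumptions for illustration.

```python
def predict_with_early_exit(x, layers, exit_heads, final_head, threshold):
    """Run layers in order, returning early when an exit head is confident.

    layers: list of callables applied sequentially to the hidden state.
    exit_heads: dict mapping a layer index to a callable that returns
        (label, confidence) from the hidden state after that layer.
    final_head: callable mapping the last hidden state to a label.
    Returns (label, exit_index); exit_index == len(layers) means no early exit.
    """
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        head = exit_heads.get(i)
        if head is not None:
            label, confidence = head(h)
            if confidence >= threshold:
                # Confident enough: skip the remaining layers.
                return label, i
    return final_head(h), len(layers)


# Toy model (assumed): each layer adds 1; confidence grows with |h|.
layers = [lambda h: h + 1] * 3
exit_heads = {0: lambda h: ("pos" if h > 0 else "neg", min(1.0, abs(h) / 10))}
final_head = lambda h: "pos" if h > 0 else "neg"

easy = predict_with_early_exit(20, layers, exit_heads, final_head, threshold=0.9)
hard = predict_with_early_exit(1, layers, exit_heads, final_head, threshold=0.9)
print(easy, hard)
```

In this toy run the input 20 exits after the first layer, while the input 1 passes through all three layers, which is the latency saving the abstract refers to for inputs that are easy to classify.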
-
Measuring Distributional Shifts in Text: The Advantage of Language Model-Based Embeddings
Authors:
Gyandev Gupta,
Bashir Rastegarpanah,
Amalendu Iyer,
Joshua Rubin,
Krishnaram Kenthapadi
Abstract:
An essential part of monitoring machine learning models in production is measuring input and output data drift. In this paper, we present a system for measuring distributional shifts in natural language data and highlight and investigate the potential advantage of using large language models (LLMs) for this problem. Recent advancements in LLMs and their successful adoption in different domains indicate their effectiveness in capturing semantic relationships for solving various natural language processing problems. The power of LLMs comes largely from the encodings (embeddings) generated in the hidden layers of the corresponding neural network. First we propose a clustering-based algorithm for measuring distributional shifts in text data by exploiting such embeddings. Then we study the effectiveness of our approach when applied to text embeddings generated by both LLMs and classical embedding algorithms. Our experiments show that general-purpose LLM-based embeddings provide a high sensitivity to data drift compared to other embedding methods. We propose drift sensitivity as an important evaluation metric to consider when comparing language models. Finally, we present insights and lessons learned from deploying our framework as part of the Fiddler ML Monitoring platform over a period of 18 months.
Submitted 4 December, 2023;
originally announced December 2023.
-
A Software-Hardware Co-Optimized Toolkit for Deep Reinforcement Learning on Heterogeneous Platforms
Authors:
Yuan Meng,
Michael Kinsner,
Deshanand Singh,
Mahesh A Iyer,
Viktor Prasanna
Abstract:
Deep Reinforcement Learning (DRL) is vital in various AI applications. DRL algorithms comprise diverse compute kernels, which may not be simultaneously optimized using a homogeneous architecture. However, even with available heterogeneous architectures, optimizing DRL performance remains a challenge due to the complexity of hardware and programming models employed in modern data centers. To address this, we introduce PEARL, a toolkit for composing parallel DRL systems on heterogeneous platforms consisting of general-purpose processors (CPUs) and accelerators (GPUs, FPGAs). Our innovations include: 1. A general training protocol agnostic of the underlying hardware, enabling portable implementations across various processors and accelerators. 2. Incorporation of DRL-specific scheduling optimizations within the protocol, facilitating parallelized training and enhancing the overall system performance. 3. High-level API for productive development using the toolkit. 4. Automatic optimization of DRL task-to-device assignments through performance estimation, supporting various optimization metrics including throughput and power efficiency.
We showcase our toolkit through experimentation with two widely used DRL algorithms, DQN and DDPG, on two diverse heterogeneous platforms. The generated implementations outperform state-of-the-art libraries for CPU-GPU platforms by throughput improvements of up to 2.1$\times$ and power efficiency improvements of up to 3.4$\times$.
Submitted 15 November, 2023;
originally announced November 2023.
-
FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations
Authors:
Chanakya Ekbote,
Ajinkya Pankaj Deshpande,
Arun Iyer,
Ramakrishna Bairi,
Sundararajan Sellamanickam
Abstract:
Unsupervised node representations learnt using contrastive learning-based methods have shown good performance on downstream tasks. However, these methods rely on augmentations that mimic low-pass filters, limiting their performance on tasks requiring different eigen-spectrum parts. This paper presents a simple filter-based augmentation method to capture different parts of the eigen-spectrum. We show significant improvements using these augmentations. Further, we show that sharing the same weights across these different filter augmentations is possible, reducing the computational load. In addition, previous works have shown that good performance on downstream tasks requires high dimensional representations. Working with high dimensions increases the computations, especially when multiple augmentations are involved. We mitigate this problem and recover good performance through lower dimensional embeddings using simple random Fourier feature projections. Our method, FiGURe achieves an average gain of up to 4.4%, compared to the state-of-the-art unsupervised models, across all datasets in consideration, both homophilic and heterophilic. Our code can be found at: https://github.com/microsoft/figure.
Submitted 4 October, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
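The random Fourier feature projection mentioned in this abstract can be sketched generically. This is a minimal illustration of the standard construction for the RBF kernel exp(-gamma * ||x - y||^2), not the FiGURe code; the function name `make_rff` and its parameters are hypothetical.

```python
import math
import random


def make_rff(input_dim, num_features, gamma, seed=0):
    """Return a feature map z(x) with z(x) . z(y) ~= exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density: N(0, 2 * gamma * I).
    std = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, std) for _ in range(input_dim)]
               for _ in range(num_features)]
    # Random phase offsets, uniform on [0, 2 * pi).
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def transform(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform


if __name__ == "__main__":
    phi = make_rff(input_dim=3, num_features=4000, gamma=0.5)
    x, y = [0.1, -0.4, 0.7], [0.3, 0.2, -0.1]
    approx = sum(a * b for a, b in zip(phi(x), phi(y)))
    exact = math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)))
    print(f"approximate kernel: {approx:.3f}, exact kernel: {exact:.3f}")
```

The approximation error shrinks roughly with the square root of `num_features`, so the output dimension trades accuracy against cost.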
-
CodePlan: Repository-level Coding using LLMs and Planning
Authors:
Ramakrishna Bairi,
Atharv Sonwane,
Aditya Kanade,
Vageesh D C,
Arun Iyer,
Suresh Parthasarathy,
Sriram Rajamani,
B. Ashok,
Shashank Shet
Abstract:
Software engineering activities such as package migration, fixing error reports from static analysis or testing, and adding type annotations or other specifications to a codebase, involve pervasively editing the entire repository of code. We formulate these activities as repository-level coding tasks.
Recent tools like GitHub Copilot, which are powered by Large Language Models (LLMs), have succeeded in offering high-quality solutions to localized coding problems. Repository-level coding tasks are more involved and cannot be solved directly using LLMs, since code within a repository is inter-dependent and the entire repository may be too large to fit into the prompt. We frame repository-level coding as a planning problem and present a task-agnostic framework, called CodePlan to solve it. CodePlan synthesizes a multi-step chain of edits (plan), where each step results in a call to an LLM on a code location with context derived from the entire repository, previous code changes and task-specific instructions. CodePlan is based on a novel combination of an incremental dependency analysis, a change may-impact analysis and an adaptive planning algorithm.
We evaluate the effectiveness of CodePlan on two repository-level tasks: package migration (C#) and temporal code edits (Python). Each task is evaluated on multiple code repositories, each of which requires inter-dependent changes to many files (between 2-97 files). Coding tasks of this level of complexity have not been automated using LLMs before. Our results show that CodePlan has better match with the ground truth compared to baselines. CodePlan is able to get 5/6 repositories to pass the validity checks (e.g., to build without errors and make correct code edits) whereas the baselines (without planning but with the same type of contextual information as CodePlan) cannot get any of the repositories to pass them.
Submitted 21 September, 2023;
originally announced September 2023.
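The idea of a multi-step chain of edits that grows as changes ripple through dependent code can be pictured as a worklist loop. The sketch below is only an illustration under strong assumptions and is not CodePlan itself: `plan_edits`, `dependents`, and `edit_fn` are hypothetical names, the dependency graph is a static dictionary rather than an incremental analysis, and `edit_fn` stands in for an LLM call that reports whether its edit may affect other locations.

```python
from collections import deque


def plan_edits(seeds, dependents, edit_fn):
    """Process code locations in order, scheduling dependents of changed ones.

    seeds:      locations that must be edited to start the task
    dependents: dict mapping a location to locations that may be impacted by it
    edit_fn:    callable(location, history) -> True if the edit may impact others
    """
    pending = deque(seeds)
    scheduled = set(seeds)
    history = []
    while pending:
        location = pending.popleft()
        may_impact = edit_fn(location, tuple(history))
        history.append(location)
        if may_impact:
            for dep in dependents.get(location, ()):
                if dep not in scheduled:  # each location is edited at most once
                    scheduled.add(dep)
                    pending.append(dep)
    return history


if __name__ == "__main__":
    graph = {"api.py": ["client.py", "tests.py"], "client.py": ["cli.py"]}
    # Hypothetical stand-in for an LLM edit: only the edit to client.py is local.
    order = plan_edits(["api.py"], graph, lambda loc, hist: loc != "client.py")
    print(order)
```

Because the edit to `client.py` is reported as local, `cli.py` is never scheduled, which is how the plan adapts to what each step actually changed.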
-
Efficient Q-Learning over Visit Frequency Maps for Multi-agent Exploration of Unknown Environments
Authors:
Xuyang Chen,
Ashvin N. Iyer,
Zixing Wang,
Ahmed H. Qureshi
Abstract:
The robot exploration task has been widely studied with applications spanning from novel environment mapping to item delivery. For some time-critical tasks, such as rescue catastrophes, the agent is required to explore as efficiently as possible. Recently, Visit Frequency-based map representation achieved great success in such scenarios by discouraging repetitive visits with a frequency-based penalty. However, its relatively large size and single-agent settings hinder its further development. In this context, we propose Integrated Visit Frequency Map, which encodes the same information as Visit Frequency Map in a more compact form, and a visit frequency-based multi-agent information exchange and control scheme that is able to accommodate both representations. Through tests in diverse settings, the results indicate our proposed methods can achieve performance comparable to VFM with lower bandwidth requirements and generalize well to different multi-agent setups including real-world environments.
Submitted 30 July, 2023;
originally announced July 2023.
-
Autonomous Systems: Autonomous Systems: Indoor Drone Navigation
Authors:
Aswin Iyer,
Santosh Narayan,
Naren M,
Manoj kumar Rajagopal
Abstract:
Drones are a promising technology for autonomous data collection and indoor sensing. In situations where human-controlled UAVs may not be practical or dependable, such as in uncharted or dangerous locations, the use of autonomous UAVs offers flexibility, cost savings, and reduced risk. The system creates a simulated quadcopter capable of autonomously travelling in an indoor environment using the Gazebo simulation tool and the ROS navigation framework known as Navigation2 (Nav2). While Nav2 has been shown to work for autonomous navigation in terrestrial robots and vehicles, the same has not yet been accomplished with unmanned aerial vehicles. The goal is to use the SLAM Toolbox for ROS and the Nav2 navigation framework to construct a simulated drone that can move autonomously in an indoor (GPS-less) environment.
Submitted 18 April, 2023;
originally announced April 2023.
-
Open-Set Multi-Source Multi-Target Domain Adaptation
Authors:
Rohit Lal,
Arihant Gaur,
Aadhithya Iyer,
Muhammed Abdullah Shaikh,
Ritik Agrawal
Abstract:
Single-Source Single-Target Domain Adaptation (1S1T) aims to bridge the gap between a labelled source domain and an unlabelled target domain. Despite 1S1T being a well-researched topic, they are typically not deployed to the real world. Methods like Multi-Source Domain Adaptation and Multi-Target Domain Adaptation have evolved to model real-world problems but still do not generalise well. The fact that most of these methods assume a common label-set between source and target is very restrictive. Recent Open-Set Domain Adaptation methods handle unknown target labels but fail to generalise in multiple domains. To overcome these difficulties, first, we propose a novel generic domain adaptation (DA) setting named Open-Set Multi-Source Multi-Target Domain Adaptation (OS-nSmT), with n and m being number of source and target domains respectively. Next, we propose a graph attention based framework named DEGAA which can capture information from multiple source and target domains without knowing the exact label-set of the target. We argue that our method, though offered for multiple sources and multiple targets, can also be agnostic to various other DA settings. To check the robustness and versatility of DEGAA, we put forward ample experiments and ablation studies.
Submitted 3 February, 2023; v1 submitted 2 February, 2023;
originally announced February 2023.
-
Fully Bayesian inference for latent variable Gaussian process models
Authors:
Suraj Yerramilli,
Akshay Iyer,
Wei Chen,
Daniel W. Apley
Abstract:
Real engineering and scientific applications often involve one or more qualitative inputs. Standard Gaussian processes (GPs), however, cannot directly accommodate qualitative inputs. The recently introduced latent variable Gaussian process (LVGP) overcomes this issue by first mapping each qualitative factor to underlying latent variables (LVs), and then using any standard GP covariance function over these LVs. The LVs are estimated similarly to the other GP hyperparameters through maximum likelihood estimation, and then plugged into the prediction expressions. However, this plug-in approach will not account for uncertainty in estimation of the LVs, which can be significant especially with limited training data. In this work, we develop a fully Bayesian approach for the LVGP model and for visualizing the effects of the qualitative inputs via their LVs. We also develop approximations for scaling up LVGPs and fully Bayesian inference for the LVGP hyperparameters. We conduct numerical studies comparing plug-in inference against fully Bayesian inference over a few engineering models and material design applications. In contrast to previous studies on standard GP modeling that have largely concluded that a fully Bayesian treatment offers limited improvements, our results show that for LVGP modeling it offers significant improvements in prediction accuracy and uncertainty quantification over the plug-in approach.
Submitted 19 March, 2023; v1 submitted 3 November, 2022;
originally announced November 2022.
-
Learning to estimate a surrogate respiratory signal from cardiac motion by signal-to-signal translation
Authors:
Akshay Iyer,
Clifford Lindsay,
Hendrik Pretorius,
Michael King
Abstract:
In this work, we develop a neural network-based method to convert a noisy motion signal generated from segmenting rebinned list-mode cardiac SPECT images, to that of a high-quality surrogate signal, such as those seen from external motion tracking systems (EMTs). This synthetic surrogate will be used as input to our pre-existing motion correction technique developed for EMT surrogate signals. In our method, we test two families of neural networks to translate noisy internal motion to external surrogate: 1) fully connected networks and 2) convolutional neural networks. Our dataset consists of cardiac perfusion SPECT acquisitions for which cardiac motion was estimated (input: center-of-count-mass - COM signals) in conjunction with a respiratory surrogate motion signal acquired using a commercial Vicon Motion Tracking System (GT: EMT signals). We obtained an average R-score of 0.76 between the predicted surrogate and the EMT signal. Our goal is to lay a foundation to guide the optimization of neural networks for respiratory motion correction from SPECT without the need for an EMT.
Submitted 20 July, 2022;
originally announced August 2022.
-
Action Quality Assessment using Transformers
Authors:
Abhay Iyer,
Mohammad Alali,
Hemanth Bodala,
Sunit Vaidya
Abstract:
Action quality assessment (AQA) is an active research problem in video-based applications and a challenging task due to the score variance per frame. Existing methods address this problem via convolution-based approaches but are limited in their ability to capture long-range dependencies. With the recent advancements in Transformers, we show that they are a suitable alternative to the conventional convolution-based architectures. Specifically, can transformer-based models solve the task of AQA by effectively capturing long-range dependencies, parallelizing computation, and providing a wider receptive field for diving videos? To demonstrate the effectiveness of our proposed architectures, we conducted comprehensive experiments and achieved a competitive Spearman correlation score of 0.9317. Additionally, we explore the effect of hyperparameters on the model's performance and pave a new path for exploiting Transformers in AQA.
Submitted 20 July, 2022;
originally announced July 2022.
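The Spearman correlation score reported in this abstract is a rank correlation between predicted and reference scores. As a worked illustration (not the authors' evaluation code), the sketch below computes it as the Pearson correlation of average ranks; the function names and the sample scores are made up for the example.

```python
import math


def rank(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(predicted, reference):
    """Spearman rank correlation between two equally long score lists."""
    return pearson(rank(predicted), rank(reference))


if __name__ == "__main__":
    # Hypothetical scores for five clips: two adjacent pairs are swapped in rank.
    judges = [62.0, 70.5, 81.0, 88.5, 95.0]
    model = [71.0, 64.0, 90.0, 83.0, 97.0]
    print(round(spearman(model, judges), 3))
```

Because only the ordering matters, a model can score highly on this metric even when its absolute predictions are offset from the reference scores.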
-
Learning to identify cracks on wind turbine blade surfaces using drone-based inspection images
Authors:
Akshay Iyer,
Linh Nguyen,
Shweta Khushu
Abstract:
Wind energy is expected to be one of the leading ways to achieve the goals of the Paris Agreement but it in turn heavily depends on effective management of its operations and maintenance (O&M) costs. Blade failures account for one-third of all O&M costs thus making accurate detection of blade damages, especially cracks, very important for sustained operations and cost savings. Traditionally, damage inspection has been a completely manual process thus making it subjective, error-prone, and time-consuming. Hence in this work, we bring more objectivity, scalability, and repeatability in our damage inspection process, using deep learning, to miss fewer cracks. We build a deep learning model trained on a large dataset of blade damages, collected by our drone-based inspection, to correctly detect cracks. Our model is already in production and has processed more than a million damages with a recall of 0.96. We also focus on model interpretability using class activation maps to get a peek into the model workings. The model not only performs as well as human experts but also better in certain tricky cases. Thus, in this work, we aim to increase wind energy adoption by decreasing one of its major hurdles - the O&M costs resulting from missing blade failures like cracks.
Submitted 20 July, 2022;
originally announced July 2022.
-
Uncertainty-Aware Mixed-Variable Machine Learning for Materials Design
Authors:
Hengrui Zhang,
Wei Wayne Chen,
Akshay Iyer,
Daniel W. Apley,
Wei Chen
Abstract:
Data-driven design shows the promise of accelerating materials discovery but is challenging due to the prohibitive cost of searching the vast design space of chemistry, structure, and synthesis methods. Bayesian Optimization (BO) employs uncertainty-aware machine learning models to select promising designs to evaluate, hence reducing the cost. However, BO with mixed numerical and categorical variables, which is of particular interest in materials design, has not been well studied. In this work, we survey frequentist and Bayesian approaches to uncertainty quantification of machine learning with mixed variables. We then conduct a systematic comparative study of their performances in BO using a popular representative model from each group, the random forest-based Lolo model (frequentist) and the latent variable Gaussian process model (Bayesian). We examine the efficacy of the two models in the optimization of mathematical functions, as well as properties of structural and functional materials, where we observe performance differences as related to problem dimensionality and complexity. By investigating the machine learning models' predictive and uncertainty estimation capabilities, we provide interpretations of the observed performance differences. Our results provide practical guidance on choosing between frequentist and Bayesian uncertainty-aware machine learning models for mixed-variable BO in materials design.
Submitted 4 October, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Authors:
Aarohi Srivastava,
Abhinav Rastogi,
Abhishek Rao,
Abu Awal Md Shoeb,
Abubakar Abid,
Adam Fisch,
Adam R. Brown,
Adam Santoro,
Aditya Gupta,
Adrià Garriga-Alonso,
Agnieszka Kluska,
Aitor Lewkowycz,
Akshat Agarwal,
Alethea Power,
Alex Ray,
Alex Warstadt,
Alexander W. Kocurek,
Ali Safaya,
Ali Tazarv,
Alice Xiang,
Alicia Parrish,
Allen Nie,
Aman Hussain,
Amanda Askell,
Amanda Dsouza
, et al. (426 additional authors not shown)
Abstract:
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 450 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.
Submitted 12 June, 2023; v1 submitted 9 June, 2022;
originally announced June 2022.
-
Story Beyond the Eye: Glyph Positions Break PDF Text Redaction
Authors:
Maxwell Bland,
Anushya Iyer,
Kirill Levchenko
Abstract:
In this work we find that many current redactions of PDF text are insecure due to non-redacted character positioning information. In particular, subpixel-sized horizontal shifts in redacted and non-redacted characters can be recovered and used to effectively deredact first and last names. Unfortunately these findings affect redactions where the text underneath the black box is removed from the PDF.
We demonstrate these findings by performing a comprehensive vulnerability assessment of common PDF redaction types. We examine 11 popular PDF redaction tools, including Adobe Acrobat, and find that they leak information about redacted text. We also effectively deredact hundreds of real-world PDF redactions, including those found in OIG investigation reports and FOIA responses.
To correct the problem, we have released open source algorithms to fix trivial redactions and reduce the amount of information leaked by nonexcising redactions (where the text underneath the redaction is copy-pastable). We have also notified the developers of the studied redaction tools. We have notified the Office of Inspector General, the Free Law Project, PACER, Adobe, Microsoft, and the US Department of Justice. We are working with several of these groups to prevent our discoveries from being used for malicious purposes.
Submitted 13 November, 2022; v1 submitted 5 June, 2022;
originally announced June 2022.
-
An Automated System for Detecting Visual Damages of Wind Turbine Blades
Authors:
Linh Nguyen,
Akshay Iyer,
Shweta Khushu
Abstract:
Wind energy's ability to compete with fossil fuels on a market level depends on lowering wind's high operational costs. Since damages on wind turbine blades are the leading cause for these operational problems, identifying blade damages is critical. However, recent works in visual identification of blade damages are still experimental and focus on optimizing the traditional machine learning metrics such as IoU. In this paper, we argue that pushing models to production long before achieving the "optimal" model performance can still generate real value for this use case. We discuss the performance of our damage's suggestion model in production and how this system works in coordination with humans as part of a commercialized product and how it can contribute towards lowering wind energy's operational costs.
Submitted 22 May, 2022;
originally announced May 2022.
-
Landmarks and Regions: A Robust Approach to Data Extraction
Authors:
Suresh Parthasarathy,
Lincy Pattanaik,
Anirudh Khatry,
Arun Iyer,
Arjun Radhakrishna,
Sriram Rajamani,
Mohammad Raza
Abstract:
We propose a new approach to extracting data items or field values from semi-structured documents. Examples of such problems include extracting passenger name, departure time and departure airport from a travel itinerary, or extracting price of an item from a purchase receipt. Traditional approaches to data extraction use machine learning or program synthesis to process the whole document to extract the desired fields. Such approaches are not robust to format changes in the document, and the extraction process typically fails even if changes are made to parts of the document that are unrelated to the desired fields of interest. We propose a new approach to data extraction based on the concepts of landmarks and regions. Humans routinely use landmarks in manual processing of documents to zoom in and focus their attention on small regions of interest in the document. Inspired by this human intuition, we use the notion of landmarks in program synthesis to automatically synthesize extraction programs that first extract a small region of interest, and then automatically extract the desired value from the region in a subsequent step. We have implemented our landmark-based extraction approach in a tool LRSyn, and show extensive evaluation on documents in HTML as well as scanned images of invoices and receipts. Our results show that our approach is robust to various types of format changes that routinely happen in real-world settings.
Submitted 11 April, 2022;
originally announced April 2022.
-
Automatic Facial Paralysis Estimation with Facial Action Units
Authors:
Xuri Ge,
Joemon M. Jose,
Pengcheng Wang,
Arunachalam Iyer,
Xiao Liu,
Hu Han
Abstract:
Facial palsy is unilateral facial nerve weakness or paralysis of rapid onset with unknown causes. Automatically estimating facial palsy severeness can be helpful for the diagnosis and treatment of people suffering from it across the world. In this work, we develop and experiment with a novel model for estimating facial palsy severity. For this, an effective Facial Action Units (AU) detection technique is incorporated into our model, where AUs refer to a unique set of facial muscle movements used to describe almost every anatomically possible facial expression. In this paper, we propose a novel Adaptive Local-Global Relational Network (ALGRNet) for facial AU detection and use it to classify facial paralysis severity. ALGRNet mainly consists of three main novel structures: (i) an adaptive region learning module that learns the adaptive muscle regions based on the detected landmarks; (ii) a skip-BiLSTM that models the latent relationships among local AUs; and (iii) a feature fusion&refining module that investigates the complementary between the local and global face. Quantitative results on two AU benchmarks, i.e., BP4D and DISFA, demonstrate our ALGRNet can achieve promising AU detection accuracy. We further demonstrate the effectiveness of its application to facial paralysis estimation by migrating ALGRNet to a facial paralysis dataset collected and annotated by medical professionals.
Submitted 30 March, 2022; v1 submitted 3 March, 2022;
originally announced March 2022.
-
GEMEL: Model Merging for Memory-Efficient, Real-Time Video Analytics at the Edge
Authors:
Arthi Padmanabhan,
Neil Agarwal,
Anand Iyer,
Ganesh Ananthanarayanan,
Yuanchao Shu,
Nikolaos Karianakis,
Guoqing Harry Xu,
Ravi Netravali
Abstract:
Video analytics pipelines have steadily shifted to edge deployments to reduce bandwidth overheads and privacy violations, but in doing so, face an ever-growing resource tension. Most notably, edge-box GPUs lack the memory needed to concurrently house the growing number of (increasingly complex) models for real-time inference. Unfortunately, existing solutions that rely on time/space sharing of GPU resources are insufficient as the required swapping delays result in unacceptable frame drops and accuracy violations. We present model merging, a new memory management technique that exploits architectural similarities between edge vision models by judiciously sharing their layers (including weights) to reduce workload memory costs and swapping delays. Our system, GEMEL, efficiently integrates merging into existing pipelines by (1) leveraging several guiding observations about per-model memory usage and inter-layer dependencies to quickly identify fruitful and accuracy-preserving merging configurations, and (2) altering edge inference schedules to maximize merging benefits. Experiments across diverse workloads reveal that GEMEL reduces memory usage by up to 60.7%, and improves overall accuracy by 8-39% relative to time/space sharing alone.
Submitted 4 May, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
-
Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments
Authors:
Abhiram Iyer,
Karan Grewal,
Akash Velu,
Lucas Oliveira Souza,
Jeremy Forest,
Subutai Ahmad
Abstract:
A key challenge for AI is to build embodied systems that operate in dynamically changing environments. Such systems must adapt to changing task contexts and learn continuously. Although standard deep learning systems achieve state of the art results on static benchmarks, they often struggle in dynamic scenarios. In these settings, error signals from multiple contexts can interfere with one another, ultimately leading to a phenomenon known as catastrophic forgetting. In this article we investigate biologically inspired architectures as solutions to these problems. Specifically, we show that the biophysical properties of dendrites and local inhibitory systems enable networks to dynamically restrict and route information in a context-specific manner. Our key contributions are as follows. First, we propose a novel artificial neural network architecture that incorporates active dendrites and sparse representations into the standard deep learning framework. Next, we study the performance of this architecture on two separate benchmarks requiring task-based adaptation: Meta-World, a multi-task reinforcement learning environment where a robotic agent must learn to solve a variety of manipulation tasks simultaneously; and a continual learning benchmark in which the model's prediction task changes throughout training. Analysis on both benchmarks demonstrates the emergence of overlapping but distinct and sparse subnetworks, allowing the system to fluidly learn multiple tasks with minimal forgetting. Our neural implementation marks the first time a single architecture has achieved competitive results on both multi-task and continual learning settings. Our research sheds light on how biological properties of neurons can inform deep learning systems to address dynamic scenarios that are typically impossible for traditional ANNs to solve.
Submitted 25 April, 2022; v1 submitted 31 December, 2021;
originally announced January 2022.
-
A Piece-wise Polynomial Filtering Approach for Graph Neural Networks
Authors:
Vijay Lingam,
Chanakya Ekbote,
Manan Sharma,
Rahul Ragesh,
Arun Iyer,
Sundararajan Sellamanickam
Abstract:
Graph Neural Networks (GNNs) exploit signals from node features and the input graph topology to improve node classification task performance. However, these models tend to perform poorly on heterophilic graphs, where connected nodes have different labels. Recently proposed GNNs work across graphs having varying levels of homophily. Among these, models relying on polynomial graph filters have shown promise. We observe that solutions to these polynomial graph filter models are also solutions to an overdetermined system of equations. This suggests that in some instances, the model needs to learn a reasonably high order polynomial. On investigation, we find the proposed models ineffective at learning such polynomials due to their designs. To mitigate this issue, we perform an eigendecomposition of the graph and propose to learn multiple adaptive polynomial filters acting on different subsets of the spectrum. We theoretically and empirically show that our proposed model learns a better filter, thereby improving classification accuracy. We study various aspects of our proposed model, including the dependency on the number of eigencomponents utilized, the latent polynomial filters learned, and the performance of the individual polynomials on the node classification task. We further show that our model is scalable by evaluating over large graphs. Our model achieves performance gains of up to 5% over the state-of-the-art models and outperforms existing polynomial filter-based approaches in general.
Submitted 7 December, 2021;
originally announced December 2021.
-
Jigsaw: Large Language Models meet Program Synthesis
Authors:
Naman Jain,
Skanda Vaidyanath,
Arun Iyer,
Nagarajan Natarajan,
Suresh Parthasarathy,
Sriram Rajamani,
Rahul Sharma
Abstract:
Large pre-trained language models such as GPT-3, Codex, and Google's language model are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and caution. On the optimistic side, such large language models have the potential to improve productivity by providing an automated AI pair programmer for every programmer in the world. On the cautionary side, since these large language models do not understand program semantics, they offer no guarantees about the quality of the suggested code. In this paper, we present an approach to augment these large language models with post-processing steps based on program analysis and synthesis techniques that understand the syntax and semantics of programs. Further, we show that such techniques can make use of user feedback and improve with usage. We present our experiences from building and evaluating such a tool, Jigsaw, targeted at synthesizing code for the Python Pandas API using multi-modal inputs. Our experience suggests that as these large language models evolve for synthesizing code from intent, Jigsaw has an important role to play in improving the accuracy of the systems.
Submitted 6 December, 2021;
originally announced December 2021.
-
Eformer: Edge Enhancement based Transformer for Medical Image Denoising
Authors:
Achleshwar Luthra,
Harsh Sulakhe,
Tanish Mittal,
Abhishek Iyer,
Santosh Yadav
Abstract:
In this work, we present Eformer - Edge enhancement based transformer, a novel architecture that builds an encoder-decoder network using transformer blocks for medical image denoising. Non-overlapping window-based self-attention is used in the transformer block, which reduces computational requirements. This work further incorporates learnable Sobel-Feldman operators to enhance edges in the image and proposes an effective way to concatenate them in the intermediate layers of our architecture. The experimental analysis is conducted by comparing deterministic learning and residual learning for the task of medical image denoising. To defend the effectiveness of our approach, our model is evaluated on the AAPM-Mayo Clinic Low-Dose CT Grand Challenge Dataset and achieves state-of-the-art performance, i.e., 43.487 PSNR, 0.0067 RMSE, and 0.9861 SSIM. We believe that our work will encourage more research in transformer-based architectures for medical image denoising using residual learning.
Submitted 9 November, 2021; v1 submitted 16 September, 2021;
originally announced September 2021.
-
Effective Eigendecomposition based Graph Adaptation for Heterophilic Networks
Authors:
Vijay Lingam,
Rahul Ragesh,
Arun Iyer,
Sundararajan Sellamanickam
Abstract:
Graph Neural Networks (GNNs) exhibit excellent performance when graphs have a strong homophily property, i.e. connected nodes have the same labels. However, they perform poorly on heterophilic graphs. Several approaches address the issue of heterophily by proposing models that adapt the graph by optimizing a task-specific loss function using labelled data. These adaptations are made either via attention or by attenuating or enhancing various low-frequency/high-frequency signals, as needed for the task at hand. More recent approaches adapt the eigenvalues of the graph. One important interpretation of this adaptation is that these models select/weigh the eigenvectors of the graph. Based on this interpretation, we present an eigendecomposition based approach and propose EigenNetwork models that improve the performance of GNNs on heterophilic graphs. Performance improvement is achieved by learning flexible graph adaptation functions that modulate the eigenvalues of the graph. Regularization of these functions via parameter sharing helps to improve the performance even more. Our approach achieves up to 11% improvement in performance over the state-of-the-art methods on heterophilic graphs.
Submitted 28 July, 2021;
originally announced July 2021.
-
The Multi-phase spatial meta-heuristic algorithm for public health emergency transportation
Authors:
Fariba Afrin Irany,
Arnav Iyer,
Rubenia Borge Flores,
Armin R. Mikler
Abstract:
The delivery of Medical Countermeasures (MCMs) for mass prophylaxis in the case of a bio-terrorist attack is an active research topic that has interested the research community over the past decades. The objective of this study is to design an efficient algorithm for the Receive Reload and Store Problem (RSS), in which we aim to find feasible routes to deliver MCMs to a target population considering time, physical, and human resources, and capacity limitations. To do this, we adapt the p-median problem to the POD-based emergency response planning procedures and propose an efficient algorithm to solve the p-median problem in reasonable computational time. We present RE-PLAN, the Response PLan Analyzer system that contains some RSS solutions developed at The Center for Computational Epidemiology and Response Analysis (CeCERA) at the University of North Texas. Finally, we analyze a case study where we show how the computational performance of the algorithm can impact the process of decision making and emergency planning in the short and long terms.
Submitted 5 July, 2021;
originally announced July 2021.
-
Scalable Gaussian Processes for Data-Driven Design using Big Data with Categorical Factors
Authors:
Liwei Wang,
Suraj Yerramilli,
Akshay Iyer,
Daniel Apley,
** Zhu,
Wei Chen
Abstract:
Scientific and engineering problems often require the use of artificial intelligence to aid understanding and the search for promising designs. While Gaussian processes (GP) stand out as easy-to-use and interpretable learners, they have difficulties in accommodating big datasets, categorical inputs, and multiple responses, which has become a common challenge for a growing number of data-driven design applications. In this paper, we propose a GP model that utilizes latent variables and functions obtained through variational inference to address the aforementioned challenges simultaneously. The method is built upon the latent variable Gaussian process (LVGP) model where categorical factors are mapped into a continuous latent space to enable GP modeling of mixed-variable datasets. By extending variational inference to LVGP models, the large training dataset is replaced by a small set of inducing points to address the scalability issue. Output response vectors are represented by a linear combination of independent latent functions, forming a flexible kernel structure to handle multiple responses that might have distinct behaviors. Comparative studies demonstrate that the proposed method scales well for large datasets with over 10^4 data points, while outperforming state-of-the-art machine learning methods without requiring much hyperparameter tuning. In addition, an interpretable latent space is obtained to draw insights into the effect of categorical factors, such as those associated with building blocks of architectures and element choices in metamaterial and materials design. Our approach is demonstrated for machine learning of ternary oxide materials and topology optimization of a multiscale compliant mechanism with aperiodic microstructures and multiple materials.
Submitted 29 June, 2021; v1 submitted 25 June, 2021;
originally announced June 2021.
-
Simple Truncated SVD based Model for Node Classification on Heterophilic Graphs
Authors:
Vijay Lingam,
Rahul Ragesh,
Arun Iyer,
Sundararajan Sellamanickam
Abstract:
Graph Neural Networks (GNNs) have shown excellent performance on graphs that exhibit strong homophily with respect to the node labels, i.e. connected nodes have the same labels. However, they perform poorly on heterophilic graphs. Recent approaches have typically modified aggregation schemes, designed adaptive graph filters, etc. to address this limitation. In spite of this, the performance on heterophilic graphs can still be poor. We propose a simple alternative method that exploits Truncated Singular Value Decomposition (TSVD) of topological structure and node features. Our approach achieves up to ~30% improvement in performance over state-of-the-art methods on heterophilic graphs. This work is an early investigation into methods that differ from aggregation based approaches. Our experimental results suggest that it might be important to explore other alternatives to aggregation methods for the heterophilic setting.
Submitted 24 June, 2021;
originally announced June 2021.
-
The Design of the User Interfaces for Privacy Enhancements for Android
Authors:
Jason I. Hong,
Yuvraj Agarwal,
Matt Fredrikson,
Mike Czapik,
Shawn Hanna,
Swarup Sahoo,
Judy Chun,
Won-Woo Chung,
Aniruddh Iyer,
Ally Liu,
Shen Lu,
Rituparna Roychoudhury,
Qian Wang,
Shan Wang,
Siqi Wang,
Vida Zhang,
Jessica Zhao,
Yuan Jiang,
Haojian **,
Sam Kim,
Evelyn Kuo,
Tianshi Li,
**** Liu,
Yile Liu,
Robert Zhang
Abstract:
We present the design and design rationale for the user interfaces for Privacy Enhancements for Android (PE for Android). These UIs are built around two core ideas, namely that developers should explicitly declare the purpose for which sensitive data is being used, and that these permission-purpose pairs should be split by first party and third party uses. We also present a taxonomy of purposes and ways in which these ideas can be deployed in the existing Android ecosystem.
Submitted 24 April, 2021;
originally announced April 2021.
-
GLAM: Graph Learning by Modeling Affinity to Labeled Nodes for Graph Neural Networks
Authors:
Vijay Lingam,
Arun Iyer,
Rahul Ragesh
Abstract:
Graph Neural Networks have shown excellent performance on semi-supervised classification tasks. However, they assume access to a graph that may often not be available in practice. In the absence of any graph, constructing k-Nearest Neighbor (kNN) graphs from the given data has been shown to give improvements when used with GNNs over other semi-supervised methods. This paper proposes a semi-supervised graph learning method for cases when there are no graphs available. This method learns a graph as a convex combination of the unsupervised kNN graph and a supervised label-affinity graph. The label-affinity graph directly captures all the nodes' label-affinity with the labeled nodes, i.e., how likely a node is to have the same label as the labeled nodes. This affinity measure contrasts with the kNN graph where the metric measures closeness in the feature space. Our experiments suggest that this approach gives close to or better performance (up to 1.5%), while being simpler and faster (up to 70x) to train, than state-of-the-art graph learning methods. We also conduct several experiments to highlight the importance of individual components and contrast them with state-of-the-art methods.
Submitted 20 February, 2021;
originally announced February 2021.
-
User Embedding based Neighborhood Aggregation Method for Inductive Recommendation
Authors:
Rahul Ragesh,
Sundararajan Sellamanickam,
Vijay Lingam,
Arun Iyer,
Ramakrishna Bairi
Abstract:
We consider the problem of learning latent features (aka embedding) for users and items in a recommendation setting. Given only a user-item interaction graph, the goal is to recommend items for each user. Traditional approaches employ matrix factorization-based collaborative filtering methods. Recent methods using graph convolutional networks (e.g., LightGCN) achieve state-of-the-art performance. They learn both user and item embedding. One major drawback of most existing methods is that they are not inductive; they do not generalize for users and items unseen during training. Besides, existing network models are quite complex and difficult to train and scale. Motivated by LightGCN, we propose a graph convolutional network modeling approach for collaborative filtering, CF-GCN. We solely learn user embedding and derive item embedding using a light variant, CF-LGCN-U, that performs neighborhood aggregation, making it scalable due to reduced model complexity. CF-LGCN-U models naturally possess the inductive capability for new items, and we propose a simple solution to generalize for new users. We show how the proposed models are related to LightGCN. As a by-product, we suggest a simple solution to make LightGCN inductive. We perform comprehensive experiments on several benchmark datasets and demonstrate the capabilities of the proposed approach. Experimental results show that generalization performance similar to or better than that of the state-of-the-art methods is achievable in both transductive and inductive settings.
Submitted 16 February, 2021; v1 submitted 15 February, 2021;
originally announced February 2021.
-
Cost-effective Machine Learning Inference Offload for Edge Computing
Authors:
Christian Makaya,
Amalendu Iyer,
Jonathan Salfity,
Madhu Athreya,
M Anthony Lewis
Abstract:
Computing at the edge is increasingly important since a massive amount of data is generated. This poses challenges in transporting all that data to the remote data centers and cloud, where they can be processed and analyzed. On the other hand, harnessing the edge data is essential for offering data-driven and machine learning-based applications, if the challenges, such as device capabilities, connectivity, and heterogeneity, can be mitigated. Machine learning applications are very compute-intensive and require processing of large amounts of data. However, edge devices are often resource-constrained in terms of compute resources, power, storage, and network connectivity. This limits their ability to efficiently and accurately run state-of-the-art deep neural network (DNN) models, which are becoming larger and more complex. This paper proposes a novel offloading mechanism by leveraging installed-base on-premises (edge) computational resources. The proposed mechanism allows the edge devices to offload heavy and compute-intensive workloads to edge nodes instead of using remote cloud. Our offloading mechanism has been prototyped and tested with state-of-the-art person and object detection DNN models for mobile robots and video surveillance applications. The performance shows a significant gain compared to cloud-based offloading strategies in terms of accuracy and latency.
Submitted 7 December, 2020;
originally announced December 2020.
-
HeteGCN: Heterogeneous Graph Convolutional Networks for Text Classification
Authors:
Rahul Ragesh,
Sundararajan Sellamanickam,
Arun Iyer,
Ram Bairi,
Vijay Lingam
Abstract:
We consider the problem of learning efficient and inductive graph convolutional networks for text classification with a large number of examples and features. Existing state-of-the-art graph embedding based methods such as predictive text embedding (PTE) and TextGCN have shortcomings in terms of predictive performance, scalability and inductive capability. To address these limitations, we propose a heterogeneous graph convolutional network (HeteGCN) modeling approach that unites the best aspects of PTE and TextGCN. The main idea is to learn feature embeddings and derive document embeddings using a HeteGCN architecture with different graphs used across layers. We simplify TextGCN by dissecting it into several HeteGCN models, which (a) helps to study the usefulness of individual models and (b) offers flexibility in fusing learned embeddings from different models. In effect, the number of model parameters is reduced significantly, enabling faster training and improving performance in small labeled training set scenarios. Our detailed experimental studies demonstrate the efficacy of the proposed approach.
Submitted 19 August, 2020;
originally announced August 2020.
-
Collision Avoidance Robotics Via Meta-Learning (CARML)
Authors:
Abhiram Iyer,
Aravind Mahadevan
Abstract:
This paper presents an approach to exploring a multi-objective reinforcement learning problem with Model-Agnostic Meta-Learning. The environment we used consists of a 2D vehicle equipped with a LIDAR sensor. The goal of the environment is to reach some pre-determined target location but also effectively avoid any obstacles it may find along its path. We also compare this approach against a baseline TD3 solution that attempts to solve the same problem.
Submitted 16 July, 2020;
originally announced July 2020.
-
A Graph Convolutional Network Composition Framework for Semi-supervised Classification
Authors:
Rahul Ragesh,
Sundararajan Sellamanickam,
Vijay Lingam,
Arun Iyer
Abstract:
Graph convolutional networks (GCNs) have gained popularity due to high performance achievable on several downstream tasks including node classification. Several architectural variants of these networks have been proposed and investigated with experimental studies in the literature. Motivated by a recent work on simplifying GCNs, we study the problem of designing other variants and propose a framework to compose networks using building blocks of GCN. The framework offers flexibility to compose and evaluate different networks using feature and/or label propagation networks, linear or non-linear networks, with each composition having different computational complexity. We conduct a detailed experimental study on several benchmark datasets with many variants and present observations from our evaluation. Our empirical experimental results suggest that several newly composed variants are useful alternatives to consider because they are as competitive as, or better than the original GCN.
Submitted 8 April, 2020;
originally announced April 2020.
-
A Conditional Generative Model for Predicting Material Microstructures from Processing Methods
Authors:
Akshay Iyer,
Biswadip Dey,
Arindam Dasgupta,
Wei Chen,
Amit Chakraborty
Abstract:
Microstructures of a material form the bridge linking processing conditions - which can be controlled, to the material property - which is the primary interest in engineering applications. Thus a critical task in material design is establishing the processing-structure relationship, which requires domain expertise and techniques that can model the high-dimensional material microstructure. This work proposes a deep learning based approach that models the processing-structure relationship as a conditional image synthesis problem. In particular, we develop an auxiliary classifier Wasserstein GAN with gradient penalty (ACWGAN-GP) to synthesize microstructures under a given processing condition. This approach is free of feature engineering, requires modest domain knowledge and is applicable to a wide range of material systems. We demonstrate this approach using the ultra high carbon steel (UHCS) database, where each microstructure is annotated with a label describing the cooling method it was subjected to. Our results show that ACWGAN-GP can synthesize high-quality multiphase microstructures for a given cooling method.
Submitted 4 October, 2019;
originally announced October 2019.
-
Data-Centric Mixed-Variable Bayesian Optimization For Materials Design
Authors:
Akshay Iyer,
Yichi Zhang,
Aditya Prasad,
Siyu Tao,
Yixing Wang,
Linda Schadler,
L Catherine Brinson,
Wei Chen
Abstract:
Materials design can be cast as an optimization problem with the goal of achieving desired properties, by varying material composition, microstructure morphology, and processing conditions. Existence of both qualitative and quantitative material design variables leads to disjointed regions in property space, making the search for optimal design challenging. Limited availability of experimental data and the high cost of simulations magnify the challenge. This situation calls for design methodologies that can extract useful information from existing data and guide the search for optimal designs efficiently. To this end, we present a data-centric, mixed-variable Bayesian Optimization framework that integrates data from literature, experiments, and simulations for knowledge discovery and computational materials design. Our framework pivots around the Latent Variable Gaussian Process (LVGP), a novel Gaussian Process technique which projects qualitative variables on a continuous latent space for covariance formulation, as the surrogate model to quantify "lack of data" uncertainty. Expected improvement, an acquisition criterion that balances exploration and exploitation, helps navigate a complex, nonlinear design space to locate the optimum design. The proposed framework is tested through a case study which seeks to concurrently identify the optimal composition and morphology for insulating polymer nanocomposites. We also present an extension of mixed-variable Bayesian Optimization for multiple objectives to identify the Pareto Frontier within tens of iterations. These findings project Bayesian Optimization as a powerful tool for design of engineered material systems.
Submitted 4 July, 2019;
originally announced July 2019.