-
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Authors:
Omar Khattab,
Arnav Singhvi,
Paridhi Maheshwari,
Zhiyuan Zhang,
Keshav Santhanam,
Sri Vardhamanan,
Saiful Haq,
Ashutosh Sharma,
Thomas T. Joshi,
Hanna Moazam,
Heather Miller,
Matei Zaharia,
Christopher Potts
Abstract:
The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e. imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at https://github.com/stanfordnlp/dspy
Submitted 5 October, 2023;
originally announced October 2023.
-
Document Automation Architectures: Updated Survey in Light of Large Language Models
Authors:
Mohammad Ahmadi Achachlouei,
Omkar Patil,
Tarun Joshi,
Vijayan N. Nair
Abstract:
This paper surveys the current state of the art in document automation (DA). The objective of DA is to reduce the manual effort during the generation of documents by automatically creating and integrating input from different sources and assembling documents conforming to defined templates. There have been reviews of commercial solutions of DA, particularly in the legal domain, but to date there has been no comprehensive review of the academic research on DA architectures and technologies. The current survey of DA reviews the academic literature and provides a clearer definition and characterization of DA and its features, identifies state-of-the-art DA architectures and technologies in academic research, and provides ideas that can lead to new research opportunities within the DA field in light of recent advances in generative AI and large language models.
Submitted 18 August, 2023;
originally announced August 2023.
-
Understanding Metrics for Paraphrasing
Authors:
Omkar Patil,
Rahul Singh,
Tarun Joshi
Abstract:
Paraphrase generation is a difficult problem. This is not only because of the limitations in text generation capabilities but also due to the lack of a proper definition of what qualifies as a paraphrase and of corresponding metrics to measure how good it is. Metrics for evaluating paraphrase quality are an ongoing research problem. Most of the existing metrics in use, having been borrowed from other tasks, do not capture the complete essence of a good paraphrase and often fail at borderline cases. In this work, we propose a novel metric $ROUGE_P$ to measure the quality of paraphrases along the dimensions of adequacy, novelty and fluency. We also provide empirical evidence to show that the current natural language generation metrics are insufficient to measure these desired properties of a good paraphrase. We look at paraphrase model fine-tuning and generation through the lens of metrics to gain a deeper understanding of what it takes to generate and evaluate a good paraphrase.
Submitted 25 May, 2022;
originally announced May 2022.
-
TASAC: a twin-actor reinforcement learning framework with stochastic policy for batch process control
Authors:
Tanuja Joshi,
Hariprasad Kodamana,
Harikumar Kandath,
Niket Kaisare
Abstract:
Due to their complex nonlinear dynamics and batch-to-batch variability, batch processes pose a challenge for process control. Owing to the absence of accurate models and the resulting plant-model mismatch, these problems become harder to address with advanced model-based control strategies. Reinforcement Learning (RL), wherein an agent learns the policy by directly interacting with the environment, offers a potential alternative in this context. RL frameworks with actor-critic architecture have recently become popular for controlling systems where state and action spaces are continuous. It has been shown that an ensemble of actor and critic networks further helps the agent learn better policies because simultaneous policy learning enhances exploration. To this end, the current study proposes a stochastic actor-critic RL algorithm, termed Twin Actor Soft Actor-Critic (TASAC), by incorporating an ensemble of actors for learning, in a maximum entropy framework, for batch process control.
Submitted 2 May, 2022; v1 submitted 22 April, 2022;
originally announced April 2022.
-
Pruning Attention Heads of Transformer Models Using A* Search: A Novel Approach to Compress Big NLP Architectures
Authors:
Archit Parnami,
Rahul Singh,
Tarun Joshi
Abstract:
Recent years have seen a growing adoption of Transformer models such as BERT in Natural Language Processing and even in Computer Vision. However, due to their size, there has been limited adoption of such models within resource-constrained computing environments. This paper proposes a novel pruning algorithm to compress Transformer models by eliminating redundant attention heads. We apply the A* search algorithm to obtain a pruned model with strict accuracy guarantees. Our results indicate that the method could eliminate as much as 40% of the attention heads in the BERT Transformer model with no loss in accuracy.
Submitted 17 November, 2021; v1 submitted 28 October, 2021;
originally announced October 2021.
-
Document Automation Architectures and Technologies: A Survey
Authors:
Mohammad Ahmadi Achachlouei,
Omkar Patil,
Tarun Joshi,
Vijayan N. Nair
Abstract:
This paper surveys the current state of the art in document automation (DA). The objective of DA is to reduce the manual effort during the generation of documents by automatically integrating input from different sources and assembling documents conforming to defined templates. There have been reviews of commercial solutions of DA, particularly in the legal domain, but to date there has been no comprehensive review of the academic research on DA architectures and technologies. The current survey of DA reviews the academic literature and provides a clearer definition and characterization of DA and its features, identifies state-of-the-art DA architectures and technologies in academic research, and provides ideas that can lead to new research opportunities within the DA field in light of recent advances in artificial intelligence and deep neural networks.
Submitted 23 September, 2021;
originally announced September 2021.
-
Finding top performers through email patterns analysis
Authors:
Q. Wen,
P. A. Gloor,
A. Fronzetti Colladon,
P. Tickoo,
T. Joshi
Abstract:
In the information economy, individuals' work performance is closely associated with their digital communication strategies. This study combines social network and semantic analysis to develop a method to identify top performers based on email communication. By reviewing existing literature, we identified the indicators that quantify email communication into measurable dimensions. To empirically examine the predictive power of the proposed indicators, we collected an archive of 2 million emails from 578 executives in an international service company. Panel regression was employed to derive interpretable associations between email indicators and top performance. The results suggest that top performers tend to assume central network positions and have high responsiveness to emails. In email content, top performers use more positive and complex language, with low emotionality, but rich in influential words that are probably reused by co-workers. To better explore the predictive power of the email indicators, we employed AdaBoost machine learning models, which achieved 83.56% accuracy in identifying top performers. With cluster analysis, we further find three categories of top performers: "networkers" with central network positions, "influencers" with influential ideas, and "positivists" with positive sentiments. The findings suggest that top performers have distinctive email communication patterns, laying the foundation for grounding email communication competence in theory. The proposed email analysis method also provides a tool to evaluate the different types of individual communication styles.
Submitted 27 May, 2021;
originally announced May 2021.
-
Self-interpretable Convolutional Neural Networks for Text Classification
Authors:
Wei Zhao,
Rahul Singh,
Tarun Joshi,
Agus Sudjianto,
Vijayan N. Nair
Abstract:
Deep learning models for natural language processing (NLP) are inherently complex and often viewed as black boxes. This paper develops an approach for interpreting convolutional neural networks for text classification problems by exploiting the local-linear models inherent in ReLU-DNNs. The CNN model combines the word embeddings through convolutional layers, filters them using max-pooling, and optimizes using a ReLU-DNN for classification. To get an overall self-interpretable model, the system of local linear models from the ReLU-DNN is mapped back through the max-pool filter to the appropriate n-grams. Our results on experimental datasets demonstrate that our proposed technique produces parsimonious models that are self-interpretable and have comparable performance with respect to a more complex CNN model. We also study the impact of the complexity of the convolutional layers and the classification layers on the model performance.
Submitted 8 July, 2021; v1 submitted 18 May, 2021;
originally announced May 2021.
-
Robustness Tests of NLP Machine Learning Models: Search and Semantically Replace
Authors:
Rahul Singh,
Karan Jindal,
Yufei Yu,
Hanyu Yang,
Tarun Joshi,
Matthew A. Campbell,
Wayne B. Shoumaker
Abstract:
This paper proposes a strategy to assess the robustness of different machine learning models that involve natural language processing (NLP). The overall approach relies upon a Search and Semantically Replace strategy that consists of two steps: (1) Search, which identifies important parts in the text; (2) Semantically Replace, which finds replacements for the important parts, and constrains the replaced tokens with semantically similar words. We introduce different types of Search and Semantically Replace methods designed specifically for particular types of machine learning models. We also investigate the effectiveness of this strategy and provide a general framework to assess a variety of machine learning models. Finally, an empirical comparison is provided of robustness performance among three different model types, each with a different text representation.
Submitted 20 April, 2021;
originally announced April 2021.
-
KBCNMUJAL@HASOC-Dravidian-CodeMix-FIRE2020: Using Machine Learning for Detection of Hate Speech and Offensive Code-Mixed Social Media text
Authors:
Varsha Pathak,
Manish Joshi,
Prasad Joshi,
Monica Mundada,
Tanmay Joshi
Abstract:
This paper describes the system submitted by our team, KBCNMUJAL, for Task 2 of the shared task Hate Speech and Offensive Content Identification in Indo-European Languages (HASOC), at the Forum for Information Retrieval Evaluation, December 16-20, 2020, Hyderabad, India. Datasets for two Dravidian languages, viz. Malayalam and Tamil, of 4000 observations each, were shared by the HASOC organizers. These datasets are used to train the machine using different machine learning algorithms, based on classification and regression models. The datasets consist of tweets or YouTube comments with two class labels, offensive and not offensive. The machine is trained to classify such social media messages into these two categories. Appropriate n-gram feature sets are extracted to learn the specific characteristics of the hate speech text messages. These feature models are based on TF-IDF weights of n-grams. The referred work and respective experiments show that features such as word n-grams, character n-grams, and a combined model of word and character n-grams could be used to identify the term patterns of offensive text content. As a part of the HASOC shared task, the test datasets are made available by the HASOC track organizers. The best performing classification models developed for both languages are applied to the test datasets. The model that gave the highest accuracy on the training dataset for Malayalam was used to predict the categories of the respective test data. This system obtained an F1 score of 0.77. Similarly, the best performing model for Tamil obtained an F1 score of 0.87. This work ranked 2nd and 3rd in shared Task 2 for Malayalam and Tamil, respectively. The proposed system is named HASOC_kbcnmujal.
Submitted 19 February, 2021;
originally announced February 2021.
-
Recent Trends in the Use of Deep Learning Models for Grammar Error Handling
Authors:
Mina Naghshnejad,
Tarun Joshi,
Vijayan N. Nair
Abstract:
Grammar error handling (GEH) is an important topic in natural language processing (NLP). GEH includes both grammar error detection and grammar error correction. Recent advances in computation systems have promoted the use of deep learning (DL) models for NLP problems such as GEH. In this survey we focus on two main DL approaches for GEH: neural machine translation models and editor models. We describe the three main stages of the pipeline for these models: data preparation, training, and inference. Additionally, we discuss different techniques to improve the performance of these models at each stage of the pipeline. We compare the performance of different models and conclude with proposed future directions.
Submitted 4 September, 2020;
originally announced September 2020.
-
SHAP values for Explaining CNN-based Text Classification Models
Authors:
Wei Zhao,
Tarun Joshi,
Vijayan N. Nair,
Agus Sudjianto
Abstract:
Deep neural networks are increasingly used in natural language processing (NLP) models. However, the need to interpret and explain the results from complex algorithms is limiting their widespread adoption in regulated industries such as banking. There has been recent work on interpretability of machine learning algorithms with structured data, but there are only limited techniques for NLP applications, where the problem is more challenging due to the size of the vocabulary, high-dimensional nature, and the need to consider textual coherence and language structure. This paper develops a methodology to compute SHAP values for local explainability of CNN-based text classification models. The approach is also extended to compute global scores to assess the importance of features. The results are illustrated on sentiment analysis of Amazon Electronic Review data.
Submitted 8 July, 2021; v1 submitted 26 August, 2020;
originally announced August 2020.
-
Model Robustness with Text Classification: Semantic-preserving adversarial attacks
Authors:
Rahul Singh,
Tarun Joshi,
Vijayan N. Nair,
Agus Sudjianto
Abstract:
We propose algorithms to create adversarial attacks to assess model robustness in text classification problems. They can be used to create white-box attacks and black-box attacks while at the same time preserving the semantics and syntax of the original text. The attacks cause a significant number of flips in the white-box setting, and the same rule-based approach can be used in the black-box setting. In a black-box setting, the attacks created are able to reverse decisions of transformer-based architectures.
Submitted 13 August, 2020; v1 submitted 12 August, 2020;
originally announced August 2020.