-
Interpretable Cross-Examination Technique (ICE-T): Using highly informative features to boost LLM performance
Authors:
Goran Muric,
Ben Delay,
Steven Minton
Abstract:
In this paper, we introduce the Interpretable Cross-Examination Technique (ICE-T), a novel approach that leverages structured multi-prompt techniques with Large Language Models (LLMs) to improve classification performance over zero-shot and few-shot methods. In domains where interpretability is crucial, such as medicine and law, standard models often fall short due to their "black-box" nature. ICE…
▽ More
In this paper, we introduce the Interpretable Cross-Examination Technique (ICE-T), a novel approach that leverages structured multi-prompt techniques with Large Language Models (LLMs) to improve classification performance over zero-shot and few-shot methods. In domains where interpretability is crucial, such as medicine and law, standard models often fall short due to their "black-box" nature. ICE-T addresses these limitations by using a series of generated prompts that allow an LLM to approach the problem from multiple directions. The responses from the LLM are then converted into numerical feature vectors and processed by a traditional classifier. This method not only maintains high interpretability but also allows for smaller, less capable models to achieve or exceed the performance of larger, more advanced models under zero-shot conditions. We demonstrate the effectiveness of ICE-T across a diverse set of data sources, including medical records and legal documents, consistently surpassing the zero-shot baseline in terms of classification metrics such as F1 scores. Our results indicate that ICE-T can be used for improving both the performance and transparency of AI applications in complex decision-making environments.
△ Less
Submitted 8 May, 2024;
originally announced May 2024.
-
Active Learning with Multiple Views
Authors:
C. A. Knoblock,
S. Minton,
I. Muslea
Abstract:
Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multi-view domains, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept. In this paper we make several contributions. First, we i…
▽ More
Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multi-view domains, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept. In this paper we make several contributions. First, we introduce Co-Testing, which is the first approach to multi-view active learning. Second, we extend the multi-view learning framework by also exploiting weak views, which are adequate only for learning a concept that is more general/specific than the target concept. Finally, we empirically show that Co-Testing outperforms existing active learners on a variety of real world domains such as wrapper induction, Web page classification, advertisement removal, and discourse tree parsing.
△ Less
Submitted 5 October, 2011;
originally announced October 2011.
-
Wrapper Maintenance: A Machine Learning Approach
Authors:
C. A. Knoblock,
K. Lerman,
S. N. Minton
Abstract:
The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrapp…
▽ More
The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task.
△ Less
Submitted 23 June, 2011;
originally announced June 2011.
-
Interactive Data Integration through Smart Copy & Paste
Authors:
Zachary Ives,
Craig Knoblock,
Steve Minton,
Marie Jacob,
Partha Talukdar,
Rattapoom Tuchinda,
Jose Luis Ambite,
Maria Muslea,
Cenk Gazen
Abstract:
In many scenarios, such as emergency response or ad hoc collaboration, it is critical to reduce the overhead in integrating data. Ideally, one could perform the entire process interactively under one unified interface: defining extractors and wrappers for sources, creating a mediated schema, and adding schema map**s ? while seeing how these impact the integrated view of the data, and refining…
▽ More
In many scenarios, such as emergency response or ad hoc collaboration, it is critical to reduce the overhead in integrating data. Ideally, one could perform the entire process interactively under one unified interface: defining extractors and wrappers for sources, creating a mediated schema, and adding schema map**s ? while seeing how these impact the integrated view of the data, and refining the design accordingly.
We propose a novel smart copy and paste (SCP) model and architecture for seamlessly combining the design-time and run-time aspects of data integration, and we describe an initial prototype, the CopyCat system. In CopyCat, the user does not need special tools for the different stages of integration: instead, the system watches as the user copies data from applications (including the Web browser) and pastes them into CopyCat?s spreadsheet-like workspace. CopyCat generalizes these actions and presents proposed auto-completions, each with an explanation in the form of provenance. The user provides feedback on these suggestions ? through either direct interactions or further copy-and-paste operations ? and the system learns from this feedback. This paper provides an overview of our prototype system, and identifies key research challenges in achieving SCP in its full generality.
△ Less
Submitted 9 September, 2009;
originally announced September 2009.
-
Total-Order and Partial-Order Planning: A Comparative Analysis
Authors:
S. Minton,
J. Bresina,
M. Drummond
Abstract:
For many years, the intuitions underlying partial-order planning were largely taken for granted. Only in the past few years has there been renewed interest in the fundamental principles underlying this paradigm. In this paper, we present a rigorous comparative analysis of partial-order and total-order planning by focusing on two specific planners that can be directly compared. We show that there…
▽ More
For many years, the intuitions underlying partial-order planning were largely taken for granted. Only in the past few years has there been renewed interest in the fundamental principles underlying this paradigm. In this paper, we present a rigorous comparative analysis of partial-order and total-order planning by focusing on two specific planners that can be directly compared. We show that there are some subtle assumptions that underly the wide-spread intuitions regarding the supposed efficiency of partial-order planning. For instance, the superiority of partial-order planning can depend critically upon the search strategy and the structure of the search space. Understanding the underlying assumptions is crucial for constructing efficient planners.
△ Less
Submitted 30 November, 1994;
originally announced December 1994.