-
Region-Based Representations Revisited
Authors:
Michal Shlapentokh-Rothman,
Ansel Blume,
Yao Xiao,
Yuqun Wu,
Sethuraman T V,
Heyi Tao,
Jae Yong Lee,
Wilfredo Torres,
Yu-Xiong Wang,
Derek Hoiem
Abstract:
We investigate whether region-based representations are effective for recognition. Regions were once a mainstay in recognition approaches, but pixel and patch-based features are now used almost exclusively. We show that recent class-agnostic segmenters like SAM can be effectively combined with strong unsupervised representations like DINOv2 and used for a wide variety of tasks, including semantic segmentation, object-based image retrieval, and multi-image analysis. Once the masks and features are extracted, these representations, even with linear decoders, enable competitive performance, making them well suited to applications that require custom queries. The compactness of the representation also makes it well-suited to video analysis and other problems requiring inference across many images.
Submitted 9 June, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
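The core idea in the abstract above, combining class-agnostic masks with dense features, can be illustrated with a minimal sketch. It assumes that region features are obtained by averaging patch features inside each mask; the abstract does not state the pooling rule, so average pooling is an assumption here. The function name `pool_region_features` and the toy inputs are hypothetical stand-ins for real SAM masks and DINOv2 patch features, and no SAM or DINOv2 API is called.

```python
from typing import List, Sequence


def pool_region_features(
    patch_feats: Sequence[Sequence[float]],
    masks: Sequence[Sequence[int]],
) -> List[List[float]]:
    """Average the patch features that fall inside each binary mask.

    patch_feats: one feature vector per patch (hypothetical stand-in for
                 dense features from a model such as DINOv2).
    masks:       one 0/1 membership list per region, aligned with the patches
                 (hypothetical stand-in for masks from a segmenter such as SAM).
    Returns one pooled feature vector per region.
    """
    dim = len(patch_feats[0])
    pooled = []
    for mask in masks:
        # Indices of the patches covered by this region.
        idx = [i for i, inside in enumerate(mask) if inside]
        if not idx:
            # Empty region: fall back to a zero vector (an arbitrary choice).
            pooled.append([0.0] * dim)
            continue
        pooled.append(
            [sum(patch_feats[i][d] for i in idx) / len(idx) for d in range(dim)]
        )
    return pooled


# Toy example: four patches with 2-D features, two regions of two patches each.
patch_feats = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
masks = [[1, 1, 0, 0], [0, 0, 1, 1]]
region_feats = pool_region_features(patch_feats, masks)
```

Each image is then summarized by one vector per region rather than one per patch, which is the kind of compactness the abstract points to; a linear decoder could be trained on these vectors.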
-
WebWISE: Web Interface Control and Sequential Exploration with Large Language Models
Authors:
Heyi Tao,
Sethuraman T V,
Michal Shlapentokh-Rothman,
Derek Hoiem
Abstract:
The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks using click, scroll, and text input operations. Previous approaches, such as reinforcement learning (RL) or imitation learning, are inefficient to train and task-specific. Our method uses filtered Document Object Model (DOM) elements as observations and performs tasks step-by-step, sequentially generating small programs based on the current observations. We use in-context learning, either benefiting from a single manually provided example, or an automatically generated example based on a successful zero-shot trial. We evaluate the proposed method on the MiniWob++ benchmark. With only one in-context example, our WebWISE method achieves similar or better performance than other methods that require many demonstrations or trials.
Submitted 24 October, 2023; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Authors:
Andy Zhou,
Kai Yan,
Michal Shlapentokh-Rothman,
Haohan Wang,
Yu-Xiong Wang
Abstract:
While language models (LMs) have shown potential across a range of decision-making tasks, their reliance on simple acting processes limits their broad deployment as autonomous agents. In this paper, we introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of LMs in reasoning, acting, and planning. By leveraging the in-context learning ability of LMs, we integrate Monte Carlo Tree Search into LATS to enable LMs as agents, along with LM-powered value functions and self-reflections for proficient exploration and enhanced decision-making. A key feature of our approach is the incorporation of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that surpasses the constraints of existing techniques. Our experimental evaluation across diverse domains, including programming, interactive question-answering (QA), web navigation, and math, validates the effectiveness and generality of LATS in decision-making while maintaining competitive or improved reasoning performance. Notably, LATS achieves state-of-the-art pass@1 accuracy (92.7%) for programming on HumanEval with GPT-4 and demonstrates gradient-free performance (average score of 75.9) comparable to gradient-based fine-tuning for web navigation on WebShop with GPT-3.5. Code can be found at https://github.com/lapisrocks/LanguageAgentTreeSearch
Submitted 5 June, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Learning Curves for Analysis of Deep Networks
Authors:
Derek Hoiem,
Tanmay Gupta,
Zhizhong Li,
Michal M. Shlapentokh-Rothman
Abstract:
Learning curves model a classifier's test error as a function of the number of training samples. Prior works show that learning curves can be used to select model parameters and extrapolate performance. We investigate how to use learning curves to evaluate design choices, such as pretraining, architecture, and data augmentation. We propose a method to robustly estimate learning curves, abstract their parameters into error and data-reliance, and evaluate the effectiveness of different parameterizations. Our experiments exemplify use of learning curves for analysis and yield several interesting observations.
Submitted 5 April, 2021; v1 submitted 21 October, 2020;
originally announced October 2020.
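To make the notion of a learning curve in the abstract above concrete, here is a minimal sketch that fits test error as a function of training-set size. It assumes the parametric form e(n) = alpha + eta * n^gamma with gamma fixed at -0.5, and it uses plain ordinary least squares; both the functional form and the fitting procedure are assumptions for illustration, not the robust estimation method the abstract proposes. The function name `fit_learning_curve` and the synthetic data are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_learning_curve(
    ns: Sequence[float],
    errors: Sequence[float],
    gamma: float = -0.5,
) -> Tuple[float, float]:
    """Fit e(n) = alpha + eta * n**gamma by ordinary least squares.

    With gamma fixed (an assumption), the model is linear in x = n**gamma,
    so alpha and eta have the usual closed-form simple-regression solution.
    Returns (alpha, eta): alpha is the extrapolated error at infinite data,
    eta scales how strongly the error depends on the amount of data.
    """
    xs: List[float] = [n ** gamma for n in ns]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_e = sum(errors) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxe = sum((x - mean_x) * (e - mean_e) for x, e in zip(xs, errors))
    eta = sxe / sxx
    alpha = mean_e - eta * mean_x
    return alpha, eta


# Synthetic, noise-free data generated from alpha = 0.1, eta = 2.0.
ns = [100, 400, 1600, 6400]
errors = [0.1 + 2.0 * n ** -0.5 for n in ns]
alpha, eta = fit_learning_curve(ns, errors)
predicted_at_25600 = alpha + eta * 25600 ** -0.5
```

Once fitted, the two parameters can be compared across design choices (for example, with and without pretraining), and the curve can be evaluated at an unseen training-set size to extrapolate performance.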
-
Linking Threat Tactics, Techniques, and Patterns with Defensive Weaknesses, Vulnerabilities and Affected Platform Configurations for Cyber Hunting
Authors:
Erik Hemberg,
Jonathan Kelly,
Michal Shlapentokh-Rothman,
Bryn Reinstadler,
Katherine Xu,
Nick Rutar,
Una-May O'Reilly
Abstract:
Many public sources of cyber threat and vulnerability information exist to help defend cyber systems. This paper links MITRE's ATT&CK MATRIX of Tactics and Techniques, NIST's Common Weakness Enumerations (CWE), Common Vulnerabilities and Exposures (CVE), and Common Attack Pattern Enumeration and Classification list (CAPEC), to gain further insight from alerts, threats and vulnerabilities. We preserve all entries and relations of the sources, while enabling bi-directional, relational path tracing within an aggregate data graph called BRON. In one example, we use BRON to enhance the information derived from a list of the top 10 most frequently exploited CVEs. We identify attack patterns, tactics, and techniques that exploit these CVEs and also uncover a disparity in how much linked information exists for each of these CVEs. This prompts us to further inventory BRON's collection of sources to provide a view of the extent and range of the coverage and blind spots of public data sources.
Submitted 10 February, 2021; v1 submitted 1 October, 2020;
originally announced October 2020.