Search | arXiv e-print repository

Region-Based Representations Revisited

Authors: Michal Shlapentokh-Rothman, Ansel Blume, Yao Xiao, Yuqun Wu, Sethuraman T V, Heyi Tao, Jae Yong Lee, Wilfredo Torres, Yu-Xiong Wang, Derek Hoiem

Abstract: We investigate whether region-based representations are effective for recognition. Regions were once a mainstay in recognition approaches, but pixel and patch-based features are now used almost exclusively. We show that recent class-agnostic segmenters like SAM can be effectively combined with strong unsupervised representations like DINOv2 and used for a wide variety of tasks, including semantic… ▽ More We investigate whether region-based representations are effective for recognition. Regions were once a mainstay in recognition approaches, but pixel and patch-based features are now used almost exclusively. We show that recent class-agnostic segmenters like SAM can be effectively combined with strong unsupervised representations like DINOv2 and used for a wide variety of tasks, including semantic segmentation, object-based image retrieval, and multi-image analysis. Once the masks and features are extracted, these representations, even with linear decoders, enable competitive performance, making them well suited to applications that require custom queries. The compactness of the representation also makes it well-suited to video analysis and other problems requiring inference across many images. △ Less

Submitted 9 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

Comments: CVPR 2024 Camera Ready; website: https://regionreps.web.illinois.edu/

arXiv:2310.16042 [pdf, other]

WebWISE: Web Interface Control and Sequential Exploration with Large Language Models

Authors: Heyi Tao, Sethuraman T V, Michal Shlapentokh-Rothman, Derek Hoiem

Abstract: The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks using click, scroll, and text input operations. Previous approaches, such as reinforcement learning (RL) or imitation learning, are inefficient to train and task-specific. Our method uses filtered Document Object Model (DOM) elements as observations and performs tasks step-by-step, sequentially ge… ▽ More The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks using click, scroll, and text input operations. Previous approaches, such as reinforcement learning (RL) or imitation learning, are inefficient to train and task-specific. Our method uses filtered Document Object Model (DOM) elements as observations and performs tasks step-by-step, sequentially generating small programs based on the current observations. We use in-context learning, either benefiting from a single manually provided example, or an automatically generated example based on a successful zero-shot trial. We evaluate the proposed method on the MiniWob++ benchmark. With only one in-context example, our WebWISE method achieves similar or better performance than other methods that require many demonstrations or trials. △ Less

Submitted 24 October, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.04406 [pdf, other]

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Authors: Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang

Abstract: While language models (LMs) have shown potential across a range of decision-making tasks, their reliance on simple acting processes limits their broad deployment as autonomous agents. In this paper, we introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of LMs in reasoning, acting, and planning. By leveraging the in-context learning ability o… ▽ More While language models (LMs) have shown potential across a range of decision-making tasks, their reliance on simple acting processes limits their broad deployment as autonomous agents. In this paper, we introduce Language Agent Tree Search (LATS) -- the first general framework that synergizes the capabilities of LMs in reasoning, acting, and planning. By leveraging the in-context learning ability of LMs, we integrate Monte Carlo Tree Search into LATS to enable LMs as agents, along with LM-powered value functions and self-reflections for proficient exploration and enhanced decision-making. A key feature of our approach is the incorporation of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that surpasses the constraints of existing techniques. Our experimental evaluation across diverse domains, including programming, interactive question-answering (QA), web navigation, and math, validates the effectiveness and generality of LATS in decision-making while maintaining competitive or improved reasoning performance. Notably, LATS achieves state-of-the-art pass@1 accuracy (92.7%) for programming on HumanEval with GPT-4 and demonstrates gradient-free performance (average score of 75.9) comparable to gradient-based fine-tuning for web navigation on WebShop with GPT-3.5. Code can be found at https://github.com/lapisrocks/LanguageAgentTreeSearch △ Less

Submitted 5 June, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: Code at https://github.com/lapisrocks/LanguageAgentTreeSearch

arXiv:2010.11029 [pdf, other]

Learning Curves for Analysis of Deep Networks

Authors: Derek Hoiem, Tanmay Gupta, Zhizhong Li, Michal M. Shlapentokh-Rothman

Abstract: Learning curves model a classifier's test error as a function of the number of training samples. Prior works show that learning curves can be used to select model parameters and extrapolate performance. We investigate how to use learning curves to evaluate design choices, such as pretraining, architecture, and data augmentation. We propose a method to robustly estimate learning curves, abstract th… ▽ More Learning curves model a classifier's test error as a function of the number of training samples. Prior works show that learning curves can be used to select model parameters and extrapolate performance. We investigate how to use learning curves to evaluate design choices, such as pretraining, architecture, and data augmentation. We propose a method to robustly estimate learning curves, abstract their parameters into error and data-reliance, and evaluate the effectiveness of different parameterizations. Our experiments exemplify use of learning curves for analysis and yield several interesting observations. △ Less

Submitted 5 April, 2021; v1 submitted 21 October, 2020; originally announced October 2020.

Comments: Improved text and figure organization, additional experiments on optimization

arXiv:2010.00533 [pdf, other]

Linking Threat Tactics, Techniques, and Patterns with Defensive Weaknesses, Vulnerabilities and Affected Platform Configurations for Cyber Hunting

Authors: Erik Hemberg, Jonathan Kelly, Michal Shlapentokh-Rothman, Bryn Reinstadler, Katherine Xu, Nick Rutar, Una-May O'Reilly

Abstract: Many public sources of cyber threat and vulnerability information exist to help defend cyber systems. This paper links MITRE's ATT&CK MATRIX of Tactics and Techniques, NIST's Common Weakness Enumerations (CWE), Common Vulnerabilities and Exposures (CVE), and Common Attack Pattern Enumeration and Classification list (CAPEC), to gain further insight from alerts, threats and vulnerabilities. We prese… ▽ More Many public sources of cyber threat and vulnerability information exist to help defend cyber systems. This paper links MITRE's ATT&CK MATRIX of Tactics and Techniques, NIST's Common Weakness Enumerations (CWE), Common Vulnerabilities and Exposures (CVE), and Common Attack Pattern Enumeration and Classification list (CAPEC), to gain further insight from alerts, threats and vulnerabilities. We preserve all entries and relations of the sources, while enabling bi-directional, relational path tracing within an aggregate data graph called BRON. In one example, we use BRON to enhance the information derived from a list of the top 10 most frequently exploited CVEs. We identify attack patterns, tactics, and techniques that exploit these CVEs and also uncover a disparity in how much linked information exists for each of these CVEs. This prompts us to further inventory BRON's collection of sources to provide a view of the extent and range of the coverage and blind spots of public data sources. △ Less

Submitted 10 February, 2021; v1 submitted 1 October, 2020; originally announced October 2020.

Comments: 13 pages, 12 figures

Showing 1–5 of 5 results for author: Shlapentokh-Rothman, M