
Showing 1–12 of 12 results for author: Sanders, K

Searching in archive cs.
  1. arXiv:2406.09646  [pdf, other]

    cs.CV cs.AI

    A Survey of Video Datasets for Grounded Event Understanding

    Authors: Kate Sanders, Benjamin Van Durme

    Abstract: While existing video benchmarks largely consider specialized downstream tasks like retrieval or question-answering (QA), contemporary multimodal AI systems must be capable of well-rounded common-sense reasoning akin to human visual understanding. A critical component of human temporal-visual perception is our ability to identify and cognitively model "things happening", or events. Historically, vi…

    Submitted 13 June, 2024; originally announced June 2024.

  2. On the Evaluation of Machine-Generated Reports

    Authors: James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, Kate Sanders, Marc Mason, Noah Hibbler

    Abstract: Large Language Models (LLMs) have enabled new ways to satisfy information needs. Although great strides have been made in applying them to settings like document ranking and short-form text generation, they still struggle to compose complete, accurate, and verifiable long-form reports. Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of…

    Submitted 9 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures, accepted at SIGIR 2024 as perspective paper

  3. arXiv:2403.11905  [pdf, other]

    cs.AI cs.CL cs.CV cs.HC

    Tur[k]ingBench: A Challenge Benchmark for Web Agents

    Authors: Kevin Xu, Yeganeh Kordi, Kate Sanders, Yizhong Wang, Adam Byerly, Jack Zhang, Benjamin Van Durme, Daniel Khashabi

    Abstract: Recent chatbots have demonstrated impressive ability to understand and communicate in raw-text form. However, there is more to the world than raw text. For example, humans spend long hours of their time on web pages, where text is intertwined with other modalities and tasks are accomplished in the form of various complex interactions. Can state-of-the-art multi-modal models generalize to such comp…

    Submitted 21 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  4. arXiv:2402.19467  [pdf, other]

    cs.CL cs.AI cs.CV

    TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning

    Authors: Kate Sanders, Nathaniel Weir, Benjamin Van Durme

    Abstract: It is challenging to perform question-answering over complex, multimodal content such as television clips. This is in part because current video-language models rely on single-modality reasoning, have lowered performance on long inputs, and lack interpretability. We propose TV-TREES, the first multimodal entailment tree generator. TV-TREES serves as an approach to video understanding that promotes…

    Submitted 10 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 9 pages, preprint

    ACM Class: I.2.7; I.2.10

  5. arXiv:2402.14798  [pdf, other]

    cs.CL cs.AI

    Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic

    Authors: Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Jiang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme

    Abstract: Contemporary language models enable new opportunities for structured reasoning with text, such as the construction and evaluation of intuitive, proof-like textual entailment trees without relying on brittle formal logic. However, progress in this direction has been hampered by a long-standing lack of a clear protocol for determining what valid compositional entailment is. This absence causes noisy…

    Submitted 27 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  6. arXiv:2307.03153  [pdf, other]

    cs.IR cs.CV cs.MM

    MultiVENT: Multilingual Videos of Events with Aligned Natural Text

    Authors: Kate Sanders, David Etter, Reno Kriz, Benjamin Van Durme

    Abstract: Everyday news coverage has shifted from traditional broadcasts towards a wide range of presentation formats such as first-hand, unedited video footage. Datasets that reflect the diverse array of multimodal, multilingual news sources available online could be used to teach models to benefit from this shift, but existing news video datasets focus on traditional news broadcasts produced for English-s…

    Submitted 6 July, 2023; originally announced July 2023.

  7. arXiv:2210.03102  [pdf, other]

    cs.CV cs.AI

    Ambiguous Images With Human Judgments for Robust Visual Event Classification

    Authors: Kate Sanders, Reno Kriz, Anqi Liu, Benjamin Van Durme

    Abstract: Contemporary vision benchmarks predominantly consider tasks on which humans can achieve near-perfect performance. However, humans are frequently presented with visual data that they cannot classify with 100% certainty, and models trained on standard vision benchmarks achieve low performance when evaluated on this data. To address this issue, we introduce a procedure for creating datasets of ambigu…

    Submitted 22 October, 2022; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: 10 pages, NeurIPS 2022 Datasets and Benchmarks Track

    ACM Class: I.2.10; I.4.8; I.2.0

  8. arXiv:2105.02345  [pdf, other]

    cs.RO

    A Multi-Chamber Smart Suction Cup for Adaptive Gripping and Haptic Exploration

    Authors: Tae Myung Huh, Kate Sanders, Michael Danielczuk, Monica Li, Yunliang Chen, Ken Goldberg, Hannah S. Stuart

    Abstract: We present a novel robot end-effector for gripping and haptic exploration. Tactile sensing through suction flow monitoring is applied to a new suction cup design that contains multiple chambers for air flow. Each chamber connects with its own remote pressure transducer, which enables both absolute and differential pressure measures between chambers. By changing the overall vacuum applied to this s…

    Submitted 18 October, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

  9. RV-GAN: Segmenting Retinal Vascular Structure in Fundus Photographs using a Novel Multi-scale Generative Adversarial Network

    Authors: Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli, Stewart Lee Zuckerbrod, Kenton M. Sanders, Salah A. Baker

    Abstract: High fidelity segmentation of both macro and microvascular structure of the retina plays a pivotal role in determining degenerative retinal diseases, yet it is a difficult problem. Due to successive resolution loss in the encoding phase combined with the inability to recover this lost information in the decoding phase, autoencoding based segmentation approaches are limited in their ability to extr…

    Submitted 14 May, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: Accepted to MICCAI2021

  10. arXiv:2011.11696  [pdf, other]

    cs.RO

    Mechanical Search on Shelves using Lateral Access X-RAY

    Authors: Huang Huang, Marcus Dominguez-Kuhne, Jeffrey Ichnowski, Vishal Satish, Michael Danielczuk, Kate Sanders, Andrew Lee, Anelia Angelova, Vincent Vanhoucke, Ken Goldberg

    Abstract: Efficiently finding an occluded object with lateral access arises in many contexts such as warehouses, retail, healthcare, shipping, and homes. We introduce LAX-RAY (Lateral Access maXimal Reduction of occupancY support Area), a system to automate the mechanical search for occluded objects on shelves. For such lateral access environments, LAX-RAY couples a perception pipeline predicting a target o…

    Submitted 23 November, 2020; originally announced November 2020.

    Comments: Huang Huang and Marcus Dominguez-Kuhne contributed equally

  11. arXiv:2007.10420  [pdf, other]

    cs.RO cs.AI

    Non-Markov Policies to Reduce Sequential Failures in Robot Bin Picking

    Authors: Kate Sanders, Michael Danielczuk, Jeffrey Mahler, Ajay Tanwani, Ken Goldberg

    Abstract: A new generation of automated bin picking systems using deep learning is evolving to support increasing demand for e-commerce. To accommodate a wide variety of products, many automated systems include multiple gripper types and/or tool changers. However, for some objects, sequential grasp failures are common: when a computed grasp fails to lift and remove the object, the bin is often left unchange…

    Submitted 20 July, 2020; originally announced July 2020.

    Comments: 2020 IEEE International Conference on Automation Science and Engineering (CASE)

    ACM Class: I.2.9

  12. Fundus2Angio: A Conditional GAN Architecture for Generating Fluorescein Angiography Images from Retinal Fundus Photography

    Authors: Sharif Amit Kamran, Khondker Fariha Hossain, Alireza Tavakkoli, Stewart Lee Zuckerbrod, Salah A. Baker, Kenton M. Sanders

    Abstract: Carrying out clinical diagnosis of retinal vascular degeneration using Fluorescein Angiography (FA) is a time consuming process and can pose significant adverse effects on the patient. Angiography requires insertion of a dye that may cause severe adverse effects and can even be fatal. Currently, there are no non-invasive systems capable of generating Fluorescein Angiography images. However, retina…

    Submitted 29 September, 2020; v1 submitted 11 May, 2020; originally announced May 2020.

    Comments: 14 pages, Accepted to 15th International Symposium on Visual Computing 2020