Showing 1–50 of 58 results for author: Oliva, A

Searching in archive cs.
  1. arXiv:2406.08164 [pdf, other]

    cs.CV

    ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs

    Authors: Irene Huang, Wei Lin, M. Jehanzeb Mirza, Jacob A. Hansen, Sivan Doveh, Victor Ion Butoi, Roei Herzig, Assaf Arbelle, Hilde Kuehne, Trevor Darrell, Chuang Gan, Aude Oliva, Rogerio Feris, Leonid Karlinsky

    Abstract: Compositional Reasoning (CR) entails grasping the significance of attributes, relations, and word order. Recent Vision-Language Models (VLMs), comprising a visual encoder and a Large Language Model (LLM) decoder, have demonstrated remarkable proficiency in such reasoning tasks. This prompts a crucial question: have VLMs effectively tackled the CR challenge? We conjecture that existing CR benchmark…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: The first three authors contributed equally

  2. arXiv:2405.17258 [pdf, other]

    cs.LG cs.AI

    Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning

    Authors: Runqian Wang, Soumya Ghosh, David Cox, Diego Antognini, Aude Oliva, Rogerio Feris, Leonid Karlinsky

    Abstract: Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning (PEFT) techniques that closely match full model fine-tune performance while requiring only a small number of additional parameters. These additional LoRA parameters are specific to the base model being adapted. When the base model needs to be deprecated and replaced with a new one, all the associated LoRA modul…

    Submitted 27 May, 2024; originally announced May 2024.

  3. arXiv:2405.13852 [pdf, other]

    cs.SE

    Predicting long time contributors with knowledge units of programming languages: an empirical study

    Authors: Md Ahasanuzzaman, Gustavo A. Oliva, Ahmed E. Hassan

    Abstract: Predicting potential long-time contributors (LTCs) early allows project maintainers to effectively allocate resources and mentoring to enhance their development and retention. Mapping programming language expertise to developers and characterizing projects in terms of how they use programming languages can help identify developers who are more likely to become LTCs. However, prior studies on predi…

    Submitted 22 May, 2024; originally announced May 2024.

  4. arXiv:2404.10225 [pdf]

    cs.SE cs.AI

    Rethinking Software Engineering in the Foundation Model Era: From Task-Driven AI Copilots to Goal-Driven AI Pair Programmers

    Authors: Ahmed E. Hassan, Gustavo A. Oliva, Dayi Lin, Boyuan Chen, Zhen Ming Jiang

    Abstract: The advent of Foundation Models (FMs) and AI-powered copilots has transformed the landscape of software development, offering unprecedented code completion capabilities and enhancing developer productivity. However, the current task-driven nature of these copilots falls short in addressing the broader goals and complexities inherent in software engineering (SE). In this paper, we propose a paradig…

    Submitted 15 April, 2024; originally announced April 2024.

  5. arXiv:2404.05567 [pdf, other]

    cs.LG cs.AI cs.CL

    Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

    Authors: Bowen Pan, Yikang Shen, Haokun Liu, Mayank Mishra, Gaoyuan Zhang, Aude Oliva, Colin Raffel, Rameswar Panda

    Abstract: Mixture-of-Experts (MoE) language models can reduce computational costs by 2–4× compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios. However, MoE models generally require 2–4× more parameters to achieve comparable performance to a dense model, which incurs larger GPU memory requirements and makes MoE models less…

    Submitted 8 April, 2024; originally announced April 2024.

  6. arXiv:2402.15943 [pdf]

    cs.SE cs.AI

    Rethinking Software Engineering in the Foundation Model Era: A Curated Catalogue of Challenges in the Development of Trustworthy FMware

    Authors: Ahmed E. Hassan, Dayi Lin, Gopi Krishnan Rajbahadur, Keheliya Gallaba, Filipe R. Cogo, Boyuan Chen, Haoxiang Zhang, Kishanthan Thangarajah, Gustavo Ansaldi Oliva, Jiahuei Lin, Wali Mohammad Abdullah, Zhen Ming Jiang

    Abstract: Foundation models (FMs), such as Large Language Models (LLMs), have revolutionized software development by enabling new use cases and business models. We refer to software built using FMs as FMware. The unique properties of FMware (e.g., prompts, agents, and the need for orchestration), coupled with the intrinsic limitations of FMs (e.g., hallucination) lead to a completely new set of software eng…

    Submitted 3 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  7. Performance evaluation of Private and Public Blockchains for multi-cloud service federation

    Authors: Adam Zahir, Milan Groshev, Kiril Antevski, Carlos J. Bernardos, Constantine Ayimba, Antonio de la Oliva

    Abstract: The stringent low-latency, high reliability, availability and resilience requirements of 6G use cases will present challenges to cloud providers. Currently, cloud providers lack simple, efficient, and secure implementation of provisioning solutions that meet these challenges. Multi-cloud federation is a promising approach. In this paper, we evaluate the application of private and public blockchain…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: 5 pages, 5 figures, conference

  8. arXiv:2312.07192 [pdf, other]

    cs.NI cs.RO

    waveSLAM: Empowering Accurate Indoor Mapping Using Off-the-Shelf Millimeter-wave Self-sensing

    Authors: Pablo Picazo, Milan Groshev, Alejandro Blanco, Claudio Fiandrino, Antonio de la Oliva, Joerg Widmer

    Abstract: This paper presents the design, implementation and evaluation of waveSLAM, a low-cost mobile robot system that uses the millimetre wave (mmWave) communication devices to enhance the indoor mapping process targeting environments with reduced visibility or glass/mirror walls. A unique feature of waveSLAM is that it only leverages existing Commercial-Off-The-Shelf (COTS) hardware (Lidar and mmWave ra…

    Submitted 12 December, 2023; originally announced December 2023.

    Journal ref: VTC FALL 2023

  9. arXiv:2311.06231 [pdf, other]

    cs.CV

    Learning Human Action Recognition Representations Without Real Humans

    Authors: Howard Zhong, Samarth Mishra, Donghyun Kim, SouYoung Jin, Rameswar Panda, Hilde Kuehne, Leonid Karlinsky, Venkatesh Saligrama, Aude Oliva, Rogerio Feris

    Abstract: Pre-training on massive video datasets has become essential to achieve high action recognition performance on smaller downstream datasets. However, most large-scale video datasets contain images of people and hence are accompanied with issues related to privacy, ethics, and data protection, often preventing them from being publicly shared for reproducible research. Existing work has attempted to a…

    Submitted 10 November, 2023; originally announced November 2023.

    Comments: 19 pages, 7 figures, 2023 NeurIPS Datasets and Benchmarks Track

  10. arXiv:2310.07889 [pdf, other]

    cs.CV cs.AI cs.CL cs.RO

    LangNav: Language as a Perceptual Representation for Navigation

    Authors: Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim

    Abstract: We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings. Our approach uses off-the-shelf vision systems for image captioning and object detection to convert an agent's egocentric panoramic view at each time step into natural language descriptions. We then finetune a pretrained language model to select an action, base…

    Submitted 30 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  11. arXiv:2307.14980 [pdf, other]

    cs.NI

    Aligning rTWT with 802.1Qbv: a Network Calculus Approach

    Authors: Carlos Barroso-Fernández, Jorge Martín-Pérez, Constantine Ayimba, Antonio de la Oliva

    Abstract: Industry 4.0 applications impose the challenging demand of delivering packets with bounded latencies via a wireless network. This is further complicated if the network is not dedicated to the time critical application. In this paper we use network calculus analysis to derive closed form expressions of latency bounds for time critical traffic when 802.11 Target Wake Time (TWT) and 802.1Qbv work tog…

    Submitted 27 July, 2023; originally announced July 2023.

    Comments: 3 pages, 3 figures, workshop submission

  12. arXiv:2305.05654 [pdf, other]

    cs.SE

    Using Knowledge Units of Programming Languages to Recommend Reviewers for Pull Requests: An Empirical Study

    Authors: Md Ahasanuzzaman, Gustavo A. Oliva, Ahmed E. Hassan

    Abstract: Code review is a key element of quality assurance in software development. Determining the right reviewer for a given code change requires understanding the characteristics of the changed code, identifying the skills of each potential reviewer (expertise profile), and finding a good match between the two. To facilitate this task, we design a code reviewer recommender that operates on the knowledge…

    Submitted 9 May, 2023; originally announced May 2023.

  13. arXiv:2304.04733 [pdf, other]

    cs.HC

    Artifact magnification on deepfake videos increases human detection and subjective confidence

    Authors: Emilie Josephs, Camilo Fosco, Aude Oliva

    Abstract: The development of technologies for easily and automatically falsifying video has raised practical questions about people's ability to detect false information online. How vulnerable are people to deepfake videos? What technologies can be applied to boost their performance? Human susceptibility to deepfake videos is typically measured in laboratory settings, which do not reflect the challenges of…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: 8 pages, 4 figures

  14. arXiv:2303.17590 [pdf, other]

    cs.CV cs.CL

    Going Beyond Nouns With Vision & Language Models Using Synthetic Data

    Authors: Paola Cascante-Bonilla, Khaled Shehada, James Seale Smith, Sivan Doveh, Donghyun Kim, Rameswar Panda, Gül Varol, Aude Oliva, Vicente Ordonez, Rogerio Feris, Leonid Karlinsky

    Abstract: Large-scale pre-trained Vision & Language (VL) models have shown remarkable performance in many applications, enabling replacing a fixed set of supported classes with zero-shot open vocabulary reasoning over (almost arbitrary) natural language prompts. However, recent works have uncovered a fundamental weakness of these models. For example, their difficulty to understand Visual Language Concepts (…

    Submitted 30 August, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: Accepted to ICCV 2023. Project page: https://synthetic-vic.github.io/

  15. IEEE 802.11az Indoor Positioning with mmWave

    Authors: Pablo Picazo-Martínez, Carlos Barroso-Fernández, Jorge Martín-Pérez, Milan Groshev, Antonio de la Oliva

    Abstract: In recent years we have witnessed the rise of location-based applications, which depend on the devices' ability to accurately obtain their position. IEEE 802.11, foretelling the need for such applications, started the IEEE 802.11az work on Next Generation Positioning. Although this standard provides positioning enhancements for sub-6GHz and mmWave bands, high accuracy in the order of centimeters can…

    Submitted 12 December, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Comments: 8 pages, 6 figures, magazine submission

    Journal ref: IEEE Communications Magazine 2024

  16. arXiv:2301.03198 [pdf]

    cs.CV q-bio.NC

    The Algonauts Project 2023 Challenge: How the Human Brain Makes Sense of Natural Scenes

    Authors: A. T. Gifford, B. Lahner, S. Saba-Sadiya, M. G. Vilas, A. Lascelles, A. Oliva, K. Kay, G. Roig, R. M. Cichy

    Abstract: The sciences of biological and artificial intelligence are ever more intertwined. Neural computational principles inspire new intelligent machines, which are in turn used to advance theoretical understanding of the brain. To promote further exchange of ideas and collaboration between biological and artificial intelligence researchers, we introduce the 2023 installment of the Algonauts Project chal…

    Submitted 11 July, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: 5 pages, 2 figures

  17. arXiv:2210.14208 [pdf, other]

    cs.RO cs.NI

    Don't Let Me Down! Offloading Robot VFs Up to the Cloud

    Authors: Khasa Gillani, Jorge Martín Pérez, Milan Groshev, Antonio de la Oliva, Robert Gazda

    Abstract: Recent trends in robotic services propose offloading robot functionalities to the Edge to meet the strict latency requirements of networked robotics. However, the Edge is typically an expensive resource and sometimes the Cloud is also an option, thus, decreasing the cost. Following this idea, we propose Don't Let Me Down! (DLMD), an algorithm that promotes offloading robot functions to the Cloud w…

    Submitted 14 February, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: 5 Pages, 6 figures, submitted to 2023 IEEE 9th International Conference on Network Softwarization (NetSoft)

  18. arXiv:2208.04229 [pdf, other]

    cs.NI cs.AI cs.LG

    Choose, not Hoard: Information-to-Model Matching for Artificial Intelligence in O-RAN

    Authors: Jorge Martín-Pérez, Nuria Molner, Francesco Malandrino, Carlos Jesús Bernardos, Antonio de la Oliva, David Gomez-Barquero

    Abstract: Open Radio Access Network (O-RAN) is an emerging paradigm, whereby virtualized network infrastructure elements from different vendors communicate via open, standardized interfaces. A key element therein is the RAN Intelligent Controller (RIC), an Artificial Intelligence (AI)-based controller. Traditionally, all data available in the network has been used to train a single AI model to be used at th…

    Submitted 12 January, 2023; v1 submitted 1 August, 2022; originally announced August 2022.

    Journal ref: IEEE Communications Magazine, 2022

  19. arXiv:2206.08959 [pdf, other]

    cs.SE

    Is my transaction done yet? An empirical study of transaction processing times in the Ethereum Blockchain Platform

    Authors: Michael Pacheco, Gustavo A. Oliva, Gopi Krishnan Rajbahadur, Ahmed E. Hassan

    Abstract: Ethereum is one of the most popular platforms for the development of blockchain-powered applications. These applications are known as Dapps. When engineering Dapps, developers need to translate requests captured in the front-end of their application into one or more smart contract transactions. Developers need to pay for these transactions and, the more they pay (i.e., the higher the gas price), t…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Under review in Transactions on Software Engineering and Methodology journal

  20. arXiv:2206.08905 [pdf, other]

    cs.SE cs.DC cs.NI

    What makes Ethereum blockchain transactions be processed fast or slow? An empirical study

    Authors: Michael Pacheco, Gustavo A. Oliva, Gopi Krishnan Rajbahadur, Ahmed E. Hassan

    Abstract: The Ethereum platform allows developers to implement and deploy applications called Dapps onto the blockchain for public use through the use of smart contracts. To execute code within a smart contract, a paid transaction must be issued towards one of the functions that are exposed in the interface of a contract. However, such a transaction is only processed once one of the miners in the peer-to-pe…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Under Peer review in Empirical Software Engineering Journal

  21. arXiv:2206.00535 [pdf, other]

    cs.CV cs.HC cs.SI

    Deepfake Caricatures: Amplifying attention to artifacts increases deepfake detection by humans and machines

    Authors: Camilo Fosco, Emilie Josephs, Alex Andonian, Allen Lee, Xi Wang, Aude Oliva

    Abstract: Deepfakes pose a serious threat to digital well-being by fueling misinformation. As deepfakes get harder to recognize with the naked eye, human users become increasingly reliant on deepfake detection models to decide if a video is real or fake. Currently, models yield a prediction for a video's authenticity, but do not integrate a method for alerting a human user. We introduce a framework for ampl…

    Submitted 10 April, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: 9 pages, 5 figures, 4 tables

  22. arXiv:2205.04189 [pdf, other]

    cs.NI cs.RO

    FoReCo: a forecast-based recovery mechanism for real-time remote control of robotic manipulators

    Authors: Milan Groshev, Jorge Martín-Pérez, Carlos Guimarães, Antonio de la Oliva, Carlos J. Bernardos

    Abstract: Wireless communications represent a game changer for future manufacturing plants, enabling flexible production chains as machinery and other components are not restricted to a location by the rigid wired connections on the factory floor. However, the presence of electromagnetic interference in the wireless spectrum may result in packet loss and delay, making it a challenging environment to meet th…

    Submitted 9 May, 2022; originally announced May 2022.

    Comments: 10 figures, 12 pages, journal, submitted to IEEE TNSM

  23. arXiv:2110.07058 [pdf, other]

    cs.CV cs.AI

    Ego4D: Around the World in 3,000 Hours of Egocentric Video

    Authors: Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, et al. (60 additional authors not shown)

    Abstract: We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 931 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with cons…

    Submitted 11 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: To appear in the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. This version updates the baseline result numbers for the Hands and Objects benchmark (appendix)

  24. arXiv:2108.10394 [pdf, other]

    cs.CV

    Dynamic Network Quantization for Efficient Video Inference

    Authors: Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Aude Oliva, Rogerio Feris, Kate Saenko

    Abstract: Deep convolutional networks have recently achieved great success in video recognition, yet their practical realization remains a challenge due to the large amount of computational resources required to achieve robust recognition. Motivated by the effectiveness of quantization for boosting efficiency, in this paper, we propose a dynamic network quantization framework, that selects optimal precision…

    Submitted 23 August, 2021; originally announced August 2021.

    Comments: ICCV 2021 Camera Ready Version

  25. arXiv:2106.12620 [pdf, other]

    cs.CV

    IA-RED²: Interpretability-Aware Redundancy Reduction for Vision Transformers

    Authors: Bowen Pan, Rameswar Panda, Yifan Jiang, Zhangyang Wang, Rogerio Feris, Aude Oliva

    Abstract: The self-attention-based model, transformer, is recently becoming the leading backbone in the field of computer vision. In spite of the impressive success made by transformers in a variety of vision tasks, it still suffers from heavy computation and intensive memory costs. To address this limitation, this paper presents an Interpretability-Aware REDundancy REDuction framework (IA-RED²). We star…

    Submitted 26 October, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Accepted in NeurIPS 2021

  26. arXiv:2106.05438 [pdf, other]

    cs.CV

    Cross-Modal Discrete Representation Learning

    Authors: Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

    Abstract: Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector. In this work we present a self-supervised learning framework that is able to learn a representation that captures finer levels of granularity across different modalities such as concepts or events represen…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Preprint

  27. arXiv:2105.05165 [pdf, other]

    cs.CV cs.AI cs.LG

    AdaMML: Adaptive Multi-Modal Learning for Efficient Video Recognition

    Authors: Rameswar Panda, Chun-Fu Chen, Quanfu Fan, Ximeng Sun, Kate Saenko, Aude Oliva, Rogerio Feris

    Abstract: Multi-modal learning, which focuses on utilizing various modalities to improve the performance of a model, is widely used in video recognition. While traditional multi-modal learning offers excellent recognition results, its computational expense limits its impact for many real-world applications. In this paper, we propose an adaptive multi-modal learning framework, called AdaMML, that selects on-…

    Submitted 12 May, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

  28. arXiv:2105.04489 [pdf, other]

    cs.CV cs.CL cs.LG cs.SD eess.AS

    Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions

    Authors: Mathew Monfort, SouYoung Jin, Alexander Liu, David Harwath, Rogerio Feris, James Glass, Aude Oliva

    Abstract: When people observe events, they are able to abstract key information and build concise summaries of what is happening. These summaries include contextual and semantic information describing the important high-level details (what, where, who and how) of the observed event and exclude background information that is deemed unimportant to the observer. With this in mind, the descriptions people gener…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: To appear at CVPR 2021

  29. arXiv:2104.13714 [pdf]

    cs.CV q-bio.NC

    The Algonauts Project 2021 Challenge: How the Human Brain Makes Sense of a World in Motion

    Authors: R. M. Cichy, K. Dwivedi, B. Lahner, A. Lascelles, P. Iamshchinina, M. Graumann, A. Andonian, N. A. R. Murty, K. Kay, G. Roig, A. Oliva

    Abstract: The sciences of natural and artificial intelligence are fundamentally connected. Brain-inspired human-engineered AI are now the standard for predicting human brain responses during vision, and conversely, the brain continues to inspire invention in AI. To promote even deeper connections between these fields, we here release the 2021 edition of the Algonauts Project Challenge: How the Human Brain M…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: 5 pages, 2 figures

  30. arXiv:2104.00805 [pdf, other]

    cs.CV cs.HC cs.MM

    Memorability: An image-computable measure of information utility

    Authors: Zoya Bylinskii, Lore Goetschalckx, Anelise Newman, Aude Oliva

    Abstract: The pixels in an image, and the objects, scenes, and actions that they compose, determine whether an image will be memorable or forgettable. While memorability varies by image, it is largely independent of an individual observer. Observer independence is what makes memorability an image-computable measure of information, and eligible for automatic prediction. In this chapter, we zoom into memorabi…

    Submitted 1 April, 2021; originally announced April 2021.

  31. arXiv:2103.01435 [pdf, other]

    cs.CV

    Improved Techniques for Quantizing Deep Networks with Adaptive Bit-Widths

    Authors: Ximeng Sun, Rameswar Panda, Chun-Fu Chen, Naigang Wang, Bowen Pan, Kailash Gopalakrishnan, Aude Oliva, Rogerio Feris, Kate Saenko

    Abstract: Quantizing deep networks with adaptive bit-widths is a promising technique for efficient inference across many devices and resource constraints. In contrast to static methods that repeat the quantization process and train different models for different constraints, adaptive quantization enables us to flexibly adjust the bit-widths of a single deep network during inference for instant adaptation in…

    Submitted 16 September, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

  32. arXiv:2102.07887 [pdf, other]

    cs.CV

    VA-RED²: Video Adaptive Redundancy Reduction

    Authors: Bowen Pan, Rameswar Panda, Camilo Fosco, Chung-Ching Lin, Alex Andonian, Yue Meng, Kate Saenko, Aude Oliva, Rogerio Feris

    Abstract: Performing inference on deep learning models for videos remains a challenge due to the large amount of computational resources required to achieve robust recognition. An inherent property of real-world videos is the high correlation of information across frames which can translate into redundancy in either temporal or spatial feature maps of the models, or both. The type of redundant features depe…

    Submitted 4 October, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: Accepted in ICLR 2021

  33. arXiv:2102.05775 [pdf, other]

    cs.CV

    AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

    Authors: Yue Meng, Rameswar Panda, Chung-Ching Lin, Prasanna Sattigeri, Leonid Karlinsky, Kate Saenko, Aude Oliva, Rogerio Feris

    Abstract: Temporal modelling is the key for efficient video action recognition. While understanding temporal information can improve recognition accuracy for dynamic actions, removing temporal redundancy and reusing past features can significantly save computation leading to efficient action recognition. In this paper, we introduce an adaptive temporal fusion network, called AdaFuse, that dynamically fuses…

    Submitted 10 February, 2021; originally announced February 2021.

    Comments: Accepted to ICLR 2021

  34. Network Support for High-performance Distributed Machine Learning

    Authors: Francesco Malandrino, Carla Fabiana Chiasserini, Nuria Molner, Antonio De La Oliva

    Abstract: The traditional approach to distributed machine learning is to adapt learning algorithms to the network, e.g., reducing updates to curb overhead. Networks based on intelligent edge, instead, make it possible to follow the opposite approach, i.e., to define the logical network topology around the learning task to perform, so as to meet the desired learning performance. In this paper, we propose…

    Submitted 5 July, 2022; v1 submitted 5 February, 2021; originally announced February 2021.

    Journal ref: IEEE/ACM Transactions on Networking, 2022

  35. arXiv:2101.07676 [pdf, other]

    cs.RO cs.NI

    COTORRA: COntext-aware Testbed fOR Robotic Applications

    Authors: Milan Groshev, Jorge Martín-Pérez, Kiril Antevski, Antonio de la Oliva, Carlos J. Bernardos

    Abstract: Edge & Fog computing have received considerable attention as promising candidates for the evolution of robotic systems. In this letter, we propose COTORRA, an Edge & Fog driven robotic testbed that combines context information with robot sensor data to validate innovative concepts for robotic systems prior to being applied in a production environment. In lab/university, we established COTORRA as a…

    Submitted 19 January, 2021; originally announced January 2021.

    Comments: 4 pages, 4 figures, submitted to IEEE Communications Letters

  36. arXiv:2010.11757 [pdf, ps, other]

    cs.CV

    Deep Analysis of CNN-based Spatio-temporal Representations for Action Recognition

    Authors: Chun-Fu Chen, Rameswar Panda, Kandan Ramakrishnan, Rogerio Feris, John Cohn, Aude Oliva, Quanfu Fan

    Abstract: In recent years, a number of approaches based on 2D or 3D convolutional neural networks (CNN) have emerged for video action recognition, achieving state-of-the-art results on several large-scale benchmark datasets. In this paper, we carry out in-depth comparative analysis to better understand the differences between these approaches and the progress made by them. To this end, we develop a unified…

    Submitted 29 March, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: CVPR 2021 camera-ready version. Codes and models are available on https://github.com/IBM/action-recognition-pytorch

  37. arXiv:2009.02568 [pdf, other]

    cs.CV

    Multimodal Memorability: Modeling Effects of Semantics and Decay on Video Memorability

    Authors: Anelise Newman, Camilo Fosco, Vincent Casser, Allen Lee, Barry McNamara, Aude Oliva

    Abstract: A key capability of an intelligent system is deciding when events from past experience must be remembered and when they can be forgotten. Towards this goal, we develop a predictive model of human visual event memory and how those memories decay over time. We introduce Memento10k, a new, dynamic video memorability dataset containing human annotations at different viewing delays. Based on our findin…

    Submitted 5 September, 2020; originally announced September 2020.

    Comments: European Conference on Computer Vision

  38. arXiv:2008.05596 [pdf, other]

    cs.CV

    We Have So Much In Common: Modeling Semantic Relational Set Abstractions in Videos

    Authors: Alex Andonian, Camilo Fosco, Mathew Monfort, Allen Lee, Rogerio Feris, Carl Vondrick, Aude Oliva

    Abstract: Identifying common patterns among events is a key ability in human and machine perception, as it underlies intelligent decision making. We propose an approach for learning semantic relational set abstractions on videos, inspired by human learning. We combine visual features with natural language supervision to generate high-level representations of similarities across a set of videos. This allows…

    Submitted 12 August, 2020; originally announced August 2020.

    Comments: European Conference on Computer Vision (ECCV) 2020, accepted

  39. arXiv:2007.15796 [pdf, other]

    cs.CV

    AR-Net: Adaptive Frame Resolution for Efficient Action Recognition

    Authors: Yue Meng, Chung-Ching Lin, Rameswar Panda, Prasanna Sattigeri, Leonid Karlinsky, Aude Oliva, Kate Saenko, Rogerio Feris

    Abstract: Action recognition is an open and challenging problem in computer vision. While current state-of-the-art models offer excellent recognition results, their computational expense limits their impact for many real-world applications. In this paper, we propose a novel approach, called AR-Net (Adaptive Resolution Network), that selects on-the-fly the optimal resolution for each frame conditioned on the…

    Submitted 30 July, 2020; originally announced July 2020.

  40. arXiv:2007.11870 [pdf, other]

    cs.NI

    Delay and reliability-constrained VNF placement on mobile and volatile 5G infrastructure

    Authors: Balázs Németh, Nuria Molner, Jorge Martín-Pérez, Carlos J. Bernardos, Antonio de la Oliva, Balázs Sonkoly

    Abstract: The ongoing research and industrial exploitation of SDN and NFV technologies promise higher flexibility on network automation and infrastructure optimization. Choosing the location of Virtual Network Functions is a central problem in the automation and optimization of the software-defined, virtualization-based next generation of networks such as 5G and beyond. Network services provided for autonom…

    Submitted 23 July, 2020; originally announced July 2020.

    Comments: Preprint version

  41. arXiv:1911.00232  [pdf, other]

    cs.CV cs.LG eess.IV

    Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

    Authors: Mathew Monfort, Bowen Pan, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles, Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva

    Abstract: Videos capture events that typically contain multiple sequential, and simultaneous, actions even in the span of only a few seconds. However, most large-scale datasets built to train models for action recognition in video only provide a single label per video. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled and do not…

    Submitted 27 September, 2021; v1 submitted 1 November, 2019; originally announced November 2019.

  42. arXiv:1909.04743  [pdf, other]

    cs.CV

    Reasoning About Human-Object Interactions Through Dual Attention Networks

    Authors: Tete Xiao, Quanfu Fan, Dan Gutfreund, Mathew Monfort, Aude Oliva, Bolei Zhou

    Abstract: Objects are entities we act upon, where the functionality of an object is determined by how we interact with it. In this work we propose a Dual Attention Network model which reasons about human-object interactions. The dual-attentional framework weights the important features for objects and actions respectively. As a result, the recognition of objects and actions mutually benefit each other. The…

    Submitted 10 September, 2019; originally announced September 2019.

    Comments: ICCV 2019

  43. arXiv:1906.10112  [pdf, other]

    cs.CV

    GANalyze: Toward Visual Definitions of Cognitive Image Properties

    Authors: Lore Goetschalckx, Alex Andonian, Aude Oliva, Phillip Isola

    Abstract: We introduce a framework that uses Generative Adversarial Networks (GANs) to study cognitive properties like memorability, aesthetics, and emotional valence. These attributes are of interest because we do not have a concrete visual definition of what they entail. What does it look like for a dog to be more or less memorable? GANs allow us to generate a manifold of natural-looking images with fine-…

    Submitted 24 June, 2019; originally announced June 2019.

    Comments: 17 pages, 15 figures

  44. arXiv:1905.05675  [pdf, other]

    cs.CV cs.AI q-bio.NC

    The Algonauts Project: A Platform for Communication between the Sciences of Biological and Artificial Intelligence

    Authors: Radoslaw Martin Cichy, Gemma Roig, Alex Andonian, Kshitij Dwivedi, Benjamin Lahner, Alex Lascelles, Yalda Mohsenzadeh, Kandan Ramakrishnan, Aude Oliva

    Abstract: In the last decade, artificial intelligence (AI) models inspired by the brain have made unprecedented progress in performing real-world perceptual tasks like object classification and speech recognition. Recently, researchers of natural intelligence have begun using those AI models to explore how the brain performs such tasks. These developments suggest that future progress will benefit from incre…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: 4 pages, 2 figures

  45. Synthetically Trained Icon Proposals for Parsing and Summarizing Infographics

    Authors: Spandan Madan, Zoya Bylinskii, Matthew Tancik, Adrià Recasens, Kimberli Zhong, Sami Alsheikh, Hanspeter Pfister, Aude Oliva, Fredo Durand

    Abstract: Widely used in news, business, and educational media, infographics are handcrafted to effectively communicate messages about complex and often abstract topics including 'ways to conserve the environment' and 'understanding the financial crisis'. Composed of stylistically and semantically diverse visual and textual elements, infographics pose new challenges for computer vision. While automatic text…

    Submitted 27 July, 2018; originally announced July 2018.

  46. arXiv:1801.03150  [pdf, other]

    cs.CV cs.AI

    Moments in Time Dataset: one million videos for event understanding

    Authors: Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva

    Abstract: We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even for actions occurring in 3-second videos poses many challenges: meaningful events do not include only people, but also objects, animals, and natural phenomena; visual and audito…

    Submitted 16 February, 2019; v1 submitted 9 January, 2018; originally announced January 2018.

  47. arXiv:1711.08496  [pdf, other]

    cs.CV

    Temporal Relational Reasoning in Videos

    Authors: Bolei Zhou, Alex Andonian, Aude Oliva, Antonio Torralba

    Abstract: Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species. In this paper, we introduce an effective and interpretable network module, the Temporal Relation Network (TRN), designed to learn and reason about temporal dependencies between video frames at multiple time scales. We evaluate TRN-equippe…

    Submitted 24 July, 2018; v1 submitted 22 November, 2017; originally announced November 2017.

    Comments: camera-ready version for ECCV'18

  48. arXiv:1711.05611  [pdf, other]

    cs.CV

    Interpreting Deep Visual Representations via Network Dissection

    Authors: Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba

    Abstract: The success of recent deep convolutional neural networks (CNNs) depends on learning hidden representations that can summarize the important factors of variation behind the data. However, CNNs are often criticized as being black boxes that lack interpretability, since they have millions of unexplained model parameters. In this work, we describe Network Dissection, a method that interprets networks by p…

    Submitted 26 June, 2018; v1 submitted 15 November, 2017; originally announced November 2017.

    Comments: *B. Zhou and D. Bau contributed equally to this work. 15 pages, 27 figures

    ACM Class: I.2.10

  49. arXiv:1709.09215  [pdf, other]

    cs.CV

    Understanding Infographics through Textual and Visual Tag Prediction

    Authors: Zoya Bylinskii, Sami Alsheikh, Spandan Madan, Adria Recasens, Kimberli Zhong, Hanspeter Pfister, Fredo Durand, Aude Oliva

    Abstract: We introduce the problem of visual hashtag discovery for infographics: extracting visual elements from an infographic that are diagnostic of its topic. Given an infographic as input, our computational approach automatically outputs textual and visual elements predicted to be representative of the infographic content. Concretely, from a curated dataset of 29K large infographic images sampled across…

    Submitted 26 September, 2017; originally announced September 2017.

  50. arXiv:1704.05796  [pdf, other]

    cs.CV cs.AI

    Network Dissection: Quantifying Interpretability of Deep Visual Representations

    Authors: David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba

    Abstract: We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units wit…

    Submitted 19 April, 2017; originally announced April 2017.

    Comments: First two authors contributed equally. Oral presentation at CVPR 2017

    ACM Class: I.2.10