Showing 1–30 of 30 results for author: Ballan, L

  1. arXiv:2405.19822

    cs.CV cs.AI cs.ET

    Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology

    Authors: Frank A. Ruis, Alma M. Liezenga, Friso G. Heslinga, Luca Ballan, Thijs A. Eker, Richard J. M. den Hollander, Martin C. van Leeuwen, Judith Dijk, Wyke Huizinga

    Abstract: Collecting and annotating real-world data for the development of object detection models is a time-consuming and expensive process. In the military domain in particular, data collection can also be dangerous or infeasible. Training models on synthetic data may provide a solution for cases where access to real-world training data is restricted. However, bridging the reality gap between synthetic an…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Submitted to and presented at SPIE Defense + Commercial Sensing 2024, 13 pages, 4 figures, 3 tables

  2. arXiv:2404.11327

    cs.RO cs.CV

    Following the Human Thread in Social Navigation

    Authors: Luca Scofano, Alessio Sampieri, Tommaso Campari, Valentino Sacco, Indro Spinelli, Lamberto Ballan, Fabio Galasso

    Abstract: The success of collaboration between humans and robots in shared environments relies on the robot's real-time adaptation to human motion. Specifically, in Social Navigation, the agent should be close enough to assist but ready to back up to let the human move freely, avoiding collisions. Human trajectories emerge as crucial cues in Social Navigation, but they are partially observable from the robo…

    Submitted 17 April, 2024; originally announced April 2024.

  3. arXiv:2305.10913

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Weakly-Supervised Visual-Textual Grounding with Semantic Prior Refinement

    Authors: Davide Rigoni, Luca Parolari, Luciano Serafini, Alessandro Sperduti, Lamberto Ballan

    Abstract: Using only image-sentence pairs, weakly-supervised visual-textual grounding aims to learn region-phrase correspondences of the respective entity mentions. Compared to the supervised approach, learning is more difficult since bounding boxes and textual phrases correspondences are unavailable. In light of this, we propose the Semantic Prior Refinement Model (SPRM), whose predictions are obtained by…

    Submitted 26 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

  4. arXiv:2305.08553

    cs.CV cs.AI cs.LG cs.RO

    Distilling Knowledge for Short-to-Long Term Trajectory Prediction

    Authors: Sourav Das, Guglielmo Camporese, Shaokang Cheng, Lamberto Ballan

    Abstract: Long-term trajectory forecasting is an important and challenging problem in the fields of computer vision, machine learning, and robotics. One fundamental difficulty stands in the evolution of the trajectory that becomes more and more uncertain and unpredictable as the time horizon grows, subsequently increasing the complexity of the problem. To overcome this issue, in this paper, we propose Di-Lo…

    Submitted 15 March, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

  5. arXiv:2212.00767

    cs.CV cs.AI cs.LG cs.RO

    Exploiting Proximity-Aware Tasks for Embodied Social Navigation

    Authors: Enrico Cancelli, Tommaso Campari, Luciano Serafini, Angel X. Chang, Lamberto Ballan

    Abstract: Learning how to navigate among humans in an occluded and spatially constrained indoor environment, is a key ability required to embodied agent to be integrated into our society. In this paper, we propose an end-to-end architecture that exploits Proximity-Aware Tasks (referred as to Risk and Proximity Compass) to inject into a reinforcement learning navigation policy the ability to infer common-sen…

    Submitted 10 March, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

  6. arXiv:2210.14714

    cs.CV cs.MM

    TAMFormer: Multi-Modal Transformer with Learned Attention Mask for Early Intent Prediction

    Authors: Nada Osman, Guglielmo Camporese, Lamberto Ballan

    Abstract: Human intention prediction is a growing area of research where an activity in a video has to be anticipated by a vision-based system. To this end, the model creates a representation of the past, and subsequently, it produces future hypotheses about upcoming scenarios. In this work, we focus on pedestrians' early intention prediction in which, from a current observation of an urban scene, the model…

    Submitted 26 October, 2022; originally announced October 2022.

  7. arXiv:2206.00481

    cs.CV cs.LG

    Where are my Neighbors? Exploiting Patches Relations in Self-Supervised Vision Transformer

    Authors: Guglielmo Camporese, Elena Izzo, Lamberto Ballan

    Abstract: Vision Transformers (ViTs) enabled the use of the transformer architecture on vision tasks showing impressive performances when trained on big datasets. However, on relatively small datasets, ViTs are less accurate given their lack of inductive bias. To this end, we propose a simple but still effective Self-Supervised Learning (SSL) strategy to train ViTs, that without any external annotation or e…

    Submitted 13 October, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: Accepted to BMVC 2022

  8. arXiv:2204.11561

    cs.CV cs.AI cs.LG

    Goal-driven Self-Attentive Recurrent Networks for Trajectory Prediction

    Authors: Luigi Filippo Chiara, Pasquale Coscia, Sourav Das, Simone Calderara, Rita Cucchiara, Lamberto Ballan

    Abstract: Human trajectory forecasting is a key component of autonomous vehicles, social-aware robots and advanced video-surveillance applications. This challenging task typically requires knowledge about past motion, the environment and likely destination areas. In this context, multi-modality is a fundamental aspect and its effective modeling can be beneficial to any architecture. Inferring accurate traje…

    Submitted 25 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022 Precognition Workshop

  9. arXiv:2203.04781

    cs.CV cs.AI

    How many Observations are Enough? Knowledge Distillation for Trajectory Forecasting

    Authors: Alessio Monti, Angelo Porrello, Simone Calderara, Pasquale Coscia, Lamberto Ballan, Rita Cucchiara

    Abstract: Accurate prediction of future human positions is an essential task for modern video-surveillance systems. Current state-of-the-art models usually rely on a "history" of past tracked locations (e.g., 3 to 5 seconds) to predict a plausible sequence of future locations (e.g., up to the next 5 seconds). We feel that this common schema neglects critical traits of realistic applications: as the collecti…

    Submitted 9 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR 2022

  10. arXiv:2203.02583

    cs.CV

    Online Learning of Reusable Abstract Models for Object Goal Navigation

    Authors: Tommaso Campari, Leonardo Lamanna, Paolo Traverso, Luciano Serafini, Lamberto Ballan

    Abstract: In this paper, we present a novel approach to incrementally learn an Abstract Model of an unknown environment, and show how an agent can reuse the learned model for tackling the Object Goal Navigation task. The Abstract Model is a finite state machine in which each state is an abstraction of a state of the environment, as perceived by the agent in a certain position and orientation. The perception…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: Paper accepted at CVPR2022

  11. arXiv:2109.00829

    cs.CV cs.AI cs.LG

    SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos

    Authors: Nada Osman, Guglielmo Camporese, Pasquale Coscia, Lamberto Ballan

    Abstract: Action anticipation in egocentric videos is a difficult task due to the inherently multi-modal nature of human actions. Additionally, some actions happen faster or slower than others depending on the actor or surrounding context which could vary each time and lead to different predictions. Based on this idea, we build upon RULSTM architecture, which is specifically designed for anticipating human…

    Submitted 2 September, 2021; originally announced September 2021.

    Comments: Accepted to EPIC@ICCV 2021

  12. arXiv:2104.09159

    cs.CV cs.AI cs.LG

    Conditional Variational Capsule Network for Open Set Recognition

    Authors: Yunrui Guo, Guglielmo Camporese, Wenjing Yang, Alessandro Sperduti, Lamberto Ballan

    Abstract: In open set recognition, a classifier has to detect unknown classes that are not known at training time. In order to recognize new categories, the classifier has to project the input samples of known classes in very compact and separated regions of the features space for discriminating samples of unknown classes. Recently proposed Capsule Networks have shown to outperform alternatives in many fiel…

    Submitted 17 August, 2021; v1 submitted 19 April, 2021; originally announced April 2021.

    Comments: Accepted to ICCV 2021

  13. arXiv:2104.01071

    eess.IV cs.CV

    Prediction of Tuberculosis using U-Net and segmentation techniques

    Authors: Dennis Núñez-Fernández, Lamberto Ballan, Gabriel Jiménez-Avalos, Jorge Coronel, Patricia Sheen, Mirko Zimic

    Abstract: One of the most serious public health problems in Peru and worldwide is Tuberculosis (TB), which is produced by a bacterium known as Mycobacterium tuberculosis. The purpose of this work is to facilitate and automate the diagnosis of tuberculosis using the MODS method and using lens-free microscopy, as it is easier to calibrate and easier to use by untrained personnel compared to lens microscopy. T…

    Submitted 2 April, 2021; originally announced April 2021.

    Comments: AI for Public Health Workshop at ICLR 2021. arXiv admin note: text overlap with arXiv:2007.02482

  14. arXiv:2008.09403

    cs.CV

    Exploiting Scene-specific Features for Object Goal Navigation

    Authors: Tommaso Campari, Paolo Eccher, Luciano Serafini, Lamberto Ballan

    Abstract: Can the intrinsic relation between an object and the room in which it is usually located help agents in the Visual Navigation Task? We study this question in the context of Object Navigation, a problem in which an agent has to reach an object of a specific class while moving in a complex domestic environment. In this paper, we introduce a new reduced dataset that speeds up the training of navigati…

    Submitted 21 August, 2020; originally announced August 2020.

    Comments: Accepted at ACVR2020 ECCV2020 Workshop

  15. arXiv:2007.02482

    eess.IV cs.CV

    Automatic semantic segmentation for prediction of tuberculosis using lens-free microscopy images

    Authors: Dennis Núñez-Fernández, Lamberto Ballan, Gabriel Jiménez-Avalos, Jorge Coronel, Mirko Zimic

    Abstract: Tuberculosis (TB), caused by a germ called Mycobacterium tuberculosis, is one of the most serious public health problems in Peru and the world. The development of this project seeks to facilitate and automate the diagnosis of tuberculosis by the MODS method and using lens-free microscopy, due they are easier to calibrate and easier to use (by untrained personnel) in comparison with lens microscopy…

    Submitted 5 July, 2020; originally announced July 2020.

    Comments: ML for Global Health Workshop at ICML 2020

  16. arXiv:2007.02457

    eess.IV cs.CV

    Using Capsule Neural Network to predict Tuberculosis in lens-free microscopic images

    Authors: Dennis Núñez-Fernández, Lamberto Ballan, Gabriel Jiménez-Avalos, Jorge Coronel, Mirko Zimic

    Abstract: Tuberculosis, caused by a bacteria called Mycobacterium tuberculosis, is one of the most serious public health problems worldwide. This work seeks to facilitate and automate the prediction of tuberculosis by the MODS method and using lens-free microscopy, which is easy to use by untrained personnel. We employ the CapsNet architecture in our collected dataset and show that it has a better accuracy…

    Submitted 5 July, 2020; originally announced July 2020.

    Comments: HSYS Workshop at ICML 2020

  17. AC-VRNN: Attentive Conditional-VRNN for Multi-Future Trajectory Prediction

    Authors: Alessia Bertugli, Simone Calderara, Pasquale Coscia, Lamberto Ballan, Rita Cucchiara

    Abstract: Anticipating human motion in crowded scenarios is essential for developing intelligent transportation systems, social-aware robots and advanced video surveillance applications. A key component of this task is represented by the inherently multi-modal nature of human paths which makes socially acceptable multiple futures when human interactions are involved. To this end, we propose a generative arc…

    Submitted 8 July, 2021; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: Accepted at Computer Vision and Image Understanding (CVIU)

  18. arXiv:2004.07711

    cs.CV cs.LG

    Knowledge Distillation for Action Anticipation via Label Smoothing

    Authors: Guglielmo Camporese, Pasquale Coscia, Antonino Furnari, Giovanni Maria Farinella, Lamberto Ballan

    Abstract: Human capability to anticipate near future from visual observations and non-verbal cues is essential for developing intelligent systems that need to interact with people. Several research areas, such as human-robot interaction (HRI), assisted living or autonomous driving need to foresee future events to avoid crashes or help people. Egocentric scenarios are classic examples where action anticipati…

    Submitted 18 December, 2020; v1 submitted 16 April, 2020; originally announced April 2020.

    Comments: Accepted to ICPR 2020

  19. arXiv:1910.05770

    cs.CV cs.LG cs.MM

    A CNN-RNN Framework for Image Annotation from Visual Cues and Social Network Metadata

    Authors: Tobia Tesan, Pasquale Coscia, Lamberto Ballan

    Abstract: Images represent a commonly used form of visual communication among people. Nevertheless, image classification may be a challenging task when dealing with unclear or non-common images needing more context to be correctly annotated. Metadata accompanying images on social-media represent an ideal source of additional information for retrieving proper neighborhoods easing image annotation task. To th…

    Submitted 30 March, 2020; v1 submitted 13 October, 2019; originally announced October 2019.

  20. arXiv:1909.08840

    cs.CV

    Social and Scene-Aware Trajectory Prediction in Crowded Spaces

    Authors: Matteo Lisotto, Pasquale Coscia, Lamberto Ballan

    Abstract: Mimicking human ability to forecast future positions or interpret complex interactions in urban scenarios, such as streets, shopping malls or squares, is essential to develop socially compliant robots or self-driving cars. Autonomous systems may gain advantage on anticipating human motion to avoid collisions or to naturally behave alongside people. To foresee plausible trajectories, we construct a…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: Accepted to ICCV 2019 Workshop on Assistive Computer Vision and Robotics (ACVR)

  21. Learning without Prejudice: Avoiding Bias in Webly-Supervised Action Recognition

    Authors: Christian Rupprecht, Ansh Kapil, Nan Liu, Lamberto Ballan, Federico Tombari

    Abstract: Webly-supervised learning has recently emerged as an alternative paradigm to traditional supervised learning based on large-scale datasets with manual annotations. The key idea is that models such as CNNs can be learned from the noisy visual data available on the web. In this work we aim to exploit web data for video understanding tasks such as action recognition and detection. One of the main pro…

    Submitted 14 June, 2017; originally announced June 2017.

    Comments: Submitted to CVIU SI: Computer Vision and the Web

  22. arXiv:1706.01788

    cs.MM cs.CR

    Localization of JPEG double compression through multi-domain convolutional neural networks

    Authors: Irene Amerini, Tiberio Uricchio, Lamberto Ballan, Roberto Caldelli

    Abstract: When an attacker wants to falsify an image, in most of cases she/he will perform a JPEG recompression. Different techniques have been developed based on diverse theoretical assumptions but very effective solutions have not been developed yet. Recently, machine learning based approaches have been started to appear in the field of image forensics to solve diverse tasks such as acquisition source ide…

    Submitted 6 June, 2017; originally announced June 2017.

    Comments: Accepted to CVPRW 2017, Workshop on Media Forensics

  23. arXiv:1705.02503

    cs.CV

    Context-Aware Trajectory Prediction

    Authors: Federico Bartoli, Giuseppe Lisanti, Lamberto Ballan, Alberto Del Bimbo

    Abstract: Human motion and behaviour in crowded spaces is influenced by several factors, such as the dynamics of other moving agents in the scene, as well as the static elements that might be perceived as points of attraction or obstacles. In this work, we present a new model for human trajectory prediction which is able to take advantage of both human-human and human-space interactions. The future trajecto…

    Submitted 6 May, 2017; originally announced May 2017.

    Comments: Submitted to BMVC 2017

  24. arXiv:1705.01781

    cs.CV

    Am I Done? Predicting Action Progress in Videos

    Authors: Federico Becattini, Tiberio Uricchio, Lorenzo Seidenari, Lamberto Ballan, Alberto Del Bimbo

    Abstract: In this paper we deal with the problem of predicting action progress in videos. We argue that this is an extremely important task since it can be valuable for a wide range of interaction applications. To this end we introduce a novel approach, named ProgressNet, capable of predicting when an action takes place in a video, where it is located within the frames, and how far it has progressed during…

    Submitted 9 March, 2020; v1 submitted 4 May, 2017; originally announced May 2017.

  25. Automatic Image Annotation via Label Transfer in the Semantic Space

    Authors: Tiberio Uricchio, Lamberto Ballan, Lorenzo Seidenari, Alberto Del Bimbo

    Abstract: Automatic image annotation is among the fundamental problems in computer vision and pattern recognition, and it is becoming increasingly important in order to develop algorithms that are able to search and browse large-scale image collections. In this paper, we propose a label propagation framework based on Kernel Canonical Correlation Analysis (KCCA), which builds a latent semantic space where co…

    Submitted 1 June, 2017; v1 submitted 16 May, 2016; originally announced May 2016.

    Comments: To appear in Pattern Recognition

  26. arXiv:1603.06987

    cs.CV

    Knowledge Transfer for Scene-specific Motion Prediction

    Authors: Lamberto Ballan, Francesco Castaldo, Alexandre Alahi, Francesco Palmieri, Silvio Savarese

    Abstract: When given a single frame of the video, humans can not only interpret the content of the scene, but also they are able to forecast the near future. This ability is mostly driven by their rich prior knowledge about the visual world, both in terms of (i) the dynamics of moving agents, as well as (ii) the semantic of the scene. In this work we exploit the interplay between these two key elements to p…

    Submitted 25 July, 2016; v1 submitted 22 March, 2016; originally announced March 2016.

    Comments: Accepted to ECCV 2016

  27. arXiv:1508.07647

    cs.CV

    Love Thy Neighbors: Image Annotation by Exploiting Image Metadata

    Authors: Justin Johnson, Lamberto Ballan, Fei-Fei Li

    Abstract: Some images that are difficult to recognize on their own may become more clear in the context of a neighborhood of related images with similar social-network metadata. We build on this intuition to improve multilabel image annotation. Our model uses image metadata nonparametrically to generate neighborhoods of related images using Jaccard similarities, then uses a deep neural network to blend visu…

    Submitted 21 September, 2015; v1 submitted 30 August, 2015; originally announced August 2015.

    Comments: Accepted to ICCV 2015

  28. Capturing Hands in Action using Discriminative Salient Points and Physics Simulation

    Authors: Dimitrios Tzionas, Luca Ballan, Abhilash Srikantha, Pablo Aponte, Marc Pollefeys, Juergen Gall

    Abstract: Hand motion capture is a popular research field, recently gaining more attention due to the ubiquity of RGB-D sensors. However, even most recent approaches focus on the case of a single isolated hand. In this work, we focus on hands that interact with other hands or objects and present a framework that successfully captures motion in such interaction scenarios for both rigid and articulated object…

    Submitted 7 March, 2016; v1 submitted 6 June, 2015; originally announced June 2015.

    Comments: Accepted for publication by the International Journal of Computer Vision (IJCV) on 16.02.2016 (submitted on 17.10.14). A combination into a single framework of an ECCV'12 multicamera-RGB and a monocular-RGBD GCPR'14 hand tracking paper with several extensions, additional experiments and details

  29. arXiv:1503.08248

    cs.IR cs.CV cs.MM cs.SI

    Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval

    Authors: Xirong Li, Tiberio Uricchio, Lamberto Ballan, Marco Bertini, Cees G. M. Snoek, Alberto Del Bimbo

    Abstract: Where previous reviews on content-based image retrieval emphasize on what can be seen in an image to bridge the semantic gap, this survey considers what people tag about an image. A comprehensive treatise of three closely linked problems, i.e., image tag assignment, refinement, and tag-based image retrieval is presented. While existing works vary in terms of their targeted tasks and methodology, t…

    Submitted 23 March, 2016; v1 submitted 27 March, 2015; originally announced March 2015.

    Comments: to appear in ACM Computing Surveys

    ACM Class: H.3.1; H.3.3

    Journal ref: ACM Computing Surveys, Volume 49 Issue 1, 14:1-14:39, June 2016

  30. A Data-Driven Approach for Tag Refinement and Localization in Web Videos

    Authors: Lamberto Ballan, Marco Bertini, Giuseppe Serra, Alberto Del Bimbo

    Abstract: Tagging of visual content is becoming more and more widespread as web-based services and social networks have popularized tagging functionalities among their users. These user-generated tags are used to ease browsing and exploration of media collections, e.g. using tag clouds, or to retrieve multimedia content. However, not all media are equally tagged by users. Using the current systems is easy t…

    Submitted 28 May, 2015; v1 submitted 2 July, 2014; originally announced July 2014.

    Comments: Preprint submitted to Computer Vision and Image Understanding (CVIU)