-
SuperFormer: Volumetric Transformer Architectures for MRI Super-Resolution
Authors:
Cristhian Forigua,
Maria Escobar,
Pablo Arbelaez
Abstract:
This paper presents a novel framework for processing volumetric medical information using Visual Transformers (ViTs). First, we extend the state-of-the-art Swin Transformer model to the 3D medical domain. Second, we propose a new approach for processing volumetric information and encoding position in ViTs for 3D applications. We instantiate the proposed framework and present SuperFormer, a volumetric transformer-based approach for Magnetic Resonance Imaging (MRI) Super-Resolution. Our method leverages the 3D information of the MRI domain and uses a local self-attention mechanism with a 3D relative positional encoding to recover anatomical details. In addition, our approach takes advantage of multi-domain information from volume and feature domains and fuses them to reconstruct the High-Resolution MRI. We perform an extensive validation on the Human Connectome Project dataset and demonstrate the superiority of volumetric transformers over 3D CNN-based methods. Our code and pretrained models are available at https://github.com/BCV-Uniandes/SuperFormer.
Submitted 5 June, 2024;
originally announced June 2024.
-
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Authors:
Kristen Grauman,
Andrew Westbury,
Lorenzo Torresani,
Kris Kitani,
Jitendra Malik,
Triantafyllos Afouras,
Kumar Ashutosh,
Vijay Baiyya,
Siddhant Bansal,
Bikram Boote,
Eugene Byrne,
Zach Chavis,
Joya Chen,
Feng Cheng,
Fu-Jen Chu,
Sean Crane,
Avijit Dasgupta,
Jing Dong,
Maria Escobar,
Cristhian Forigua,
Abrham Gebreselasie,
Sanjay Haresh,
Jing Huang,
Md Mohaiminul Islam,
Suyog Jain
, et al. (76 additional authors not shown)
Abstract:
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair). 740 participants from 13 cities worldwide performed these activities in 123 different natural scene contexts, yielding long-form captures from 1 to 42 minutes each and 1,286 hours of video combined. The multimodal nature of the dataset is unprecedented: the video is accompanied by multichannel audio, eye gaze, 3D point clouds, camera poses, IMU, and multiple paired language descriptions -- including a novel "expert commentary" done by coaches and teachers and tailored to the skilled-activity domain. To push the frontier of first-person video understanding of skilled human activity, we also present a suite of benchmark tasks and their annotations, including fine-grained activity understanding, proficiency estimation, cross-view translation, and 3D hand/body pose. All resources are open sourced to fuel new research in the community. Project page: http://ego-exo4d-data.org/
Submitted 29 April, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
EgoCOL: Egocentric Camera pose estimation for Open-world 3D object Localization @Ego4D challenge 2023
Authors:
Cristhian Forigua,
Maria Escobar,
Jordi Pont-Tuset,
Kevis-Kokitsi Maninis,
Pablo Arbeláez
Abstract:
We present EgoCOL, an egocentric camera pose estimation method for open-world 3D object localization. Our method leverages sparse camera pose reconstructions in a two-fold manner, video and scan independently, to estimate the camera pose of egocentric frames in 3D renders with high recall and precision. We extensively evaluate our method on the Visual Query (VQ) 3D object localization Ego4D benchmark. EgoCOL can estimate 62% and 59% more camera poses than the Ego4D baseline in the Ego4D Visual Queries 3D Localization challenge at CVPR 2023 in the val and test sets, respectively. Our code is publicly available at https://github.com/BCV-Uniandes/EgoCOL
Submitted 28 June, 2023;
originally announced June 2023.
-
Comparing Variation in Tokenizer Outputs Using a Series of Problematic and Challenging Biomedical Sentences
Authors:
Christopher Meaney,
Therese A Stukel,
Peter C Austin,
Michael Escobar
Abstract:
Background & Objective: Biomedical text data are increasingly available for research. Tokenization is an initial step in many biomedical text mining pipelines. Tokenization is the process of parsing an input biomedical sentence (represented as a digital character sequence) into a discrete set of word/token symbols, which convey focused semantic/syntactic meaning. The objective of this study is to explore variation in tokenizer outputs when applied across a series of challenging biomedical sentences.
Method: Diaz [2015] introduces 24 challenging example biomedical sentences for comparing tokenizer performance. In this study, we descriptively explore variation in outputs of eight tokenizers applied to each example biomedical sentence. The tokenizers compared in this study are the NLTK white space tokenizer, the NLTK Penn Tree Bank tokenizer, Spacy and SciSpacy tokenizers, Stanza/Stanza-Craft tokenizers, the UDPipe tokenizer, and R-tokenizers.
Results: For many examples, tokenizers performed similarly effectively; however, for certain examples, there was meaningful variation in returned outputs. The white space tokenizer often performed differently than other tokenizers. We observed performance similarities for tokenizers implementing rule-based systems (e.g. pattern matching and regular expressions) and tokenizers implementing neural architectures for token classification. Oftentimes, the challenging tokens resulting in the greatest variation in outputs are those words which convey substantive and focused biomedical/clinical meaning (e.g. x-ray, IL-10, TCR/CD3, CD4+ CD8+, and (Ca2+)-regulated).
Conclusion: When state-of-the-art, open-source tokenizers from Python and R were applied to a series of challenging biomedical example sentences, we observed subtle variation in the returned outputs.
Submitted 15 May, 2023;
originally announced May 2023.
-
BoDiffusion: Diffusing Sparse Observations for Full-Body Human Motion Synthesis
Authors:
Angela Castillo,
Maria Escobar,
Guillaume Jeanneret,
Albert Pumarola,
Pablo Arbeláez,
Ali Thabet,
Artsiom Sanakoyeu
Abstract:
Mixed reality applications require tracking the user's full-body motion to enable an immersive experience. However, typical head-mounted devices can only track head and hand movements, leading to a limited reconstruction of full-body motion due to variability in lower body configurations. We propose BoDiffusion -- a generative diffusion model for motion synthesis to tackle this under-constrained reconstruction problem. We present a time and space conditioning scheme that allows BoDiffusion to leverage sparse tracking inputs while generating smooth and realistic full-body motion sequences. To the best of our knowledge, this is the first approach that uses the reverse diffusion process to model full-body tracking as a conditional sequence generation task. We conduct experiments on the large-scale motion-capture dataset AMASS and show that our approach outperforms the state-of-the-art approaches by a significant margin in terms of full-body motion realism and joint reconstruction error.
Submitted 21 April, 2023;
originally announced April 2023.
-
Video Swin Transformers for Egocentric Video Understanding @ Ego4D Challenges 2022
Authors:
Maria Escobar,
Laura Daza,
Cristina González,
Jordi Pont-Tuset,
Pablo Arbeláez
Abstract:
We implemented Video Swin Transformer as a base architecture for the tasks of Point-of-No-Return temporal localization and Object State Change Classification. Our method achieved competitive performance on both challenges.
Submitted 22 July, 2022;
originally announced July 2022.
-
Generalized Real-World Super-Resolution through Adversarial Robustness
Authors:
Angela Castillo,
María Escobar,
Juan C. Pérez,
Andrés Romero,
Radu Timofte,
Luc Van Gool,
Pablo Arbeláez
Abstract:
Real-world Super-Resolution (SR) has been traditionally tackled by first learning a specific degradation model that resembles the noise and corruption artifacts in low-resolution imagery. Thus, current methods lack generalization and lose their accuracy when tested on unseen types of corruption. In contrast to the traditional proposal, we present Robust Super-Resolution (RSR), a method that leverages the generalization capability of adversarial attacks to tackle real-world SR. Our novel framework poses a paradigm shift in the development of real-world SR methods. Instead of learning a dataset-specific degradation, we employ adversarial attacks to create difficult examples that target the model's weaknesses. Afterward, we use these adversarial examples during training to improve our model's capacity to process noisy inputs. We perform extensive experimentation on synthetic and real-world images and empirically demonstrate that our RSR method generalizes well across datasets without re-training for specific noise priors. By using a single robust model, we outperform state-of-the-art specialized methods on real-world benchmarks.
Submitted 25 August, 2021;
originally announced August 2021.
-
SIMBA: Specific Identity Markers for Bone Age Assessment
Authors:
Cristina González,
María Escobar,
Laura Daza,
Felipe Torres,
Gustavo Triana,
Pablo Arbeláez
Abstract:
Bone Age Assessment (BAA) is a task performed by radiologists to diagnose abnormal growth in a child. In manual approaches, radiologists take into account different identity markers when calculating bone age, i.e., chronological age and gender. However, the current automated Bone Age Assessment methods do not completely exploit the information present in the patient's metadata. With this lack of available methods as motivation, we present SIMBA: Specific Identity Markers for Bone Age Assessment. SIMBA is a novel approach for the task of BAA based on the use of identity markers. For this purpose, we build upon the state-of-the-art model, fusing the information present in the identity markers with the visual features created from the original hand radiograph. We then use this robust representation to estimate the patient's relative bone age: the difference between chronological age and bone age. We validate SIMBA on the Radiological Hand Pose Estimation dataset and find that it outperforms previous state-of-the-art methods. SIMBA sets a trend of a new wave of Computer-aided Diagnosis methods that incorporate all of the data that is available regarding a patient. To promote further research in this area and ensure reproducibility we will provide the source code as well as the pre-trained models of SIMBA.
Submitted 13 July, 2020; v1 submitted 10 July, 2020;
originally announced July 2020.
-
Critical Point Calculations by Numerical Inversion of Functions
Authors:
C. N. Parajara,
G. M. Platt,
F. D. Moura Neto,
M. Escobar,
G. B. Libotte
Abstract:
In this work, we propose a new approach to the problem of critical point calculation, based on the formulation of Heidemann and Khalil (1980). This leads to a $2 \times 2$ system of nonlinear algebraic equations in temperature and molar volume, which makes possible the prediction of critical points of the mixture through an adaptation of the technique of inversion of functions from the plane to the plane, proposed by Malta, Saldanha, and Tomei (1993). The results are compared to those obtained by three methodologies: ($i$) the classical method of Heidemann and Khalil (1980), which uses a double-loop structure, also in terms of temperature and molar volume; ($ii$) the algorithm of Dimitrakopoulos, Jia, and Li (2014), which employs a damped Newton algorithm; and ($iii$) the methodology proposed by Nichita and Gomez (2010), based on a stochastic algorithm. The proposed methodology proves to be robust and accurate in the prediction of critical points, and it provides a global view of the nonlinear problem.
Submitted 30 May, 2020;
originally announced June 2020.
-
MT-Adapted Datasheets for Datasets: Template and Repository
Authors:
Marta R. Costa-jussà,
Roger Creus,
Oriol Domingo,
Albert Domínguez,
Miquel Escobar,
Cayetana López,
Marina Garcia,
Margarita Geleta
Abstract:
In this report, we use the standardized model proposed by Gebru et al. (2018) to document the popular machine translation datasets EuroParl (Koehn, 2005) and News-Commentary (Barrault et al., 2019). Within this documentation process, we have adapted the original datasheet to the particular case of data consumers within the Machine Translation area. We also propose a repository for collecting the adapted datasheets in this research area.
Submitted 27 May, 2020;
originally announced May 2020.
-
Damage detection in a unidimensional truss using the firefly optimization algorithm and finite elements
Authors:
Camilo Manrique Escobar,
Octavio Andrés González-Estrada,
Heller Guillermo Sánchez Acevedo
Abstract:
In this paper, we investigate the damage detection of structures seen as an optimization problem, using modal characterization to evaluate the dynamic response of the structure given a damage model. We implemented the firefly optimization algorithm with a simple numerical damage model to assess the performance of the method and its advantages for structural health monitoring (SHM). We show some implementation details and discuss the obtained results.
Submitted 24 April, 2018; v1 submitted 9 June, 2017;
originally announced June 2017.