-
DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Authors:
Alexander Khazatsky,
Karl Pertsch,
Suraj Nair,
Ashwin Balakrishna,
Sudeep Dasari,
Siddharth Karamcheti,
Soroush Nasiriany,
Mohan Kumar Srirama,
Lawrence Yunliang Chen,
Kirsty Ellis,
Peter David Fagan,
Joey Hejna,
Masha Itkina,
Marion Lepert,
Yecheng Jason Ma,
Patrick Tree Miller,
Jimmy Wu,
Suneel Belkhale,
Shivin Dass,
Huy Ha,
Arhan Jain,
Abraham Lee,
Youngwoon Lee,
Marius Memmel,
Sungjae Park
, et al. (74 additional authors not shown)
Abstract:
The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.
Submitted 19 March, 2024;
originally announced March 2024.
-
TartanDrive 2.0: More Modalities and Better Infrastructure to Further Self-Supervised Learning Research in Off-Road Driving Tasks
Authors:
Matthew Sivaprakasam,
Parv Maheshwari,
Mateo Guaman Castro,
Samuel Triest,
Micah Nye,
Steve Willits,
Andrew Saba,
Wenshan Wang,
Sebastian Scherer
Abstract:
We present TartanDrive 2.0, a large-scale off-road driving dataset for self-supervised learning tasks. In 2021 we released TartanDrive 1.0, which is one of the largest datasets for off-road terrain. As a follow-up to our original dataset, we collected seven hours of data at speeds of up to 15 m/s with the addition of three new LiDAR sensors alongside the original camera, inertial, GPS, and proprioceptive sensors. We also release the tools we use for collecting, processing, and querying the data, including our metadata system designed to further the utility of our data. Custom infrastructure allows end users to reconfigure the data to cater to their own platforms. These tools and infrastructure alongside the dataset are useful for a variety of tasks in the field of off-road autonomy and, by releasing them, we encourage collaborative data aggregation. These resources lower the barrier to entry to utilizing large-scale datasets, thereby helping facilitate the advancement of robotics in areas such as self-supervised learning, multi-modal perception, inverse reinforcement learning, and representation learning. The dataset is available at https://github.com/castacks/tartan_drive_2.0.
Submitted 2 February, 2024;
originally announced February 2024.
-
Invariant Causal Prediction with Locally Linear Models
Authors:
Alexander Mey,
Rui Manuel Castro
Abstract:
We consider the task of identifying the causal parents of a target variable among a set of candidate variables from observational data. Our main assumption is that the candidate variables are observed in different environments which may, for example, correspond to different settings of a machine or different time intervals in a dynamical process. Under certain assumptions different environments can be regarded as interventions on the observed system. We assume a linear relationship between target and covariates, which can be different in each environment with the only restriction that the causal structure is invariant across environments. This is an extension of the ICP ($\textbf{I}$nvariant $\textbf{C}$ausal $\textbf{P}$rediction) principle by Peters et al. [2016], who assumed a fixed linear relationship across all environments. Within our proposed setting we provide sufficient conditions for identifiability of the causal parents and introduce a practical method called LoLICaP ($\textbf{Lo}$cally $\textbf{L}$inear $\textbf{I}$nvariant $\textbf{Ca}$usal $\textbf{P}$rediction), which is based on a hypothesis test for parent identification using a ratio of minimum and maximum statistics. We then show in a simplified setting that the statistical power of LoLICaP converges exponentially fast in the sample size, and finally we analyze the behavior of LoLICaP experimentally in more general settings.
Submitted 10 January, 2024;
originally announced January 2024.
-
Rastro-DM: data mining with a trail
Authors:
Marcus Vinicius Borela de Castro,
Remis Balaniuk
Abstract:
This paper proposes a methodology for documenting data mining (DM) projects, Rastro-DM (Trail Data Mining), with a focus not on the model that is generated, but on the processes behind its construction, in order to leave a trail (Rastro in Portuguese) of planned actions, training completed, results obtained, and lessons learned. The proposed practices are complementary to structuring methodologies of DM, such as CRISP-DM, which establish a methodological and paradigmatic framework for the DM process. The application of best practices and their benefits is illustrated in a project called 'Cladop' that was created for the classification of PDF documents associated with the investigative process of damages to the Brazilian Federal Public Treasury. Building the Rastro-DM kit in the context of a project is a small step that can lead to an institutional leap to be achieved by sharing and using the trail across the enterprise.
Submitted 8 January, 2024;
originally announced January 2024.
-
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Authors:
Open X-Embodiment Collaboration,
Abby O'Neill,
Abdul Rehman,
Abhinav Gupta,
Abhiram Maddukuri,
Abhishek Gupta,
Abhishek Padalkar,
Abraham Lee,
Acorn Pooley,
Agrim Gupta,
Ajay Mandlekar,
Ajinkya Jain,
Albert Tung,
Alex Bewley,
Alex Herzog,
Alex Irpan,
Alexander Khazatsky,
Anant Rai,
Anchit Gupta,
Andrew Wang,
Andrey Kolobov,
Anikait Singh,
Animesh Garg,
Aniruddha Kembhavi,
Annie Xie
, et al. (267 additional authors not shown)
Abstract:
Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160,266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.
Submitted 1 June, 2024; v1 submitted 13 October, 2023;
originally announced October 2023.
-
Virtual Harassment, Real Understanding: Using a Serious Game and Bayesian Networks to Study Cyberbullying
Authors:
Jaime Pérez,
Mario Castro,
Edmond Awad,
Gregorio López
Abstract:
Cyberbullying among minors is a pressing concern in our digital society, necessitating effective prevention and intervention strategies. Traditional data collection methods often intrude on privacy and yield limited insights. This study explores an innovative approach, employing a serious game - designed with purposes beyond entertainment - as a non-intrusive tool for data collection and education. In contrast to traditional correlation-based analyses, we propose a causality-based approach using Bayesian Networks to unravel complex relationships in the collected data and quantify result uncertainties. This robust analytical tool yields interpretable outcomes, enhances transparency in assumptions, and fosters open scientific discourse. Preliminary pilot studies with the serious game show promising results, surpassing the informative capacity of traditional demographic and psychological questionnaires, suggesting its potential as an alternative methodology. Additionally, we demonstrate how our approach facilitates the examination of risk profiles and the identification of intervention strategies to mitigate this cybercrime. We also address research limitations and potential enhancements, considering the noise and variability of data in social studies and video games. This research advances our understanding of cyberbullying and showcases the potential of serious games and causality-based approaches in studying complex social issues.
Submitted 15 September, 2023;
originally announced September 2023.
-
Multi-FedLS: a Framework for Cross-Silo Federated Learning Applications on Multi-Cloud Environments
Authors:
Rafaela C. Brum,
Maria Clicia Stelling de Castro,
Luciana Arantes,
Lúcia Maria de A. Drummond,
Pierre Sens
Abstract:
Federated Learning (FL) is a distributed Machine Learning (ML) technique that can benefit from cloud environments while preserving data privacy. We propose Multi-FedLS, a framework that manages multi-cloud resources, reducing execution time and financial costs of Cross-Silo Federated Learning applications by using preemptible VMs, which are cheaper than on-demand ones but can be revoked at any time. Our framework encloses four modules: Pre-Scheduling, Initial Mapping, Fault Tolerance, and Dynamic Scheduler. This paper extends our previous work \cite{brum2022sbac} by formally describing the Multi-FedLS resource manager framework and its modules. Experiments were conducted with three Cross-Silo FL applications on CloudLab and a proof-of-concept confirms that Multi-FedLS can be executed on a multi-cloud composed of AWS and GCP, two commercial cloud providers. Results show that the problem of executing Cross-Silo FL applications in multi-cloud environments with preemptible VMs can be efficiently resolved using a mathematical formulation, fault tolerance techniques, and a simple heuristic to choose a new VM in case of revocation.
Submitted 17 August, 2023;
originally announced August 2023.
-
Generation of Probabilistic Synthetic Data for Serious Games: A Case Study on Cyberbullying
Authors:
Jaime Pérez,
Mario Castro,
Edmond Awad,
Gregorio López
Abstract:
Synthetic data generation has been a growing area of research in recent years. However, its potential applications in serious games have not been thoroughly explored. Advances in this field could anticipate data modelling and analysis, as well as speed up the development process. To try to fill this gap in the literature, we propose a simulator architecture for generating probabilistic synthetic data for serious games based on interactive narratives. This architecture is designed to be generic and modular so that it can be used by other researchers on similar problems. To simulate the interaction of synthetic players with questions, we use a cognitive testing model based on the Item Response Theory framework. We also show how probabilistic graphical models (in particular Bayesian networks) can be used to introduce expert knowledge and external data into the simulation. Finally, we apply the proposed architecture and methods in a use case of a serious game focused on cyberbullying. We perform Bayesian inference experiments using a hierarchical model to demonstrate the identifiability and robustness of the generated data.
Submitted 3 July, 2023; v1 submitted 2 June, 2023;
originally announced June 2023.
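The abstract above mentions simulating synthetic players' answers with a model from the Item Response Theory framework but does not say which one. As a rough illustration only, the sketch below assumes the standard two-parameter logistic (2PL) model; the function names, the parameter names `a` (discrimination) and `b` (difficulty), and the sampling loop are hypothetical and are not taken from the paper.

```python
import math
import random


def p_correct(theta, a, b):
    """2PL IRT (assumed model): probability that a player with ability
    theta answers an item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def simulate_answers(abilities, items, seed=0):
    """Draw one binary answer per (player, item) pair.

    abilities: list of player ability values (theta).
    items: list of (a, b) tuples, one per question.
    Returns a list of rows, one row of 0/1 answers per player.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    return [
        [int(rng.random() < p_correct(theta, a, b)) for (a, b) in items]
        for theta in abilities
    ]
```

Under this assumed model, a player whose ability equals an item's difficulty answers correctly with probability 0.5, and the discrimination parameter controls how quickly that probability changes with ability.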
-
Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem
Authors:
Behnaz Arzani,
Siva Kesava Reddy Kakarla,
Miguel Castro,
Srikanth Kandula,
Saeed Maleki,
Luke Marshall
Abstract:
We show that recent communication schedulers proposed for ML collectives do not scale to the increasing problem sizes that arise from training larger models. These works also often produce suboptimal schedules. We make a connection with similar problems in traffic engineering and propose a new method, TECCL, that finds better quality schedules (e.g., finishes collectives faster and/or while sending fewer bytes) and does so more quickly on larger topologies. We present results on many different GPU topologies that show substantial improvement over the state-of-the-art.
Submitted 22 May, 2023;
originally announced May 2023.
-
LATTE: Label-efficient Incident Phenotyping from Longitudinal Electronic Health Records
Authors:
Jun Wen,
Jue Hou,
Clara-Lea Bonzel,
Yihan Zhao,
Victor M. Castro,
Vivian S. Gainer,
Dana Weisenfeld,
Tianrun Cai,
Yuk-Lam Ho,
Vidul A. Panickan,
Lauren Costa,
Chuan Hong,
J. Michael Gaziano,
Katherine P. Liao,
Junwei Lu,
Kelly Cho,
Tianxi Cai
Abstract:
Electronic health record (EHR) data are increasingly used to support real-world evidence (RWE) studies. Yet their ability to generate reliable RWE is limited by the lack of readily available precise information on the timing of clinical events such as the onset time of heart failure. We propose a LAbel-efficienT incidenT phEnotyping (LATTE) algorithm to accurately annotate the timing of clinical events from longitudinal EHR data. By leveraging the pre-trained semantic embedding vectors from large-scale EHR data as prior knowledge, LATTE selects predictive EHR features in a concept re-weighting module by mining their relationship to the target event and compresses their information into longitudinal visit embeddings through a visit attention learning network. LATTE employs a recurrent neural network to capture the sequential dependency between the target event and visit embeddings before/after it. To improve label efficiency, LATTE constructs highly informative longitudinal silver-standard labels from large-scale unlabeled patients to perform unsupervised pre-training and semi-supervised joint training. Finally, LATTE enhances cross-site portability via contrastive representation learning. LATTE is evaluated on three analyses: the onset of type-2 diabetes, heart failure, and the onset and relapses of multiple sclerosis. We use various evaluation metrics present in the literature including the $ABC_{gain}$, the proportion of reduction in the area between the observed event indicator and the predicted cumulative incidences in reference to the prediction per incident prevalence. LATTE consistently achieves substantial improvement over benchmark methods such as SAMGEP and RETAIN in all settings.
Submitted 18 May, 2023;
originally announced May 2023.
-
Honeycomb: ordered key-value store acceleration on an FPGA-based SmartNIC
Authors:
Junyi Liu,
Aleksandar Dragojevic,
Shane Flemming,
Antonios Katsarakis,
Dario Korolija,
Igor Zablotchi,
Ho-cheung Ng,
Anuj Kalia,
Miguel Castro
Abstract:
In-memory ordered key-value stores are an important building block in modern distributed applications. We present Honeycomb, a hybrid software-hardware system for accelerating read-dominated workloads on ordered key-value stores that provides linearizability for all operations including scans. Honeycomb stores a B-Tree in host memory, and executes SCAN and GET on an FPGA-based SmartNIC, and PUT, UPDATE and DELETE on the CPU. This approach enables large stores and simplifies the FPGA implementation but raises the challenge of data access and synchronization across the slow PCIe bus. We describe how Honeycomb overcomes this challenge with careful data structure design, caching, request parallelism with out-of-order request execution, wait-free read operations, and batching synchronization between the CPU and the FPGA. For read-heavy YCSB workloads, Honeycomb improves the throughput of a state-of-the-art ordered key-value store by at least 1.8x. For scan-heavy workloads inspired by cloud storage, Honeycomb improves throughput by more than 2x. The cost-performance, which is more important for large-scale deployments, is improved by at least 1.5x on these workloads.
Submitted 6 April, 2023; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Artificial-intelligence-based molecular classification of diffuse gliomas using rapid, label-free optical imaging
Authors:
Todd C. Hollon,
Cheng Jiang,
Asadur Chowdury,
Mustafa Nasir-Moin,
Akhil Kondepudi,
Alexander Aabedi,
Arjun Adapa,
Wajd Al-Holou,
Jason Heth,
Oren Sagher,
Pedro Lowenstein,
Maria Castro,
Lisa Irina Wadiura,
Georg Widhalm,
Volker Neuschmelting,
David Reinecke,
Niklas von Spreckelsen,
Mitchel S. Berger,
Shawn L. Hervey-Jumper,
John G. Golfinos,
Matija Snuderl,
Sandra Camelo-Piragua,
Christian Freudiger,
Honglak Lee,
Daniel A. Orringer
Abstract:
Molecular classification has transformed the management of brain tumors by enabling more accurate prognostication and personalized treatment. However, timely molecular diagnostic testing for patients with brain tumors is limited, complicating surgical and adjuvant treatment and obstructing clinical trial enrollment. In this study, we developed DeepGlioma, a rapid ($< 90$ seconds), artificial-intelligence-based diagnostic screening system to streamline the molecular diagnosis of diffuse gliomas. DeepGlioma is trained using a multimodal dataset that includes stimulated Raman histology (SRH); a rapid, label-free, non-consumptive, optical imaging method; and large-scale, public genomic data. In a prospective, multicenter, international testing cohort of patients with diffuse glioma ($n=153$) who underwent real-time SRH imaging, we demonstrate that DeepGlioma can predict the molecular alterations used by the World Health Organization to define the adult-type diffuse glioma taxonomy (IDH mutation, 1p19q co-deletion and ATRX mutation), achieving a mean molecular classification accuracy of $93.3\pm 1.6\%$. Our results represent how artificial intelligence and optical histology can be used to provide a rapid and scalable adjunct to wet lab methods for the molecular screening of patients with diffuse glioma.
Submitted 23 March, 2023;
originally announced March 2023.
-
Role-playing software architecture styles
Authors:
Laura M. Castro
Abstract:
Software Architecture, from definition to maintenance and evolution, is a complex aspect of software development and, consequently, a challenging subject when it comes to teaching it, and learning it.
Many research efforts have been devoted to designing teaching approaches, strategies and tools. Most of them, however, focus on the knowledge itself and the ways to convey it to students, rather than on the different learning styles of students themselves.
Teaching methods that rely predominantly on verbal and written communication are very well aligned with some learning styles. However, students with learning styles that benefit more from physical activity or first-hand experience need to defer to cognitive processes that are less natural to them.
In this work, we propose an innovative use of role-playing as a teaching strategy for architecture models of reference (i.e. layered, pipe and filter, client-server, etc.). This role-playing of different software architectures, in which students play the part of specific components in the system, intends to complement other classical teaching materials, such as in-person or recorded lectures, lab assignments, or development projects.
Addressing all learning styles within a classroom is key to ensuring that we favour and foster the students' different learning processes, and give everyone a level playing field in which to best develop their capabilities as Software Architects.
Submitted 28 February, 2023;
originally announced February 2023.
-
Adaptive Selective Sampling for Online Prediction with Experts
Authors:
Rui M. Castro,
Fredrik Hellström,
Tim van Erven
Abstract:
We consider online prediction of a binary sequence with expert advice. For this setting, we devise label-efficient forecasting algorithms, which use a selective sampling scheme that enables collecting far fewer labels than standard procedures, while still retaining optimal worst-case regret guarantees. These algorithms are based on exponentially weighted forecasters, suitable for settings with and without a perfect expert. For a scenario where one expert is strictly better than the others in expectation, we show that the label complexity of the label-efficient forecaster scales roughly as the square root of the number of rounds. Finally, we present numerical experiments empirically showing that the normalized regret of the label-efficient forecaster can asymptotically match known minimax rates for pool-based active learning, suggesting it can optimally adapt to benign settings.
Submitted 20 October, 2023; v1 submitted 16 February, 2023;
originally announced February 2023.
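To make the idea of an exponentially weighted forecaster with selective label queries concrete, here is a minimal sketch. It is not the authors' algorithm: the fixed query probability, the rule of skipping rounds where all experts agree, and every name in the code (`label_efficient_forecaster`, `eta`, `query_prob`) are assumptions chosen for illustration. Skipping unanimous rounds is harmless in this sketch because every expert would incur the same loss, which leaves the normalized weights unchanged.

```python
import math
import random


def label_efficient_forecaster(expert_preds, labels, eta=0.5,
                               query_prob=0.3, seed=0):
    """Exponentially weighted forecaster that only sometimes asks for labels.

    expert_preds: list of rounds, each a list of 0/1 expert predictions.
    labels: true 0/1 label for each round (read for the update only when
            queried; also used here to count mistakes for evaluation).
    Returns (mistakes, queries, log_weights).
    """
    rng = random.Random(seed)
    n_experts = len(expert_preds[0])
    log_w = [0.0] * n_experts  # log-weights, for numerical stability
    mistakes = 0
    queries = 0
    for preds, y in zip(expert_preds, labels):
        # Weighted vote: share of weight on experts predicting 1.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        p_one = sum(wi for wi, p in zip(w, preds) if p == 1) / sum(w)
        y_hat = 1 if p_one >= 0.5 else 0
        mistakes += int(y_hat != y)
        # Assumed sampling rule: never query when the experts are
        # unanimous; otherwise query with probability query_prob.
        if len(set(preds)) > 1 and rng.random() < query_prob:
            queries += 1
            for i, p in enumerate(preds):
                # Importance-weighted 0/1 loss keeps the estimate unbiased.
                log_w[i] -= eta * float(p != y) / query_prob
    return mistakes, queries, log_w
```

With `query_prob=1.0` this reduces to a standard exponentially weighted forecaster that observes every label on rounds where the experts disagree.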
-
Serious Games and AI: Challenges and Opportunities for Computational Social Science
Authors:
Jaime Pérez,
Mario Castro,
Gregorio López
Abstract:
The video game industry plays an essential role in the entertainment sphere of our society. However, from Monopoly to Flight Simulators, serious games have also been appealing tools for learning a new language, conveying values, or training skills. Furthermore, the resurgence of Artificial Intelligence (AI) and data science in the last decade has created a unique opportunity since the amount of data collected through a game is immense, as is the amount of data needed to feed such AI algorithms. This paper aims to identify relevant research lines using Serious Games as a novel research tool, especially in Computational Social Sciences. To contextualize, we also conduct a (non-systematic) literature review of this field. We conclude that the synergy between games and data can foster the use of AI for good and open up new strategies to empower humanity and support social research with novel computational tools. We also discuss the challenges and new opportunities that arise from aspiring to such lofty goals.
Submitted 1 February, 2023;
originally announced February 2023.
-
Learning Risk-Aware Costmaps via Inverse Reinforcement Learning for Off-Road Navigation
Authors:
Samuel Triest,
Mateo Guaman Castro,
Parv Maheshwari,
Matthew Sivaprakasam,
Wenshan Wang,
Sebastian Scherer
Abstract:
The process of designing costmaps for off-road driving tasks is often a challenging and engineering-intensive task. Recent work in costmap design for off-road driving focuses on training deep neural networks to predict costmaps from sensory observations using corpora of expert driving data. However, such approaches are generally subject to over-confident mispredictions and are rarely evaluated in-the-loop on physical hardware. We present an inverse reinforcement learning-based method of efficiently training deep cost functions that are uncertainty-aware. We do so by leveraging recent advances in highly parallel model-predictive control and robotic risk estimation. In addition to demonstrating improvement at reproducing expert trajectories, we also evaluate the efficacy of these methods in challenging off-road navigation scenarios. We observe that our method significantly outperforms a geometric baseline, resulting in 44% improvement in expert path reconstruction and 57% fewer interventions in practice. We also observe that varying the risk tolerance of the vehicle results in qualitatively different navigation behaviors, especially with respect to higher-risk scenarios such as slopes and tall grass.
Submitted 31 January, 2023;
originally announced February 2023.
-
Lightweight Structure-Aware Attention for Visual Understanding
Authors:
Heeseung Kwon,
Francisco M. Castro,
Manuel J. Marin-Jimenez,
Nicolas Guil,
Karteek Alahari
Abstract:
Vision Transformers (ViTs) have become a dominant paradigm for visual representation learning with self-attention operators. Although these operators provide flexibility to the model with their adjustable attention kernels, they suffer from inherent limitations: (1) the attention kernel is not discriminative enough, resulting in high redundancy of the ViT layers, and (2) the complexity in computation and memory is quadratic in the sequence length. In this paper, we propose a novel attention operator, called lightweight structure-aware attention (LiSA), which has a better representation power with log-linear complexity. Our operator learns structural patterns by using a set of relative position embeddings (RPEs). To achieve log-linear complexity, the RPEs are approximated with fast Fourier transforms. Our experiments and ablation studies demonstrate that ViTs based on the proposed operator outperform self-attention and other existing operators, achieving state-of-the-art results on ImageNet, and competitive results on other visual understanding benchmarks such as COCO and Something-Something-V2. The source code of our approach will be released online.
Submitted 29 November, 2022;
originally announced November 2022.
-
Semi-autonomous Prosthesis Control Using Minimal Depth Information and Vibrotactile Feedback
Authors:
Miguel Nobre Castro,
Strahinja Dosen
Abstract:
A semi-autonomous prosthesis control based on computer vision can be used to improve performance while decreasing the cognitive burden, especially when using advanced systems with multiple functions. However, a drawback of this approach is that it relies on the complex processing of a significant amount of data (e.g., a point cloud provided by a depth sensor), which can be a challenge when deploying such a system onto an embedded prosthesis controller. In the present study, therefore, we propose a novel method to reconstruct the shape of the target object using minimal data. Specifically, four concurrent laser scanner lines provide partial contours of the object cross-section. Simple geometry is then used to reconstruct the dimensions and orientation of spherical, cylindrical and cuboid objects. The prototype system was implemented using a depth sensor to simulate the scan lines and vibrotactile feedback to aid the user during aiming of the laser towards the target object. The prototype was tested on ten able-bodied volunteers who used the semi-autonomous prosthesis to grasp a set of ten objects of different shape, size and orientation. The novel prototype was compared against the benchmark system, which used the full depth data. The results showed that the novel system could be used to successfully handle all the objects, and that the performance improved with training, although it was still somewhat worse compared to the benchmark. The present study is therefore an important step towards building a compact system for embedded depth sensing specialized for prosthesis grasping.
Submitted 2 October, 2022;
originally announced October 2022.
-
How Does It Feel? Self-Supervised Costmap Learning for Off-Road Vehicle Traversability
Authors:
Mateo Guaman Castro,
Samuel Triest,
Wenshan Wang,
Jason M. Gregory,
Felix Sanchez,
John G. Rogers III,
Sebastian Scherer
Abstract:
Estimating terrain traversability in off-road environments requires reasoning about complex interaction dynamics between the robot and these terrains. However, it is challenging to create informative labels to learn a model in a supervised manner for these interactions. We propose a method that learns to predict traversability costmaps by combining exteroceptive environmental information with proprioceptive terrain interaction feedback in a self-supervised manner. Additionally, we propose a novel way of incorporating robot velocity in the costmap prediction pipeline. We validate our method in multiple short and large-scale navigation tasks on challenging off-road terrains using two different large, all-terrain robots. Our short-scale navigation results show that using our learned costmaps leads to overall smoother navigation, and provides the robot with a more fine-grained understanding of the robot-terrain interactions. Our large-scale navigation trials show that we can reduce the number of interventions by up to 57% compared to an occupancy-based navigation baseline in challenging off-road courses ranging from 400 m to 3150 m. Appendix and full experiment videos can be found on our website: https://mateoguaman.github.io/hdif.
Submitted 14 February, 2023; v1 submitted 22 September, 2022;
originally announced September 2022.
-
HammingMesh: A Network Topology for Large-Scale Deep Learning
Authors:
Torsten Hoefler,
Tommaso Bonato,
Daniele De Sensi,
Salvatore Di Girolamo,
Shigang Li,
Marco Heddes,
Jon Belk,
Deepak Goel,
Miguel Castro,
Steve Scott
Abstract:
Numerous microarchitectural optimizations unlocked tremendous processing power for deep neural networks that in turn fueled the AI revolution. With the exhaustion of such optimizations, the growth of modern AI is now gated by the performance of training systems, especially their data movement. Instead of focusing on single accelerators, we investigate data-movement characteristics of large-scale training at full system scale. Based on our workload analysis, we design HammingMesh, a novel network topology that provides high bandwidth at low cost with high job scheduling flexibility. Specifically, HammingMesh can support full bandwidth and isolation to deep learning training jobs with two dimensions of parallelism. Furthermore, it also supports high global bandwidth for generic traffic. Thus, HammingMesh will power future large-scale deep learning systems with extreme bandwidth requirements.
Submitted 21 October, 2022; v1 submitted 3 September, 2022;
originally announced September 2022.
-
Immersion Metrics for Virtual Reality
Authors:
Matias N. Selzer,
Silvia M. Castro
Abstract:
Technological advances in recent years have promoted the development of virtual reality systems that have a wide variety of hardware and software characteristics, providing varying degrees of immersion. Immersion is an objective property of the virtual reality system that depends on both its hardware and software characteristics. Virtual reality systems are currently attempting to improve immersion as much as possible. However, there is no metric to measure the level of immersion of a virtual reality system based on its characteristics. To date, the influence of these hardware and software variables on immersion has only been considered individually or in small groups. The way these system variables simultaneously affect immersion has not been analyzed either. In this paper, we propose immersion metrics for virtual reality systems based on their hardware and software variables, as well as the development process that led to their formulation. From the conducted experiment and the obtained data, we followed a methodology to find immersion models based on the variables of the system. The immersion metrics presented in this work offer a useful tool in the area of virtual reality and immersive technologies, not only to measure the immersion of any virtual reality system but also to analyze the relationship and importance of the variables of these systems.
Submitted 15 June, 2022;
originally announced June 2022.
-
Learning Reward Machines: A Study in Partially Observable Reinforcement Learning
Authors:
Rodrigo Toro Icarte,
Ethan Waldie,
Toryn Q. Klassen,
Richard Valenzano,
Margarita P. Castro,
Sheila A. McIlraith
Abstract:
Reinforcement learning (RL) is a central problem in artificial intelligence. This problem consists of defining artificial agents that can learn optimal behaviour by interacting with an environment -- where the optimal behaviour is defined with respect to a reward signal that the agent seeks to maximize. Reward machines (RMs) provide a structured, automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.
Submitted 17 December, 2021;
originally announced December 2021.
-
Full-Velocity Radar Returns by Radar-Camera Fusion
Authors:
Yunfei Long,
Daniel Morris,
Xiaoming Liu,
Marcos Castro,
Punarjay Chakravarty,
Praveen Narayanan
Abstract:
A distinctive feature of Doppler radar is the measurement of velocity in the radial direction for radar points. However, the missing tangential velocity component hampers object velocity estimation as well as temporal integration of radar sweeps in dynamic scenes. Recognizing that fusing camera with radar provides complementary information to radar, in this paper we present a closed-form solution for the point-wise, full-velocity estimate of Doppler returns using the corresponding optical flow from camera images. Additionally, we address the association problem between radar returns and camera images with a neural network that is trained to estimate radar-camera correspondences. Experimental results on the nuScenes dataset verify the validity of the method and show significant improvements over the state-of-the-art in velocity estimation and accumulation of radar points.
Submitted 24 August, 2021;
originally announced August 2021.
-
Detecting Oxbow Code in Erlang Codebases with the Highest Degree of Certainty
Authors:
Fernando Benavides Rodríguez,
Laura M. Castro
Abstract:
The presence of source code that is no longer needed is a handicap to project maintainability. The larger and longer-lived the project, the higher the chances of accumulating dead code in its different forms.
Manually detecting unused code is time-consuming, tedious, error-prone, and requires a great level of deep knowledge about the codebase. In this paper, we examine the kinds of dead code (specifically, oxbow code) that can appear in Erlang projects, and formulate rules to identify them with high accuracy.
We also present an open-source static analyzer that implements these rules, allowing for the automatic detection and confident removal of oxbow code in Erlang codebases, actively contributing to increasing their quality and maintainability.
Submitted 19 July, 2021;
originally announced July 2021.
-
Radar-Camera Pixel Depth Association for Depth Completion
Authors:
Yunfei Long,
Daniel Morris,
Xiaoming Liu,
Marcos Castro,
Punarjay Chakravarty,
Praveen Narayanan
Abstract:
While radar and video data can be readily fused at the detection level, fusing them at the pixel level is potentially more beneficial. This is also more challenging in part due to the sparsity of radar, but also because automotive radar beams are much wider than a typical pixel combined with a large baseline between camera and radar, which results in poor association between radar pixels and color pixels. A consequence is that depth completion methods designed for LiDAR and video fare poorly for radar and video. Here we propose a radar-to-pixel association stage which learns a mapping from radar returns to pixels. This mapping also serves to densify radar returns. Using this as a first stage, followed by a more traditional depth completion method, we are able to achieve image-guided depth completion with radar and video. We demonstrate performance superior to camera and radar alone on the nuScenes dataset. Our source code is available at https://github.com/longyunf/rc-pda.
Submitted 4 June, 2021;
originally announced June 2021.
-
IA-CCF: Individual Accountability for Permissioned Ledgers
Authors:
Alex Shamis,
Peter Pietzuch,
Miguel Castro,
Cédric Fournet,
Edward Ashton,
Amaury Chamayou,
Sylvan Clebsch,
Antoine Delignat-Lavaud,
Matthew Kerner,
Julien Maffre,
Manuel Costa,
Mark Russinovich
Abstract:
Permissioned ledger systems allow a consortium of members that do not trust one another to execute transactions safely on a set of replicas. Such systems typically use Byzantine fault tolerance (BFT) protocols to distribute trust, which only ensures safety when fewer than 1/3 of the replicas misbehave. Providing guarantees beyond this threshold is a challenge: current systems assume that the ledger is corrupt and fail to identify misbehaving replicas or hold the members that operate them accountable -- instead all members share the blame.
We describe IA-CCF, a new permissioned ledger system that provides individual accountability. It can assign blame to the individual members that operate misbehaving replicas regardless of the number of misbehaving replicas or members. IA-CCF achieves this by signing and logging BFT protocol messages in the ledger, and by using Merkle trees to provide clients with succinct, universally-verifiable receipts as evidence of successful transaction execution. Anyone can audit the ledger against a set of receipts to discover inconsistencies and identify replicas that signed contradictory statements. IA-CCF also supports changes to consortium membership and replicas by tracking signing keys using a sub-ledger of governance transactions. IA-CCF provides strong disincentives to misbehavior with low overhead: it executes 47,000 tx/s while providing clients with receipts in two network round trips.
Submitted 8 March, 2022; v1 submitted 27 May, 2021;
originally announced May 2021.
-
Predicting post-operative right ventricular failure using video-based deep learning
Authors:
Rohan Shad,
Nicolas Quach,
Robyn Fong,
Patpilai Kasinpila,
Cayley Bowles,
Miguel Castro,
Ashrith Guha,
Eddie Suarez,
Stefan Jovinge,
Sang** Lee,
Theodore Boeve,
Myriam Amsallem,
Xiu Tang,
Francois Haddad,
Yasuhiro Shudo,
Y. Joseph Woo,
Jeffrey Teuteberg,
John P. Cunningham,
Curt P. Langlotz,
William Hiesinger
Abstract:
Non-invasive and cost effective in nature, the echocardiogram allows for a comprehensive assessment of the cardiac musculature and valves. Despite progressive improvements over the decades, the rich temporally resolved data in echocardiography videos remain underutilized. Human reads of echocardiograms reduce the complex patterns of cardiac wall motion to a small list of measurements of heart function. Furthermore, all modern echocardiography artificial intelligence (AI) systems are similarly limited by design - automating measurements of the same reductionist metrics rather than utilizing the wealth of data embedded within each echo study. This underutilization is most evident in situations where clinical decision making is guided by subjective assessments of disease acuity, and tools that predict disease onset within clinically actionable timeframes are unavailable. Predicting the likelihood of developing post-operative right ventricular failure (RV failure) in the setting of mechanical circulatory support is one such clinical example. To address this, we developed a novel video AI system trained to predict post-operative right ventricular failure (RV failure), using the full spatiotemporal density of information from pre-operative echocardiography scans. We achieve an AUC of 0.729, specificity of 52% at 80% sensitivity and 46% sensitivity at 80% specificity. Furthermore, we show that our ML system significantly outperforms a team of human experts tasked with predicting RV failure on independent clinical evaluation. Finally, the methods we describe are generalizable to any cardiac clinical decision support application where treatment or patient selection is guided by qualitative echocardiography assessments.
Submitted 27 February, 2021;
originally announced March 2021.
-
The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors
Authors:
William H. Guss,
Mario Ynocente Castro,
Sam Devlin,
Brandon Houghton,
Noboru Sean Kuno,
Crissman Loomis,
Stephanie Milani,
Sharada Mohanty,
Keisuke Nakata,
Ruslan Salakhutdinov,
John Schulman,
Shinya Shiroshita,
Nicholay Topin,
Avinash Ummadisingu,
Oriol Vinyals
Abstract:
Although deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples, affording only a shrinking segment of the AI community access to their development. Resolution of these limitations requires new, sample-efficient methods. To facilitate research in this direction, we propose this second iteration of the MineRL Competition. The primary goal of the competition is to foster the development of algorithms which can efficiently leverage human demonstrations to drastically reduce the number of samples needed to solve complex, hierarchical, and sparse environments. To that end, participants compete under a limited environment sample-complexity budget to develop systems which solve the MineRL ObtainDiamond task in Minecraft, a sequential decision making environment requiring long-term planning, hierarchical control, and efficient exploration methods. The competition is structured into two rounds in which competitors are provided several paired versions of the dataset and environment with different game textures and shaders. At the end of each round, competitors submit containerized versions of their learning algorithms to the AIcrowd platform where they are trained from scratch on a hold-out dataset-environment pair for a total of 4 days on a pre-specified hardware platform. In this follow-up iteration to the NeurIPS 2019 MineRL Competition, we implement new features to expand the scale and reach of the competition. In response to the feedback of the previous participants, we introduce a second minor track focusing on solutions without access to environment interactions of any kind except during test-time. Further, we aim to prompt domain-agnostic submissions by implementing several novel competition mechanics including action-space randomization and desemantization of observations and actions.
Submitted 26 January, 2021;
originally announced January 2021.
-
Discovering Avoidable Planner Failures of Autonomous Vehicles using Counterfactual Analysis in Behaviorally Diverse Simulation
Authors:
Daisuke Nishiyama,
Mario Ynocente Castro,
Shirou Maruyama,
Shinya Shiroshita,
Karim Hamzaoui,
Yi Ouyang,
Guy Rosman,
Jonathan DeCastro,
Kuan-Hui Lee,
Adrien Gaidon
Abstract:
Automated Vehicles require exhaustive testing in simulation to detect as many safety-critical failures as possible before deployment on public roads. In this work, we focus on the core decision-making component of autonomous robots: their planning algorithm. We introduce a planner testing framework that leverages recent progress in simulating behaviorally diverse traffic participants. Using large scale search, we generate, detect, and characterize dynamic scenarios leading to collisions. In particular, we propose methods to distinguish between unavoidable and avoidable accidents, focusing especially on automatically finding planner-specific defects that must be corrected before deployment. Through experiments in complex multi-agent intersection scenarios, we show that our method can indeed find a wide range of critical planner failures.
Submitted 24 November, 2020;
originally announced November 2020.
-
Behaviorally Diverse Traffic Simulation via Reinforcement Learning
Authors:
Shinya Shiroshita,
Shirou Maruyama,
Daisuke Nishiyama,
Mario Ynocente Castro,
Karim Hamzaoui,
Guy Rosman,
Jonathan DeCastro,
Kuan-Hui Lee,
Adrien Gaidon
Abstract:
Traffic simulators are important tools in autonomous driving development. While continuous progress has been made to provide developers more options for modeling various traffic participants, tuning these models to increase their behavioral diversity while maintaining quality is often very challenging. This paper introduces an easily-tunable policy generation algorithm for autonomous driving agents. The proposed algorithm balances diversity and driving skills by leveraging the representation and exploration abilities of deep reinforcement learning via a distinct policy set selector. Moreover, we present an algorithm utilizing intrinsic rewards to widen behavioral differences in the training. To provide quantitative assessments, we develop two trajectory-based evaluation metrics which measure the differences among policies and behavioral coverage. We experimentally show the effectiveness of our methods on several challenging intersection scenes.
Submitted 11 November, 2020;
originally announced November 2020.
-
A novel method for Causal Structure Discovery from EHR data, a demonstration on type-2 diabetes mellitus
Authors:
Xinpeng Shen,
Sisi Ma,
Prashanthi Vemuri,
M. Regina Castro,
Pedro J. Caraballo,
Gyorgy J. Simon
Abstract:
Introduction: The discovery of causal mechanisms underlying diseases enables better diagnosis, prognosis and treatment selection. Clinical trials have been the gold standard for determining causality, but they are resource intensive, sometimes infeasible or unethical. Electronic Health Records (EHR) contain a wealth of real-world data that holds promise for the discovery of disease mechanisms, yet the existing causal structure discovery (CSD) methods fall short on leveraging them due to the special characteristics of the EHR data. We propose a new data transformation method and a novel CSD algorithm to overcome the challenges posed by these characteristics. Materials and methods: We demonstrated the proposed methods on an application to type-2 diabetes mellitus. We used a large EHR data set from Mayo Clinic to internally evaluate the proposed transformation and CSD methods and used another large data set from an independent health system, Fairview Health Services, as external validation. We compared the performance of our proposed method to Fast Greedy Equivalence Search (FGES), a state-of-the-art CSD method, in terms of correctness, stability and completeness. We tested the generalizability of the proposed algorithm through external validation. Results and conclusions: The proposed method improved over the existing methods by successfully incorporating study design considerations, was robust in the face of unreliable EHR timestamps and inferred causal effect directions more correctly and reliably. The proposed data transformation successfully improved the clinical correctness of the discovered graph and the consistency of edge orientation across bootstrap samples. It resulted in superior accuracy, stability, and completeness.
Submitted 10 November, 2020;
originally announced November 2020.
-
It was never about the language: paradigm impact on software design decisions
Authors:
Laura M. Castro
Abstract:
Programming language development has intensified in recent years. New languages are created; new features, often cross-paradigm, are added to old ones. This new programming landscape makes language selection a more complex decision, both from the companies' points of view (technical, recruiting) and from the developers' point of view (career development). In this paper, however, we argue that programming languages have a secondary role in software development design decisions. We illustrate, based on a practical example, how the main influencers are higher-level traits: those traditionally associated with programming paradigms. Following this renovated perspective, concerns about language choice are shifted for all parties. Beyond particular syntax, grammar, execution model or code organization, the main consequence of the predominance of one paradigm or another in the mind of the developer is the way solutions are designed.
Submitted 16 October, 2020;
originally announced October 2020.
-
iLGaCo: Incremental Learning of Gait Covariate Factors
Authors:
Zihao Mu,
Francisco M. Castro,
Manuel J. Marin-Jimenez,
Nicolas Guil,
Yan-ran Li,
Shiqi Yu
Abstract:
Gait is a popular biometric pattern used for identifying people based on their way of walking. Traditionally, gait recognition approaches based on deep learning are trained using the whole training dataset. In fact, if new data (classes, view-points, walking conditions, etc.) need to be included, it is necessary to re-train the model with old and new data samples.
In this paper, we propose iLGaCo, the first incremental learning approach of covariate factors for gait recognition, where the deep model can be updated with new information without re-training it from scratch by using the whole dataset. Instead, our approach performs a shorter training process with the new data and a small subset of previous samples. This way, our model learns new information while retaining previous knowledge.
We evaluate iLGaCo on the CASIA-B dataset in two incremental ways: adding new view-points and adding new walking conditions. In both cases, our results are close to the classical `training-from-scratch' approach, obtaining a marginal drop in accuracy ranging from 0.2% to 1.2%, which shows the efficacy of our approach. In addition, the comparison of iLGaCo with other incremental learning methods, such as LwF and iCarl, shows a significant improvement in accuracy, between 6% and 15% depending on the experiment.
Submitted 31 August, 2020;
originally announced August 2020.
-
Distributed Reinforcement Learning of Targeted Grasping with Active Vision for Mobile Manipulators
Authors:
Yasuhiro Fujita,
Kota Uenishi,
Avinash Ummadisingu,
Prabhat Nagarajan,
Shimpei Masuda,
Mario Ynocente Castro
Abstract:
Developing personal robots that can perform a diverse range of manipulation tasks in unstructured environments necessitates solving several challenges for robotic grasping systems. We take a step towards this broader goal by presenting the first RL-based system, to our knowledge, for a mobile manipulator that can (a) achieve targeted grasping generalizing to unseen target objects, (b) learn complex grasping strategies for cluttered scenes with occluded objects, and (c) perform active vision through its movable wrist camera to better locate objects. The system is informed of the desired target object in the form of a single, arbitrary-pose RGB image of that object, enabling the system to generalize to unseen objects without retraining. To achieve such a system, we combine several advances in deep reinforcement learning and present a large-scale distributed training system using synchronous SGD that seamlessly scales to multi-node, multi-GPU infrastructure to make rapid prototyping easier. We train and evaluate our system in a simulated environment, identify key components for improving performance, analyze its behaviors, and transfer to a real-world setup.
△ Less
Submitted 14 October, 2020; v1 submitted 15 July, 2020;
originally announced July 2020.
-
Fast General Distributed Transactions with Opacity using Global Time
Authors:
Alex Shamis,
Matthew Renzelmann,
Stanko Novakovic,
Georgios Chatzopoulos,
Anders T. Gjerdrum,
Dan Alistarh,
Aleksandar Dragojevic,
Dushyanth Narayanan,
Miguel Castro
Abstract:
Transactions can simplify distributed applications by hiding data distribution, concurrency, and failures from the application developer. Ideally the developer would see the abstraction of a single large machine that runs transactions sequentially and never fails. This requires the transactional subsystem to provide opacity (strict serializability for both committed and aborted transactions), as well as transparent fault tolerance with high availability. As even the best abstractions are unlikely to be used if they perform poorly, the system must also provide high performance.
Existing distributed transactional designs either weaken this abstraction or are not designed for the best performance within a data center. This paper extends the design of FaRM - which provides strict serializability only for committed transactions - to provide opacity while maintaining FaRM's high throughput, low latency, and high availability within a modern data center. It uses timestamp ordering based on real time with clocks synchronized to within tens of microseconds across a cluster, and a failover protocol to ensure correctness across clock master failures. FaRM with opacity can commit 5.4 million neworder transactions per second when running the TPC-C transaction mix on 90 machines with 3-way replication.
Submitted 25 June, 2020;
originally announced June 2020.
-
A1: A Distributed In-Memory Graph Database
Authors:
Chiranjeeb Buragohain,
Knut Magne Risvik,
Paul Brett,
Miguel Castro,
Wonhee Cho,
Joshua Cowhig,
Nikolas Gloy,
Karthik Kalyanaraman,
Richendra Khanna,
John Pao,
Matthew Renzelmann,
Alex Shamis,
Timothy Tan,
Shuheng Zheng
Abstract:
A1 is an in-memory distributed database used by the Bing search engine to support complex queries over structured data. The key enablers for A1 are the availability of cheap DRAM and high-speed RDMA (Remote Direct Memory Access) networking in commodity hardware. A1 uses FaRM as its underlying storage layer and builds the graph abstraction and query engine on top. The combination of in-memory storage and RDMA access requires rethinking how data is allocated, organized and queried in a large distributed system. A single A1 cluster can store tens of billions of vertices and edges and support a throughput of 350+ million vertex reads per second with end-to-end query latency in single-digit milliseconds. In this paper we describe the A1 data model, RDMA-optimized data structures and query execution.
Submitted 12 April, 2020;
originally announced April 2020.
-
Neural Latent Space Model for Dynamic Networks and Temporal Knowledge Graphs
Authors:
Tony Gracious,
Shubham Gupta,
Arun Kanthali,
Rui M. Castro,
Ambedkar Dukkipati
Abstract:
Although static networks have been extensively studied in machine learning, data mining, and AI communities for many decades, the study of dynamic networks has recently taken center stage due to the prominence of social media and its effects on the dynamics of social networks. In this paper, we propose a statistical model for dynamically evolving networks, together with a variational inference approach. Our model, Neural Latent Space Model with Variational Inference, encodes edge dependencies across different time snapshots. It represents nodes via latent vectors and uses interaction matrices to model the presence of edges. These matrices can be used to incorporate multiple relations in heterogeneous networks by having a separate matrix for each of the relations. To capture the temporal dynamics, both node vectors and interaction matrices are allowed to evolve with time. Existing network analysis methods use representation learning techniques for modelling networks. These techniques are different for homogeneous and heterogeneous networks because heterogeneous networks can have multiple types of edges and nodes as opposed to a homogeneous network. Unlike these, we propose a unified model for homogeneous and heterogeneous networks in a variational inference framework. Moreover, the learned node latent vectors and interaction matrices may be interpretable and therefore provide insights on the mechanisms behind network evolution. We experimented with a single step and multi-step link forecasting on real-world networks of homogeneous, bipartite, and heterogeneous nature, and demonstrated that our model significantly outperforms existing models.
Submitted 18 December, 2020; v1 submitted 26 November, 2019;
originally announced November 2019.
-
Equipping SBMs with RBMs: An Explainable Approach for Analysis of Networks with Covariates
Authors:
Shubham Gupta,
Gururaj K.,
Ambedkar Dukkipati,
Rui M. Castro
Abstract:
Networks with node covariates offer two advantages to community detection methods, namely, (i) exploit covariates to improve the quality of communities, and more importantly, (ii) explain the discovered communities by identifying the relative importance of different covariates in them. Recent methods have almost exclusively focused on the first point above. However, the quantitative improvements offered by them are often due to complex black-box models like deep neural networks at the expense of explainability. Approaches that focus on the second point are either domain-specific or have poor performance in practice. This paper proposes explainable, domain-independent statistical models for networks with node covariates that additionally offer good quantitative performance. Our models combine the strengths of Stochastic Block Models and Restricted Boltzmann Machines to provide interpretable insights about the communities. They support both pure and mixed community memberships. Besides providing explainability, our approach's main strength is that it does not explicitly assume a causal direction between community memberships and node covariates, making it applicable in diverse domains. We derive efficient inference procedures for our models, which can, in some cases, run in linear time in the number of nodes and edges. Experiments on several synthetic and real-world networks demonstrate that our models achieve close to state-of-the-art performance on community detection and link prediction tasks while also providing explanations for the discovered communities.
Submitted 5 April, 2021; v1 submitted 11 November, 2019;
originally announced November 2019.
-
A Transition-Aware Method for the Simulation of Compliant Contact with Regularized Friction
Authors:
Alejandro M. Castro,
Ante Qu,
Naveen Kuppuswamy,
Alex Alspach,
Michael Sherman
Abstract:
Multibody simulation with frictional contact has been a challenging subject of research for the past thirty years. Rigid-body assumptions are commonly used to approximate the physics of contact, and together with Coulomb friction, lead to challenging-to-solve nonlinear complementarity problems (NCP). On the other hand, robot grippers often introduce significant compliance. Compliant contact, combined with regularized friction, can be modeled entirely with ODEs, avoiding NCP solves. Unfortunately, regularized friction introduces high-frequency stiff dynamics and even implicit methods struggle with these systems, especially during slip-stick transitions. To improve the performance of implicit integration for these systems we introduce a Transition-Aware Line Search (TALS), which greatly improves the convergence of the Newton-Raphson iterations performed by implicit integrators. We find that TALS works best with semi-implicit integration, but that the explicit treatment of normal compliance can be problematic. To address this, we develop a Transition-Aware Modified Semi-Implicit (TAMSI) integrator that has similar computational cost to semi-implicit methods but implicitly couples compliant contact forces, leading to a more robust method. We evaluate the robustness, accuracy and performance of TAMSI and demonstrate our approach alongside relevant sim-to-real manipulation tasks.
Submitted 19 April, 2020; v1 submitted 12 September, 2019;
originally announced September 2019.
-
A Metric of Software Size as a Tool for IT Governance
Authors:
Marcus Vinicius Borela de Castro,
Carlos Alberto Mamede Hernandes
Abstract:
This paper proposes a new metric for software functional size, which is derived from Function Point Analysis (FPA), but overcomes some of its known deficiencies. The statistical results show that the new metric, Functional Elements (EF), and its submetric, Functional Elements of Transaction (EFt), have higher correlation with the effort in software development than FPA in the context of the analyzed data. The paper illustrates the application of the new metric as a tool to improve IT governance specifically in assessment, monitoring, and giving directions to the software development area.
Submitted 16 October, 2018;
originally announced October 2018.
-
Energy-based Tuning of Convolutional Neural Networks on Multi-GPUs
Authors:
Francisco M. Castro,
Nicolás Guil,
Manuel J. Marín-Jiménez,
Jesús Pérez-Serrano,
Manuel Ujaldón
Abstract:
Deep Learning (DL) applications are gaining momentum in the realm of Artificial Intelligence, particularly after GPUs have demonstrated remarkable skills for accelerating their challenging computational requirements. Within this context, Convolutional Neural Network (CNN) models constitute a representative example of success on a wide set of complex applications, particularly on datasets where the target can be represented through a hierarchy of local features of increasing semantic complexity. In most of the real scenarios, the roadmap to improve results relies on CNN settings involving brute force computation, and researchers have lately proven Nvidia GPUs to be one of the best hardware counterparts for acceleration. Our work complements those findings with an energy study on critical parameters for the deployment of CNNs on flagship image and video applications: object recognition and people identification by gait, respectively. We evaluate energy consumption on four different networks based on the two most popular ones (ResNet/AlexNet): ResNet (167 layers), a 2D CNN (15 layers), a CaffeNet (25 layers) and a ResNetIm (94 layers) using batch sizes of 64, 128 and 256, and then correlate those with speed-up and accuracy to determine optimal settings. Experimental results on a multi-GPU server endowed with twin Maxwell and twin Pascal Titan X GPUs demonstrate that energy correlates with performance and that Pascal may have up to 40% gains versus Maxwell. Larger batch sizes extend performance gains and energy savings, but we have to keep an eye on accuracy, which sometimes shows a preference for small batches. We expect this work to provide a preliminary guidance for a wide set of CNN and DL applications in modern HPC times, where the GFLOPS/w ratio constitutes the primary goal.
Submitted 1 August, 2018;
originally announced August 2018.
-
End-to-End Incremental Learning
Authors:
Francisco M. Castro,
Manuel J. Marín-Jiménez,
Nicolás Guil,
Cordelia Schmid,
Karteek Alahari
Abstract:
Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is due to current neural network architectures requiring the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model, a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learn deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. We evaluate our method extensively on the CIFAR-100 and ImageNet (ILSVRC 2012) image classification datasets, and show state-of-the-art performance.
Submitted 3 September, 2018; v1 submitted 25 July, 2018;
originally announced July 2018.
-
Multimodal feature fusion for CNN-based gait recognition: an empirical comparison
Authors:
Francisco Manuel Castro,
Manuel Jesús Marín-Jiménez,
Nicolás Guil,
Nicolás Pérez de la Blanca
Abstract:
People identification in video based on the way they walk (i.e. gait) is a relevant task in computer vision using a non-invasive approach. Standard and current approaches typically derive gait signatures from sequences of binary energy maps of subjects extracted from images, but this process introduces a large amount of non-stationary noise, thus, conditioning their efficacy. In contrast, in this paper we focus on the raw pixels, or simple functions derived from them, letting advanced learning techniques to extract relevant features. Therefore, we present a comparative study of different Convolutional Neural Network (CNN) architectures by using three different modalities (i.e. gray pixels, optical flow channels and depth maps) on two widely-adopted and challenging datasets: TUM-GAID and CASIA-B. In addition, we perform a comparative study between different early and late fusion methods used to combine the information obtained from each kind of modalities. Our experimental results suggest that (i) the raw pixel values represent a competitive input modality, compared to the traditional state-of-the-art silhouette-based features (e.g. GEI), since equivalent or better results are obtained; (ii) the fusion of the raw pixel information with information from optical flow and depth maps allows to obtain state-of-the-art results on the gait recognition task with an image resolution several times smaller than the previously reported results; and, (iii) the selection and the design of the CNN architecture are critical points that can make a difference between state-of-the-art results or poor ones.
Submitted 20 February, 2020; v1 submitted 19 June, 2018;
originally announced June 2018.
-
IntPhys: A Framework and Benchmark for Visual Intuitive Physics Reasoning
Authors:
Ronan Riochet,
Mario Ynocente Castro,
Mathieu Bernard,
Adam Lerer,
Rob Fergus,
Véronique Izard,
Emmanuel Dupoux
Abstract:
In order to reach human performance on complex visual tasks, artificial systems need to incorporate a significant amount of understanding of the world in terms of macroscopic objects, movements, forces, etc. Inspired by work on intuitive physics in infants, we propose an evaluation benchmark which diagnoses how much a given system understands about physics by testing whether it can tell apart well matched videos of possible versus impossible events constructed with a game engine. The test requires systems to compute a physical plausibility score over an entire video. It is free of bias and can test a range of basic physical reasoning concepts. We then describe two Deep Neural Networks systems aimed at learning intuitive physics in an unsupervised way, using only physically possible videos. The systems are trained with a future semantic mask prediction objective and tested on the possible versus impossible discrimination task. The analysis of their results compared to human data gives novel insights in the potentials and limitations of next frame prediction architectures.
Submitted 11 February, 2020; v1 submitted 20 March, 2018;
originally announced March 2018.
-
Geo-Network Coding Function Virtualization for Reliable Communication over Satellite
Authors:
Tan Do-Duy,
M. Angeles Vazquez Castro
Abstract:
In this paper, we propose a design solution for the implementation of Virtualized Network Coding Functionality (VNCF) over a service coverage area. Network Function Virtualization (NFV) and Network Coding (NC) architectural designs are integrated as a toolbox of NC design domains so that NC can be implemented over different underlying physical networks including satellite or hybrid networks.
The design includes identifying theoretical limits of NC over wireless networks in terms of achievable rate region and optimizing coding rates for nodes that implement VNCF. The overall design target is to achieve a given multicast transmission target reliability at receiver sides. In addition, the optimization problem uses databases with geo-tagged link statistics and geo-location information of network nodes in the deployment area for some computational complexity/energy constraints.
Numerical results provide validation of our design solution on how network conditions and system constraints impact the design and implementation of NC and how VNCF allows reliable communication over wireless networks with reliability and connectivity up to theoretical limits.
Submitted 12 March, 2018;
originally announced March 2018.
-
Network Coding Function Virtualization
Authors:
Tan Do-Duy,
M. Angeles Vazquez Castro
Abstract:
Network Functions Virtualization (NFV) and Network Coding (NC) have attracted much attention in recent years as key concepts for providing 5G networks with flexibility and differentiated reliability, respectively. In this paper, we present the integration of NC architectural design and NFV. In order to do so we first describe what we call a virtualization process upon our proposed architectural design of NC that should help to offer the reliability functionality to a network. The process consists of identifying the required functional entities of NC and analyzing when the functionality should be activated towards complexity/energy efficiency. The relevance of our proposed NC function virtualization is its applicability to any underlying physical network, satellite or hybrid thus enabling softwarization, and rapid innovative deployment. Finally, we validate our framework to a study case of geo-control of network reliability that is based on device's geographical location-based signal/network information.
Submitted 12 March, 2018;
originally announced March 2018.
-
Efficient Communication over Cellular Networks with Network Coding in Emergency Scenarios
Authors:
Tan Do-Duy,
M. Angeles Vazquez Castro
Abstract:
Emergency communications requires reliability and flexibility for disaster recovery and relief operation. Based upon existing commercial portable devices (e.g., smartphones, tablets, laptops), we propose a network architecture that uses cellular networks and WiFi connections to deliver large files in emergency scenarios under the impairments of wireless channel such as packet losses and intermittent connection issues. Network coding (NC) is exploited to improve the delivery probability. We first review the state-of-the-art of NC for emergency communications. Then, we present the proposed network architecture which utilizes multiple radio interfaces of portable devices to support data delivery. A random linear NC scheme is exploited at source to enhance the reliability for content delivery against packet losses. Besides, an analytical model for the successful decoding probability in linear NC is derived. Finally, we evaluate the effectiveness of the proposed architecture with NC in terms of the delivery ratio of content for intermittent connectivity scenarios.
Submitted 12 March, 2018;
originally announced March 2018.
-
QoS Constrained Power Minimization in the Multiple Stream MIMO Broadcast Channel
Authors:
José P. González-Coma,
Michael Joham,
Paula M. Castro,
Luis Castedo
Abstract:
This work addresses the design of optimal linear transmit filters for the Multiple Input-Multiple Output (MIMO) Broadcast Channel (BC) when several spatial streams are allocated to each user. We also consider that the Channel State Information (CSI) is perfect at the receivers but is only partial at the transmitter. A statistical model for the partial CSI is assumed and exploited for the filter design. Similarly to the single-stream per user case, the problem is solved via Mean Square Error (MSE) dualities and interference functions. However, including more streams per user involves an additional complexity level since we must determine how to distribute the per-user rates among the streams. This problem is solved using a projected gradient algorithm.
Submitted 18 April, 2016;
originally announced April 2016.
-
Automatic learning of gait signatures for people identification
Authors:
F. M. Castro,
M. J. Marin-Jimenez,
N. Guil,
N. Perez de la Blanca
Abstract:
This work targets people identification in video based on the way they walk (i.e. gait). While classical methods typically derive gait signatures from sequences of binary silhouettes, in this work we explore the use of convolutional neural networks (CNN) for learning high-level descriptors from low-level motion features (i.e. optical flow components). We carry out a thorough experimental evaluation of the proposed CNN architecture on the challenging TUM-GAID dataset. The experimental results indicate that using spatio-temporal cuboids of optical flow as input data for CNN allows to obtain state-of-the-art results on the gait task with an image resolution eight times lower than the previously reported results (i.e. 80x60 pixels).
Submitted 14 June, 2016; v1 submitted 3 March, 2016;
originally announced March 2016.
-
Fisher Motion Descriptor for Multiview Gait Recognition
Authors:
F. M. Castro,
M. J. Marín-Jiménez,
N. Guil,
R. Muñoz-Salinas
Abstract:
The goal of this paper is to identify individuals by analyzing their gait. Instead of using binary silhouettes as input data (as done in many previous works) we propose and evaluate the use of motion descriptors based on densely sampled short-term trajectories. We take advantage of state-of-the-art people detectors to define custom spatial configurations of the descriptors around the target person, obtaining a rich representation of the gait motion. The local motion features (described by the Divergence-Curl-Shear descriptor) extracted on the different spatial areas of the person are combined into a single high-level gait descriptor by using the Fisher Vector encoding. The proposed approach, coined Pyramidal Fisher Motion, is experimentally validated on `CASIA' dataset (parts B and C), `TUM GAID' dataset, `CMU MoBo' dataset and the recent `AVA Multiview Gait' dataset. The results show that this new approach achieves state-of-the-art results in the problem of gait recognition, allowing to recognize walking people from diverse viewpoints on single and multiple camera setups, wearing different clothes, carrying bags, walking at diverse speeds and not limited to straight walking paths.
Submitted 26 January, 2016;
originally announced January 2016.