-
Representation Learning via Consistent Assignment of Views over Random Partitions
Authors:
Thalles Silva,
Adín Ramírez Rivera
Abstract:
We present Consistent Assignment of Views over Random Partitions (CARP), a self-supervised clustering method for representation learning of visual features. CARP learns prototypes in an end-to-end online fashion using gradient descent without additional non-differentiable modules to solve the cluster assignment problem. CARP optimizes a new pretext task based on random partitions of prototypes that regularizes the model and enforces consistency between views' assignments. Additionally, our method improves training stability and prevents collapsed solutions in joint-embedding training. Through an extensive evaluation, we demonstrate that CARP's representations are suitable for learning downstream tasks. We evaluate CARP's representation capabilities on 17 datasets across many standard protocols, including linear evaluation, few-shot classification, k-NN, k-means, image retrieval, and copy detection. We compare CARP's performance to 11 existing self-supervised methods. We extensively ablate our method and demonstrate that our proposed random partition pretext task improves the quality of the learned representations by devising multiple random classification tasks. In transfer learning tasks, CARP achieves the best performance on average against many SSL methods trained for a longer time.
Submitted 27 October, 2023; v1 submitted 19 October, 2023;
originally announced October 2023.
-
SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering
Authors:
Bruno Souza,
Marius Aasan,
Helio Pedrini,
Adín Ramírez Rivera
Abstract:
The intersection of vision and language is of major interest due to the increased focus on seamless integration between recognition and reasoning. Scene graphs (SGs) have emerged as a useful tool for multimodal image analysis, showing impressive performance in tasks such as Visual Question Answering (VQA). In this work, we demonstrate that despite the effectiveness of scene graphs in VQA tasks, current methods that utilize idealized annotated scene graphs struggle to generalize when using predicted scene graphs extracted from images. To address this issue, we introduce the SelfGraphVQA framework. Our approach extracts a scene graph from an input image using a pre-trained scene graph generator and employs semantically-preserving augmentation with self-supervised techniques. This method improves the utilization of graph representations in VQA tasks by circumventing the need for costly and potentially biased annotated data. By creating alternative views of the extracted graphs through image augmentations, we can learn joint embeddings by optimizing the informational content in their representations using an un-normalized contrastive approach. As we work with SGs, we experiment with three distinct maximization strategies: node-wise, graph-wise, and permutation-equivariant regularization. We empirically showcase the effectiveness of the extracted scene graph for VQA and demonstrate that these approaches enhance overall performance by highlighting the significance of visual information. This offers a more practical solution for VQA tasks that rely on SGs for complex reasoning questions.
Submitted 3 October, 2023;
originally announced October 2023.
-
Self-supervised Learning of Contextualized Local Visual Embeddings
Authors:
Thalles Santos Silva,
Helio Pedrini,
Adín Ramírez Rivera
Abstract:
We present Contextualized Local Visual Embeddings (CLoVE), a self-supervised convolutional-based method that learns representations suited for dense prediction tasks. CLoVE deviates from current methods and optimizes a single loss function that operates at the level of contextualized local embeddings learned from output feature maps of convolutional neural network (CNN) encoders. To learn contextualized embeddings, CLoVE proposes a normalized multi-head self-attention layer that combines local features from different parts of an image based on similarity. We extensively benchmark CLoVE's pre-trained representations on multiple datasets. CLoVE reaches state-of-the-art performance for CNN-based architectures in 4 dense prediction downstream tasks, including object detection, instance segmentation, keypoint detection, and dense pose estimation.
Submitted 4 October, 2023; v1 submitted 30 September, 2023;
originally announced October 2023.
-
Camera Calibration from a Single Imaged Ellipsoid: A Moon Calibration Algorithm
Authors:
Kalani R. Danas Rivera,
Mason A. Peck
Abstract:
This work introduces a method that applies images of the extended bodies in the solar system to spacecraft camera calibration. The extended bodies consist of planets and moons that are well-modeled by triaxial ellipsoids. When imaged, the triaxial ellipsoid projects to a conic section which is generally an ellipse. This work combines the imaged ellipse with information on the observer's target-relative state to achieve camera calibration from a single imaged ellipsoid. As such, this work is the first to accomplish camera calibration from a single, non-spherical imaged ellipsoid. The camera calibration algorithm is applied to synthetic images of ellipsoids as well as planetary images of Saturn's moons as captured by the Cassini spacecraft. From a single image, the algorithm estimates the focal length and principal point of Cassini's Narrow Angle Camera within 1.0 mm and 10 pixels, respectively. With multiple images, the one standard deviation uncertainties in the focal length and principal point estimates reduce to 0.5 mm and 3.1 pixels, respectively. Though created with spacecraft camera calibration in mind, this work also generalizes to terrestrial camera calibration using any number of imaged ellipsoids.
Submitted 2 July, 2023;
originally announced July 2023.
-
RepFair-GAN: Mitigating Representation Bias in GANs Using Gradient Clipping
Authors:
Patrik Joslin Kenfack,
Kamil Sabbagh,
Adín Ramírez Rivera,
Adil Khan
Abstract:
Fairness has become an essential problem in many domains of Machine Learning (ML), such as classification, natural language processing, and Generative Adversarial Networks (GANs). In this research effort, we study the unfairness of GANs. We formally define a new fairness notion for generative models in terms of the distribution of generated samples sharing the same protected attributes (gender, race, etc.). The defined fairness notion (representational fairness) requires the distribution of the sensitive attributes at test time to be uniform, and, in particular for GAN models, we show that this fairness notion is violated even when the dataset contains equally represented groups, i.e., the generator favors generating one group of samples over the others at test time. In this work, we shed light on the source of this representation bias in GANs along with a straightforward method to overcome this problem. We first show on two widely used datasets (MNIST, SVHN) that when the norm of the gradient of one group is more important than the other during the discriminator's training, the generator favors sampling data from one group more than the other at test time. We then show that controlling the groups' gradient norm by performing group-wise gradient norm clipping in the discriminator during training leads to fairer data generation in terms of representational fairness compared to existing models while preserving the quality of generated samples.
Submitted 13 July, 2022;
originally announced July 2022.
-
Global and Local Features through Gaussian Mixture Models on Image Semantic Segmentation
Authors:
Darwin Saire,
Adín Ramírez Rivera
Abstract:
The semantic segmentation task aims at dense classification at the pixel-wise level. Deep models exhibited progress in tackling this task. However, one remaining problem with these approaches is the loss of spatial precision, often produced at the segmented objects' boundaries. Our proposed model addresses this problem by providing an internal structure for the feature representations while extracting a global representation that supports the former. To fit the internal structure, during training, we predict a Gaussian Mixture Model from the data, which, merged with the skip connections and the decoding stage, helps avoid wrong inductive biases. Furthermore, our results show that we can improve semantic segmentation by providing both learning representations (global and local) with a clustering behavior and combining them. Finally, we present results demonstrating our advances in Cityscapes and Synthia datasets.
Submitted 19 July, 2022;
originally announced July 2022.
-
A deep learning approach to halo merger tree construction
Authors:
Sandra Robles,
Jonathan S. Gómez,
Adín Ramírez Rivera,
Nelson D. Padilla,
Diego Dujovne
Abstract:
A key ingredient for semi-analytic models (SAMs) of galaxy formation is the mass assembly history of haloes, encoded in a tree structure. The most commonly used method to construct halo merger histories is based on the outcomes of high-resolution, computationally intensive N-body simulations. We show that machine learning (ML) techniques, in particular Generative Adversarial Networks (GANs), are a promising new tool to tackle this problem with a modest computational cost and retaining the best features of merger trees from simulations. We train our GAN model with a limited sample of merger trees from the Evolution and Assembly of GaLaxies and their Environments (EAGLE) simulation suite, constructed using two halo finders-tree builder algorithms: SUBFIND-D-TREES and ROCKSTAR-ConsistentTrees. Our GAN model successfully learns to generate well-constructed merger tree structures with high temporal resolution, and to reproduce the statistical features of the sample of merger trees used for training, when considering up to three variables in the training process. These inputs, whose representations are also learned by our GAN model, are mass of the halo progenitors and the final descendant, progenitor type (main halo or satellite) and distance of a progenitor to that in the main branch. The inclusion of the latter two inputs greatly improves the final learned representation of the halo mass growth history, especially for SUBFIND-like ML trees. When comparing equally sized samples of ML merger trees with those of the EAGLE simulation, we find better agreement for SUBFIND-like ML trees. Finally, our GAN-based framework can be utilised to construct merger histories of low- and intermediate-mass haloes, the most abundant in cosmological simulations.
Submitted 27 June, 2022; v1 submitted 31 May, 2022;
originally announced May 2022.
-
Massively Scalable Wavelength Diverse Integrated Photonic Linear Neuron
Authors:
Matthew van Niekerk,
Anthony Rizzo,
Hector Rubio Rivera,
Gerald Leake,
Daniel Coleman,
Christopher Tison,
Michael Fanto,
Keren Bergman,
Stefan Preble
Abstract:
As computing resource demands continue to escalate in the face of big data, cloud-connectivity and the internet of things, it has become imperative to develop new low-power, scalable architectures. Neuromorphic photonics, or photonic neural networks, have become a feasible solution for the physical implementation of efficient algorithms directly on-chip. This application is primarily due to the linear nature of light and the scalability of silicon photonics, specifically leveraging the wide-scale complementary metal-oxide-semiconductor (CMOS) manufacturing infrastructure used to fabricate microelectronics chips. Current neuromorphic photonic implementations stem from two paradigms: wavelength coherent and incoherent. Here, we introduce a novel architecture that supports coherent and incoherent operation to increase the capability and capacity of photonic neural networks with a dramatic reduction in footprint compared to previous demonstrations. As a proof-of-principle, we experimentally demonstrate simple addition and subtraction operations on a foundry-fabricated silicon photonic chip. Additionally, we experimentally validate an on-chip network to predict the logical 2-bit gates AND, OR, and XOR to accuracies of $96.8\%, 99\%,$ and $98.5\%$, respectively. This architecture is compatible with highly wavelength parallel sources, enabling massively scalable photonic neural networks.
Submitted 25 August, 2022; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Sustainable Development Goal Target Interactions in the Philippines: A Two-Method Approach
Authors:
Vena Pearl Bongolan,
Spencer C. Soria,
Roselle Leah K. Rivera
Abstract:
In 2015, the United Nations adopted 17 Sustainable Development Goals (SDGs) with 169 targets for transformation toward a more sustainable future by 2030. This study seeks to evaluate and analyze SDG target interactions in the Philippines to resolve conflicting targets, and prioritize targets that reinforce others and have no conflicts. To evaluate all 14196 target interactions, two methods are employed. First, experts with over five years of SDG-related experience evaluated interactions using a 7-point scale. Second, a non-parametric Spearman rank correlation is used on official indicator data with resulting coefficients serving as interaction scores. Interaction scores are then coded as synergies (interact positively), trade-offs (negatively) or non-classified (neutrally). Targets are also modelled as nodes and interactions as edges in graphs presented in sdginteractions.herokuapp.com. Results from the two methods were synthesized to formulate recommendations for concerned parties. This includes resolving negative intra-goal target interactions involving targets 3.1 'Reduce maternal mortality', 3.6 'Reduce road injuries and deaths', and 3.7 'Universal access to sexual and reproductive care, family planning, and education'. Ugly targets (at least one negative interaction) including target 3.6, 3.7, and 8.2 'Diversify, innovate, and upgrade for economic productivity' need to be resolved. Targets that reinforce their corresponding SDGs should be prioritized, including 1.1 'Eradicate extreme poverty', 4.2 'Equal access to quality pre-primary education', 6.2 'End open defecation and provide access to sanitation and hygiene', 8.1 'Sustainable economic growth', and 9.4 'Upgrade all industries and infrastructures for sustainability'. Beautiful targets (no negative interactions) should also be prioritized, including target 8.5 and 17.5 'Invest in least developed countries'.
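The figure of 14196 target interactions quoted in the abstract above is consistent with counting every unordered pair among the 169 targets. A minimal sketch of that arithmetic follows; the function name `count_target_pairs` is hypothetical and used only for illustration, and the assumption that interactions are counted as unordered pairs is ours, not stated in the abstract.

```python
from math import comb

NUM_TARGETS = 169  # number of SDG targets stated in the abstract


def count_target_pairs(n_targets: int) -> int:
    """Count unordered pairs of distinct targets (assumed meaning of 'interactions')."""
    return comb(n_targets, 2)


pair_count = count_target_pairs(NUM_TARGETS)
print(pair_count)  # 169 * 168 / 2 = 14196
```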
Submitted 28 May, 2022; v1 submitted 22 March, 2022;
originally announced March 2022.
-
Representation Learning via Consistent Assignment of Views to Clusters
Authors:
Thalles Silva,
Adín Ramírez Rivera
Abstract:
We introduce Consistent Assignment for Representation Learning (CARL), an unsupervised learning method to learn visual representations by combining ideas from self-supervised contrastive learning and deep clustering. By viewing contrastive learning from a clustering perspective, CARL learns unsupervised representations by learning a set of general prototypes that serve as energy anchors to enforce different views of a given image to be assigned to the same prototype. Unlike contemporary work on contrastive learning with deep clustering, CARL proposes to learn the set of general prototypes in an online fashion, using gradient descent without the necessity of using non-differentiable algorithms or K-Means to solve the cluster assignment problem. CARL surpasses its competitors in many representation learning benchmarks, including linear evaluation, semi-supervised learning, and transfer learning.
Submitted 16 March, 2022; v1 submitted 31 December, 2021;
originally announced December 2021.
-
Applications and Techniques for Fast Machine Learning in Science
Authors:
Allison McCarn Deiana,
Nhan Tran,
Joshua Agar,
Michaela Blott,
Giuseppe Di Guglielmo,
Javier Duarte,
Philip Harris,
Scott Hauck,
Mia Liu,
Mark S. Neubauer,
Jennifer Ngadiuba,
Seda Ogrenci-Memik,
Maurizio Pierini,
Thea Aarrestad,
Steffen Bahr,
Jurgen Becker,
Anne-Sophie Berthold,
Richard J. Bonventre,
Tomas E. Muller Bravo,
Markus Diefenthaler,
Zhen Dong,
Nick Fritzsche,
Amir Gholami,
Ekaterina Govorkova,
Kyle J Hazelwood
, et al. (62 additional authors not shown)
Abstract:
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.
Submitted 25 October, 2021;
originally announced October 2021.
-
SDG Target Interactions: The Philippine Analysis of Indivisible and Cancelling Targets
Authors:
Vena Pearl Bongolan,
Arian Allenson M. Valdez,
Roselle Leah K. Rivera
Abstract:
The United Nations developed the 17 Sustainable Development Goals (SDGs), with 169 targets, to serve as a plan for solving the world's problems and achieving a more sustainable future. This is modeled as a graph with the targets as nodes, and with the interaction between targets as the edges of the graph. An exhaustive binary comparison is done to analyze the intra- and inter-goal target interactions, entailing over 14000 comparisons. The task is to assign a 'color' to an edge: positive (indivisible), zero (consistent) or negative (cancelling). This is done via a panel of experts who will evaluate the target interactions, through a web application that was developed for coloring the edges. This is an on-going study, and so far, of the 1256 edges colored, only 36 are cancelling (negative), or 2.86%; more than 97% are positive interactions. So far, the "most negative" interactions involve: "Climate Change"; "Life Below Water"; "Peace, Justice and Strong Institutions"; and "Decent Work and Economic Growth". Most useful for planning might be the 'graph of beautiful targets' feature, which shows target with non-negative interactions, and how they connect to each other. These are the targets that may be worked on simultaneously, and currently has more than 130 nodes. This study can help researchers analyze which targets enable or constrain each other, what mitigation can be done to avoid conflicts, and can be configured for sub-national or regional study. Web app at: http://sdg-interactions.herokuapp.com/
Submitted 31 October, 2021; v1 submitted 12 September, 2021;
originally announced September 2021.
-
Empirical Study of Multi-Task Hourglass Model for Semantic Segmentation Task
Authors:
Darwin Saire,
Adín Ramírez Rivera
Abstract:
The semantic segmentation (SS) task aims to create a dense classification by labeling at the pixel level each object present on images. Convolutional neural network (CNN) approaches have been widely used, and exhibited the best results in this task. However, the loss of spatial precision on the results is a main drawback that has not been solved. In this work, we propose to use a multi-task approach by complementing the semantic segmentation task with edge detection, semantic contour, and distance transform tasks. We propose that by sharing a common latent space, the complementary tasks can produce more robust representations that can enhance the semantic labels. We explore the influence of contour-based tasks on latent space, as well as their impact on the final results of SS. We demonstrate the effectiveness of learning in a multi-task setting for hourglass models in the Cityscapes, CamVid, and Freiburg Forest datasets by improving the state-of-the-art without any refinement post-processing.
Submitted 27 May, 2021;
originally announced May 2021.
-
On the Pitfalls of Learning with Limited Data: A Facial Expression Recognition Case Study
Authors:
Miguel Rodríguez Santander,
Juan Hernández Albarracín,
Adín Ramírez Rivera
Abstract:
Deep learning models need large amounts of data for training. In video recognition and classification, significant advances were achieved with the introduction of new large databases. However, the creation of large-databases for training is infeasible in several scenarios. Thus, existing or small collected databases are typically joined and amplified to train these models. Nevertheless, training neural networks on limited data is not straightforward and comes with a set of problems. In this paper, we explore the effects of stacking databases, model initialization, and data amplification techniques when training with limited data on deep learning models' performance. We focused on the problem of Facial Expression Recognition from videos. We performed an extensive study with four databases at a different complexity and nine deep-learning architectures for video classification. We found that (i) complex training sets translate better to more stable test sets when trained with transfer learning and synthetically generated data, but their performance yields a high variance; (ii) training with more detailed data translates to more stable performance on novel scenarios (albeit with lower performance); (iii) merging heterogeneous data is not a straightforward improvement, as the type of augmentation and initialization is crucial; (iv) classical data augmentation cannot fill the holes created by joining largely separated datasets; and (v) inductive biases help to bridge the gap when paired with synthetic data, but this data is not enough when working with standard initialization techniques.
Submitted 2 April, 2021;
originally announced April 2021.
-
Hierarchical Transformer for Multilingual Machine Translation
Authors:
Albina Khusainova,
Adil Khan,
Adín Ramírez Rivera,
Vitaly Romanov
Abstract:
The choice of parameter sharing strategy in multilingual machine translation models determines how optimally the parameter space is used and hence directly influences the ultimate translation quality. Inspired by linguistic trees that show the degree of relatedness between different languages, a new general approach to parameter sharing in multilingual machine translation was suggested recently. The main idea is to use these expert language hierarchies as a basis for multilingual architecture: the closer two languages are, the more parameters they share. In this work, we test this idea using the Transformer architecture and show that, despite the success in previous work, there are problems inherent to training such hierarchical models. We demonstrate that, with a carefully chosen training strategy, the hierarchical architecture can outperform bilingual models and multilingual models with full parameter sharing.
Submitted 5 March, 2021;
originally announced March 2021.
-
Video Reenactment as Inductive Bias for Content-Motion Disentanglement
Authors:
Juan F. Hernández Albarracín,
Adín Ramírez Rivera
Abstract:
Independent components within low-dimensional representations are essential inputs in several downstream tasks, and provide explanations over the observed data. Video-based disentangled factors of variation provide low-dimensional representations that can be identified and used to feed task-specific models. We introduce MTC-VAE, a self-supervised motion-transfer VAE model to disentangle motion and content from videos. Unlike previous work on video content-motion disentanglement, we adopt a chunk-wise modeling approach and take advantage of the motion information contained in spatiotemporal neighborhoods. Our model yields independent per-chunk representations that preserve temporal consistency. Hence, we reconstruct whole videos in a single forward-pass. We extend the ELBO's log-likelihood term and include a Blind Reenactment Loss as an inductive bias to leverage motion disentanglement, under the assumption that swapping motion features yields reenactment between two videos. We evaluate our model with recently-proposed disentanglement metrics and show that it outperforms a variety of methods for video motion-content disentanglement. Experiments on video reenactment show the effectiveness of our disentanglement in the input space where our model outperforms the baselines in reconstruction quality and motion alignment.
Submitted 18 February, 2022; v1 submitted 30 January, 2021;
originally announced February 2021.
-
Anomaly Detection based on Zero-Shot Outlier Synthesis and Hierarchical Feature Distillation
Authors:
Adín Ramírez Rivera,
Adil Khan,
Imad E. I. Bekkouch,
Taimoor S. Sheikh
Abstract:
Anomaly detection suffers from unbalanced data since anomalies are quite rare. Synthetically generated anomalies are a solution to such ill-defined or not fully defined data. However, synthesis requires an expressive representation to guarantee the quality of the generated data. In this paper, we propose a two-level hierarchical latent space representation that distills inliers' feature-descriptors (through autoencoders) into more robust representations based on a variational family of distributions (through a variational autoencoder) for zero-shot anomaly generation. From the learned latent distributions, we select those that lie on the outskirts of the training data as synthetic-outlier generators. We then synthesize from them, i.e., generate negative samples without having seen them before, to train binary classifiers. We found that the proposed hierarchical structure for feature distillation and fusion creates robust and general representations that allow us to synthesize pseudo-outlier samples and, in turn, train robust binary classifiers for true outlier detection (without the need for actual outliers during training). We demonstrate the performance of our proposal on several benchmarks for anomaly detection.
Submitted 10 October, 2020;
originally announced October 2020.
-
Multi-Stream Networks and Ground-Truth Generation for Crowd Counting
Authors:
Rodolfo Quispe,
Darwin Ttito,
Adín Ramírez Rivera,
Helio Pedrini
Abstract:
Crowd scene analysis has received a lot of attention recently due to its wide variety of applications, for instance, forensic science, urban planning, surveillance, and security. In this context, a challenging task is crowd counting, whose main purpose is to estimate the number of people present in a single image. A Multi-Stream Convolutional Neural Network is developed and evaluated in this work, which receives an image as input and produces a density map that represents the spatial distribution of people in an end-to-end fashion. In order to address complex crowd counting issues, such as extremely unconstrained scale and perspective changes, the network architecture utilizes receptive fields with different size filters for each stream. In addition, we investigate the influence of the two most common approaches to ground-truth generation and propose a hybrid method based on tiny face detection and scale interpolation. Experiments conducted on two challenging datasets, UCF-CC-50 and ShanghaiTech, demonstrate that using our ground-truth generation methods achieves superior results.
Submitted 11 March, 2020; v1 submitted 23 February, 2020;
originally announced February 2020.
-
A Halo Merger Tree Generation and Evaluation Framework
Authors:
Sandra Robles,
Jonathan S. Gómez,
Adín Ramírez Rivera,
Jenny A. González,
Nelson D. Padilla,
Diego Dujovne
Abstract:
Semi-analytic models are best suited to compare galaxy formation and evolution theories with observations. These models rely heavily on halo merger trees, and their realistic features (i.e., no drastic changes on halo mass or jumps on physical locations). Our aim is to provide a new framework for halo merger tree generation that takes advantage of the results of large volume simulations, with a modest computational cost. We treat halo merger tree construction as a matrix generation problem, and propose a Generative Adversarial Network that learns to generate realistic halo merger trees. We evaluate our proposal on merger trees from the EAGLE simulation suite, and show the quality of the generated trees.
Submitted 21 June, 2019;
originally announced June 2019.
-
Graph Learning Network: A Structure Learning Algorithm
Authors:
Darwin Saire Pilco,
Adín Ramírez Rivera
Abstract:
Recently, graph neural networks (GNNs) have proved to be suitable for tasks on unstructured data, particularly community detection, node classification, and link prediction. However, most GNN models still operate with static relationships. We propose the Graph Learning Network (GLN), a simple yet effective process to learn node embeddings and structure prediction functions. Our model uses graph convolutions to propose expected node features and predicts the best structure based on them. We repeat these steps recursively to enhance the prediction and the embeddings.
Submitted 5 June, 2019; v1 submitted 29 May, 2019;
originally announced May 2019.
-
SART - Similarity, Analogies, and Relatedness for Tatar Language: New Benchmark Datasets for Word Embeddings Evaluation
Authors:
Albina Khusainova,
Adil Khan,
Adín Ramírez Rivera
Abstract:
There is a huge imbalance between languages currently spoken and corresponding resources to study them. Most of the attention naturally goes to the "big" languages: those which have the largest presence in terms of media and number of speakers. Other less represented languages sometimes do not even have a good quality corpus to study them. In this paper, we tackle this imbalance by presenting a new set of evaluation resources for Tatar, a language of the Turkic language family which is mainly spoken in Tatarstan Republic, Russia.
We present three datasets: Similarity and Relatedness datasets that consist of human-scored word pairs and can be used to evaluate semantic models; and an Analogies dataset that comprises analogy questions and allows exploring semantic, syntactic, and morphological aspects of language modeling. All three datasets build upon existing datasets for the English language and follow the same structure. However, they are not mere translations. They take into account specifics of the Tatar language and expand beyond the original datasets. We evaluate state-of-the-art word embedding models for two languages using our proposed datasets for Tatar and the original datasets for English and report our findings on performance comparison.
Submitted 31 March, 2019;
originally announced April 2019.
-
Faster and Smaller Two-Level Index for Network-based Trajectories
Authors:
Rodrigo Rivera,
Andrea Rodríguez,
Diego Seco
Abstract:
Two-level indexes have been widely used to handle trajectories of moving objects that are constrained to a network. The top-level of these indexes handles the spatial dimension, whereas the bottom level handles the temporal dimension. The latter turns out to be an instance of the interval-intersection problem, but it has been tackled by non-specialized spatial indexes. In this work, we propose the use of a compact data structure on the bottom level of these indexes. Our experimental evaluation shows that our approach is both faster and smaller than existing solutions.
Submitted 4 January, 2019;
originally announced January 2019.
-
Fast inference of deep neural networks in FPGAs for particle physics
Authors:
Javier Duarte,
Song Han,
Philip Harris,
Sergo Jindariani,
Edward Kreinar,
Benjamin Kreis,
Jennifer Ngadiuba,
Maurizio Pierini,
Ryan Rivera,
Nhan Tran,
Zhenbin Wu
Abstract:
Recent results at the Large Hadron Collider (LHC) have pointed to enhanced physics capabilities through the improvement of the real-time event processing techniques. Machine learning methods are ubiquitous and have proven to be very powerful in LHC physics, and particle physics as a whole. However, exploration of the use of such techniques in low-latency, low-power FPGA hardware has only just begun. FPGA-based trigger and data acquisition (DAQ) systems have extremely low, sub-microsecond latency requirements that are unique to particle physics. We present a case study for neural network inference in FPGAs focusing on a classifier for jet substructure which would enable, among many other physics scenarios, searches for new dark sector particles and novel measurements of the Higgs boson. While we focus on a specific example, the lessons are far-reaching. We develop a package based on High-Level Synthesis (HLS) called hls4ml to build machine learning models in FPGAs. The use of HLS increases accessibility across a broad user community and allows for a drastic decrease in firmware development time. We map out FPGA resource usage and latency versus neural network hyperparameters to identify the problems in particle physics that would benefit from performing neural network inference with FPGAs. For our example jet substructure model, we fit well within the available resources of modern FPGAs with a latency on the scale of 100 ns.
Submitted 28 June, 2018; v1 submitted 16 April, 2018;
originally announced April 2018.
-
Applications of Many-Core Technologies to On-line Event Reconstruction in High Energy Physics Experiments
Authors:
A. Gianelle,
S. Amerio,
D. Bastieri,
M. Corvo,
W. Ketchum,
T. Liu,
A. Lonardo,
D. Lucchesi,
S. Poprocki,
R. Rivera,
L. Tosoratto,
P. Vicini,
P. Wittich
Abstract:
Interest in many-core architectures applied to real-time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of many-core devices when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. As a benchmark, we use a scaled-up version of the algorithm used at the CDF experiment at the Tevatron for online track reconstruction, the SVT algorithm, as a realistic test case for low-latency trigger systems using new computing architectures for LHC experiments. We examine the complexity/performance trade-off in porting existing serial algorithms to many-core devices. We measure performance of different architectures (Intel Xeon Phi and AMD GPUs, in addition to NVidia GPUs) and different software environments (OpenCL, in addition to NVidia CUDA). Measurements of both data processing and data transfer latency are shown, considering different I/O strategies to/from the many-core devices.
Submitted 4 December, 2013; v1 submitted 3 December, 2013;
originally announced December 2013.
-
Many-core applications to online track reconstruction in HEP experiments
Authors:
S. Amerio,
D. Bastieri,
M. Corvo,
A. Gianelle,
W. Ketchum,
T. Liu,
A. Lonardo,
D. Lucchesi,
S. Poprocki,
R. Rivera,
L. Tosoratto,
P. Vicini,
P. Wittich
Abstract:
Interest in parallel architectures applied to real-time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of Graphic Processing Units (GPUs) and the Intel Many Integrated Core architecture (MIC) when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. As a benchmark, we use a scaled-up version of the algorithm used at the CDF experiment at the Tevatron for online track reconstruction, the SVT algorithm, as a realistic test case for low-latency trigger systems using new computing architectures for LHC experiments. We examine the complexity/performance trade-off in porting existing serial algorithms to many-core devices. Measurements of both data processing and data transfer latency are shown, considering different I/O strategies to/from the parallel devices.
Submitted 11 November, 2013; v1 submitted 2 November, 2013;
originally announced November 2013.