
Representation Learning via Consistent Assignment of Views over Random Partitions

Authors: Thalles Silva, Adín Ramírez Rivera

Abstract: We present Consistent Assignment of Views over Random Partitions (CARP), a self-supervised clustering method for representation learning of visual features. CARP learns prototypes in an end-to-end online fashion using gradient descent without additional non-differentiable modules to solve the cluster assignment problem. CARP optimizes a new pretext task based on random partitions of prototypes that regularizes the model and enforces consistency between views' assignments. Additionally, our method improves training stability and prevents collapsed solutions in joint-embedding training. Through an extensive evaluation, we demonstrate that CARP's representations are suitable for learning downstream tasks. We evaluate CARP's representation capabilities on 17 datasets across many standard protocols, including linear evaluation, few-shot classification, k-NN, k-means, image retrieval, and copy detection. We compare CARP's performance to 11 existing self-supervised methods. We extensively ablate our method and demonstrate that our proposed random partition pretext task improves the quality of the learned representations by devising multiple random classification tasks. In transfer learning tasks, CARP achieves the best performance on average against many SSL methods trained for a longer time.

Submitted 27 October, 2023; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: To appear in NeurIPS 2023. Code available at https://github.com/sthalles/carp
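
The random-partition idea in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the block size, the cross-entropy form of the consistency term, and every function name below are assumptions made only for illustration. The sketch shuffles prototype indices into disjoint blocks and compares two views' assignment distributions within each block.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_partitions(num_prototypes, block_size, rng):
    """Shuffle prototype indices and split them into disjoint blocks."""
    indices = list(range(num_prototypes))
    rng.shuffle(indices)
    return [indices[i:i + block_size] for i in range(0, num_prototypes, block_size)]


def partition_consistency_loss(logits_a, logits_b, blocks):
    """Average cross-entropy between two views' assignments, per block.

    Hypothetical loss for illustration only: within each block, both views
    are softmax-normalized over that block's prototypes, and view B's
    distribution is used as the target for view A.
    """
    loss = 0.0
    for block in blocks:
        p_a = softmax([logits_a[i] for i in block])
        p_b = softmax([logits_b[i] for i in block])
        loss += -sum(q * math.log(p) for p, q in zip(p_a, p_b))
    return loss / len(blocks)


# Toy usage: 12 prototypes, blocks of 4, random logits standing in for
# the similarity scores between each view and each prototype.
rng = random.Random(0)
blocks = random_partitions(num_prototypes=12, block_size=4, rng=rng)
logits_view_a = [rng.gauss(0.0, 1.0) for _ in range(12)]
logits_view_b = [rng.gauss(0.0, 1.0) for _ in range(12)]
loss = partition_consistency_loss(logits_view_a, logits_view_b, blocks)
```

Each block acts as a small classification problem over a random subset of prototypes, which is one way to read the abstract's "multiple random classification tasks"; the linked repository is the reference for the actual loss.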

arXiv:2310.01842 [pdf, other]

SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering

Authors: Bruno Souza, Marius Aasan, Helio Pedrini, Adín Ramírez Rivera

Abstract: The intersection of vision and language is of major interest due to the increased focus on seamless integration between recognition and reasoning. Scene graphs (SGs) have emerged as a useful tool for multimodal image analysis, showing impressive performance in tasks such as Visual Question Answering (VQA). In this work, we demonstrate that despite the effectiveness of scene graphs in VQA tasks, current methods that utilize idealized annotated scene graphs struggle to generalize when using predicted scene graphs extracted from images. To address this issue, we introduce the SelfGraphVQA framework. Our approach extracts a scene graph from an input image using a pre-trained scene graph generator and employs semantically-preserving augmentation with self-supervised techniques. This method improves the utilization of graph representations in VQA tasks by circumventing the need for costly and potentially biased annotated data. By creating alternative views of the extracted graphs through image augmentations, we can learn joint embeddings by optimizing the informational content in their representations using an un-normalized contrastive approach. As we work with SGs, we experiment with three distinct maximization strategies: node-wise, graph-wise, and permutation-equivariant regularization. We empirically showcase the effectiveness of the extracted scene graph for VQA and demonstrate that these approaches enhance overall performance by highlighting the significance of visual information. This offers a more practical solution for VQA tasks that rely on SGs for complex reasoning questions.

Submitted 3 October, 2023; originally announced October 2023.

Comments: To appear in Vision-and-Language Algorithmic Reasoning Workshop at ICCV 2023

arXiv:2310.00527 [pdf, other]

Self-supervised Learning of Contextualized Local Visual Embeddings

Authors: Thalles Santos Silva, Helio Pedrini, Adín Ramírez Rivera

Abstract: We present Contextualized Local Visual Embeddings (CLoVE), a self-supervised convolutional-based method that learns representations suited for dense prediction tasks. CLoVE deviates from current methods and optimizes a single loss function that operates at the level of contextualized local embeddings learned from output feature maps of convolutional neural network (CNN) encoders. To learn contextualized embeddings, CLoVE proposes a normalized multi-head self-attention layer that combines local features from different parts of an image based on similarity. We extensively benchmark CLoVE's pre-trained representations on multiple datasets. CLoVE reaches state-of-the-art performance for CNN-based architectures in 4 dense prediction downstream tasks, including object detection, instance segmentation, keypoint detection, and dense pose estimation.

Submitted 4 October, 2023; v1 submitted 30 September, 2023; originally announced October 2023.

Comments: Pre-print. 4th Visual Inductive Priors for Data-Efficient Deep Learning Workshop ICCV 2023. Code at https://github.com/sthalles/CLoVE

ACM Class: I.4.6; I.4.7

Journal ref: 4th Visual Inductive Priors for Data-Efficient Deep Learning Workshop ICCV 2023

arXiv:2307.00689 [pdf, other]

Camera Calibration from a Single Imaged Ellipsoid: A Moon Calibration Algorithm

Authors: Kalani R. Danas Rivera, Mason A. Peck

Abstract: This work introduces a method that applies images of the extended bodies in the solar system to spacecraft camera calibration. The extended bodies consist of planets and moons that are well-modeled by triaxial ellipsoids. When imaged, the triaxial ellipsoid projects to a conic section which is generally an ellipse. This work combines the imaged ellipse with information on the observer's target-relative state to achieve camera calibration from a single imaged ellipsoid. As such, this work is the first to accomplish camera calibration from a single, non-spherical imaged ellipsoid. The camera calibration algorithm is applied to synthetic images of ellipsoids as well as planetary images of Saturn's moons as captured by the Cassini spacecraft. From a single image, the algorithm estimates the focal length and principal point of Cassini's Narrow Angle Camera within 1.0 mm and 10 pixels, respectively. With multiple images, the one-standard-deviation uncertainties in the focal length and principal point estimates reduce to 0.5 mm and 3.1 pixels, respectively. Though created with spacecraft camera calibration in mind, this work also generalizes to terrestrial camera calibration using any number of imaged ellipsoids.

Submitted 2 July, 2023; originally announced July 2023.

arXiv:2207.10653 [pdf, other]

RepFair-GAN: Mitigating Representation Bias in GANs Using Gradient Clipping

Authors: Patrik Joslin Kenfack, Kamil Sabbagh, Adín Ramírez Rivera, Adil Khan

Abstract: Fairness has become an essential problem in many domains of Machine Learning (ML), such as classification, natural language processing, and Generative Adversarial Networks (GANs). In this research effort, we study the unfairness of GANs. We formally define a new fairness notion for generative models in terms of the distribution of generated samples sharing the same protected attributes (gender, race, etc.). The defined fairness notion (representational fairness) requires the distribution of the sensitive attributes at test time to be uniform, and, in particular for GAN models, we show that this fairness notion is violated even when the dataset contains equally represented groups, i.e., the generator favors generating one group of samples over the others at test time. In this work, we shed light on the source of this representation bias in GANs along with a straightforward method to overcome this problem. We first show on two widely used datasets (MNIST, SVHN) that when the norm of the gradient of one group is more important than the other during the discriminator's training, the generator favours sampling data from one group more than the other at test time. We then show that controlling the groups' gradient norm by performing group-wise gradient norm clipping in the discriminator during training leads to fairer data generation in terms of representational fairness compared to existing models while preserving the quality of generated samples.

Submitted 13 July, 2022; originally announced July 2022.
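
As a rough illustration of the group-wise gradient norm clipping mentioned in the abstract above, the following sketch clips each group's gradient to the same maximum L2 norm before combining them. It is a minimal sketch under stated assumptions, not the paper's training procedure: the group names, the `max_norm` value, and the choice to sum the clipped gradients are all hypothetical.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def groupwise_clipped_gradient(group_grads, max_norm):
    """Clip each group's gradient separately, then sum the clipped vectors."""
    clipped = [clip_by_norm(g, max_norm) for g in group_grads.values()]
    return [sum(parts) for parts in zip(*clipped)]


# Toy usage with two hypothetical groups and a 2-parameter discriminator.
group_grads = {
    "group_a": [3.0, 4.0],  # L2 norm 5.0, scaled down to norm 1.0
    "group_b": [0.3, 0.4],  # L2 norm 0.5, left unchanged
}
update = groupwise_clipped_gradient(group_grads, max_norm=1.0)
```

With per-group clipping, no single group can contribute a gradient whose norm exceeds `max_norm`, which bounds how strongly any one group can dominate the combined update.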

arXiv:2207.09162 [pdf, other]

doi 10.1109/ACCESS.2022.3192605

Global and Local Features through Gaussian Mixture Models on Image Semantic Segmentation

Authors: Darwin Saire, Adín Ramírez Rivera

Abstract: The semantic segmentation task aims at dense classification at the pixel-wise level. Deep models have exhibited progress in tackling this task. However, one remaining problem with these approaches is the loss of spatial precision, often produced at the segmented objects' boundaries. Our proposed model addresses this problem by providing an internal structure for the feature representations while extracting a global representation that supports the former. To fit the internal structure, during training, we predict a Gaussian Mixture Model from the data, which, merged with the skip connections and the decoding stage, helps avoid wrong inductive biases. Furthermore, our results show that we can improve semantic segmentation by providing both learning representations (global and local) with a clustering behavior and combining them. Finally, we present results demonstrating our advances on the Cityscapes and Synthia datasets.

Submitted 19 July, 2022; originally announced July 2022.

Comments: Pre-print to appear in IEEE Access. Code available at https://gitlab.com/mipl/phgmm

arXiv:2205.15988 [pdf, other]

doi 10.1093/mnras/stac1569

A deep learning approach to halo merger tree construction

Authors: Sandra Robles, Jonathan S. Gómez, Adín Ramírez Rivera, Nelson D. Padilla, Diego Dujovne

Abstract: A key ingredient for semi-analytic models (SAMs) of galaxy formation is the mass assembly history of haloes, encoded in a tree structure. The most commonly used method to construct halo merger histories is based on the outcomes of high-resolution, computationally intensive N-body simulations. We show that machine learning (ML) techniques, in particular Generative Adversarial Networks (GANs), are a promising new tool to tackle this problem with a modest computational cost and retaining the best features of merger trees from simulations. We train our GAN model with a limited sample of merger trees from the Evolution and Assembly of GaLaxies and their Environments (EAGLE) simulation suite, constructed using two halo finders-tree builder algorithms: SUBFIND-D-TREES and ROCKSTAR-ConsistentTrees. Our GAN model successfully learns to generate well-constructed merger tree structures with high temporal resolution, and to reproduce the statistical features of the sample of merger trees used for training, when considering up to three variables in the training process. These inputs, whose representations are also learned by our GAN model, are mass of the halo progenitors and the final descendant, progenitor type (main halo or satellite) and distance of a progenitor to that in the main branch. The inclusion of the latter two inputs greatly improves the final learned representation of the halo mass growth history, especially for SUBFIND-like ML trees. When comparing equally sized samples of ML merger trees with those of the EAGLE simulation, we find better agreement for SUBFIND-like ML trees. Finally, our GAN-based framework can be utilised to construct merger histories of low- and intermediate-mass haloes, the most abundant in cosmological simulations.

Submitted 27 June, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

Comments: 17 pages, 12 figures, 3 tables, 2 appendices. Minor editorial improvements, matches published version

Report number: KCL-PH-TH/2022-33

Journal ref: MNRAS 514, 3692-3708 (2022)

arXiv:2205.06180 [pdf, ps, other]

Massively Scalable Wavelength Diverse Integrated Photonic Linear Neuron

Authors: Matthew van Niekerk, Anthony Rizzo, Hector Rubio Rivera, Gerald Leake, Daniel Coleman, Christopher Tison, Michael Fanto, Keren Bergman, Stefan Preble

Abstract: As computing resource demands continue to escalate in the face of big data, cloud-connectivity and the internet of things, it has become imperative to develop new low-power, scalable architectures. Neuromorphic photonics, or photonic neural networks, have become a feasible solution for the physical implementation of efficient algorithms directly on-chip. This application is primarily due to the linear nature of light and the scalability of silicon photonics, specifically leveraging the wide-scale complementary metal-oxide-semiconductor (CMOS) manufacturing infrastructure used to fabricate microelectronics chips. Current neuromorphic photonic implementations stem from two paradigms: wavelength coherent and incoherent. Here, we introduce a novel architecture that supports coherent and incoherent operation to increase the capability and capacity of photonic neural networks with a dramatic reduction in footprint compared to previous demonstrations. As a proof-of-principle, we experimentally demonstrate simple addition and subtraction operations on a foundry-fabricated silicon photonic chip. Additionally, we experimentally validate an on-chip network to predict the logical 2-bit gates AND, OR, and XOR to accuracies of $96.8\%$, $99\%$, and $98.5\%$, respectively. This architecture is compatible with highly wavelength parallel sources, enabling massively scalable photonic neural networks.

Submitted 25 August, 2022; v1 submitted 11 May, 2022; originally announced May 2022.

arXiv:2203.11768 [pdf]

Sustainable Development Goal Target Interactions in the Philippines: A Two-Method Approach

Authors: Vena Pearl Bongolan, Spencer C. Soria, Roselle Leah K. Rivera

Abstract: In 2015, the United Nations adopted 17 Sustainable Development Goals (SDGs) with 169 targets for transformation toward a more sustainable future by 2030. This study seeks to evaluate and analyze SDG target interactions in the Philippines to resolve conflicting targets, and prioritize targets that reinforce others and have no conflicts. To evaluate all 14196 target interactions, two methods are employed. First, experts with over five years of SDG-related experience evaluated interactions using a 7-point scale. Second, a non-parametric Spearman rank correlation is used on official indicator data with resulting coefficients serving as interaction scores. Interaction scores are then coded as synergies (interact positively), trade-offs (negatively) or non-classified (neutrally). Targets are also modelled as nodes and interactions as edges in graphs presented in sdginteractions.herokuapp.com. Results from the two methods were synthesized to formulate recommendations for concerned parties. This includes resolving negative intra-goal target interactions involving targets 3.1 'Reduce maternal mortality', 3.6 'Reduce road injuries and deaths', and 3.7 'Universal access to sexual and reproductive care, family planning, and education'. Ugly targets (at least one negative interaction) including targets 3.6, 3.7, and 8.2 'Diversify, innovate, and upgrade for economic productivity' need to be resolved. Targets that reinforce their corresponding SDGs should be prioritized, including 1.1 'Eradicate extreme poverty', 4.2 'Equal access to quality pre-primary education', 6.2 'End open defecation and provide access to sanitation and hygiene', 8.1 'Sustainable economic growth', and 9.4 'Upgrade all industries and infrastructures for sustainability'. Beautiful targets (no negative interactions) should also be prioritized, including targets 8.5 and 17.5 'Invest in least developed countries'.

Submitted 28 May, 2022; v1 submitted 22 March, 2022; originally announced March 2022.
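
The second method in the abstract above (Spearman rank correlation as an interaction score, then coding into synergy, trade-off, or non-classified) can be sketched as follows. The figure of 14196 interactions is consistent with counting unordered pairs of the 169 targets. The indicator series, the target keys, and the 0.5 coding threshold are hypothetical placeholders; the abstract does not state the cutoffs the study used.

```python
from itertools import combinations
from math import comb


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Assumes neither series is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def code_interaction(score, threshold=0.5):
    """Map a score to a label; the 0.5 cutoff is an assumed placeholder."""
    if score >= threshold:
        return "synergy"
    if score <= -threshold:
        return "trade-off"
    return "non-classified"


num_pairs = comb(169, 2)  # unordered pairs of the 169 targets

# Made-up yearly indicator values for three hypothetical targets.
series = {"t1": [1, 2, 3, 4, 5], "t2": [2, 4, 5, 8, 9], "t3": [9, 7, 6, 3, 1]}
labels = {
    (a, b): code_interaction(spearman(series[a], series[b]))
    for a, b in combinations(series, 2)
}
```

In practice a library routine such as SciPy's `spearmanr` would replace the hand-written ranking; the standard-library version is used here only to keep the sketch self-contained.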

arXiv:2112.15421 [pdf, other]

doi 10.1145/3477314.3507267

Representation Learning via Consistent Assignment of Views to Clusters

Authors: Thalles Silva, Adín Ramírez Rivera

Abstract: We introduce Consistent Assignment for Representation Learning (CARL), an unsupervised learning method to learn visual representations by combining ideas from self-supervised contrastive learning and deep clustering. By viewing contrastive learning from a clustering perspective, CARL learns unsupervised representations by learning a set of general prototypes that serve as energy anchors to enforce different views of a given image to be assigned to the same prototype. Unlike contemporary work on contrastive learning with deep clustering, CARL proposes to learn the set of general prototypes in an online fashion, using gradient descent without the necessity of using non-differentiable algorithms or K-Means to solve the cluster assignment problem. CARL surpasses its competitors in many representation learning benchmarks, including linear evaluation, semi-supervised learning, and transfer learning.

Submitted 16 March, 2022; v1 submitted 31 December, 2021; originally announced December 2021.

Comments: Pre-print. 37th ACM/SIGAPP Symposium on Applied Computing (SAC'22). Code at https://gitlab.com/mipl/carl/

arXiv:2110.13041 [pdf, other]

doi 10.3389/fdata.2022.787421

Applications and Techniques for Fast Machine Learning in Science

Authors: Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bahr, Jurgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomas E. Muller Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Kyle J Hazelwood, et al. (62 additional authors not shown)

Abstract: In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.

Submitted 25 October, 2021; originally announced October 2021.

Comments: 66 pages, 13 figures, 5 tables

Report number: FERMILAB-PUB-21-502-AD-E-SCD

Journal ref: Front. Big Data 5, 787421 (2022)

arXiv:2109.05532 [pdf]

SDG Target Interactions: The Philippine Analysis of Indivisible and Cancelling Targets

Authors: Vena Pearl Bongolan, Arian Allenson M. Valdez, Roselle Leah K. Rivera

Abstract: The United Nations developed the 17 Sustainable Development Goals (SDGs), with 169 targets, to serve as a plan for solving the world's problems and achieving a more sustainable future. This is modeled as a graph with the targets as nodes, and with the interaction between targets as the edges of the graph. An exhaustive binary comparison is done to analyze the intra- and inter-goal target interactions, entailing over 14000 comparisons. The task is to assign a 'color' to an edge: positive (indivisible), zero (consistent) or negative (cancelling). This is done via a panel of experts who will evaluate the target interactions, through a web application that was developed for coloring the edges. This is an on-going study, and so far, of the 1256 edges colored, only 36 are cancelling (negative), or 2.86%; more than 97% are positive interactions. So far, the "most negative" interactions involve: "Climate Change"; "Life Below Water"; "Peace, Justice and Strong Institutions"; and "Decent Work and Economic Growth". Most useful for planning might be the 'graph of beautiful targets' feature, which shows targets with non-negative interactions, and how they connect to each other. These are the targets that may be worked on simultaneously, and the graph currently has more than 130 nodes. This study can help researchers analyze which targets enable or constrain each other, what mitigation can be done to avoid conflicts, and can be configured for sub-national or regional study. Web app at: http://sdg-interactions.herokuapp.com/

Submitted 31 October, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

arXiv:2105.13531 [pdf, other]

doi 10.1109/ACCESS.2021.3085218

Empirical Study of Multi-Task Hourglass Model for Semantic Segmentation Task

Authors: Darwin Saire, Adín Ramírez Rivera

Abstract: The semantic segmentation (SS) task aims to create a dense classification by labeling at the pixel level each object present on images. Convolutional neural network (CNN) approaches have been widely used, and exhibited the best results in this task. However, the loss of spatial precision on the results is a main drawback that has not been solved. In this work, we propose to use a multi-task approach by complementing the semantic segmentation task with edge detection, semantic contour, and distance transform tasks. We propose that by sharing a common latent space, the complementary tasks can produce more robust representations that can enhance the semantic labels. We explore the influence of contour-based tasks on latent space, as well as their impact on the final results of SS. We demonstrate the effectiveness of learning in a multi-task setting for hourglass models in the Cityscapes, CamVid, and Freiburg Forest datasets by improving the state-of-the-art without any refinement post-processing.

Submitted 27 May, 2021; originally announced May 2021.

Comments: To appear in IEEE Access. Code available at https://gitlab.com/mipl/mtl-ss

arXiv:2104.02653 [pdf, other]

doi 10.1016/j.eswa.2021.114991

On the Pitfalls of Learning with Limited Data: A Facial Expression Recognition Case Study

Authors: Miguel Rodríguez Santander, Juan Hernández Albarracín, Adín Ramírez Rivera

Abstract: Deep learning models need large amounts of data for training. In video recognition and classification, significant advances were achieved with the introduction of new large databases. However, the creation of large databases for training is infeasible in several scenarios. Thus, existing or small collected databases are typically joined and amplified to train these models. Nevertheless, training neural networks on limited data is not straightforward and comes with a set of problems. In this paper, we explore the effects of stacking databases, model initialization, and data amplification techniques when training with limited data on deep learning models' performance. We focused on the problem of Facial Expression Recognition from videos. We performed an extensive study with four databases of different complexity and nine deep-learning architectures for video classification. We found that (i) complex training sets translate better to more stable test sets when trained with transfer learning and synthetically generated data, but their performance yields a high variance; (ii) training with more detailed data translates to more stable performance on novel scenarios (albeit with lower performance); (iii) merging heterogeneous data is not a straightforward improvement, as the type of augmentation and initialization is crucial; (iv) classical data augmentation cannot fill the holes created by joining largely separated datasets; and (v) inductive biases help to bridge the gap when paired with synthetic data, but this data is not enough when working with standard initialization techniques.

Submitted 2 April, 2021; originally announced April 2021.

Comments: To appear in Expert Systems with Applications

Journal ref: Expert Syst. Appl. 2021, 18 (1) 114991

arXiv:2103.03589 [pdf, other]

Hierarchical Transformer for Multilingual Machine Translation

Authors: Albina Khusainova, Adil Khan, Adín Ramírez Rivera, Vitaly Romanov

Abstract: The choice of parameter sharing strategy in multilingual machine translation models determines how optimally parameter space is used and hence, directly influences ultimate translation quality. Inspired by linguistic trees that show the degree of relatedness between different languages, a new general approach to parameter sharing in multilingual machine translation was recently suggested. The main idea is to use these expert language hierarchies as a basis for multilingual architecture: the closer two languages are, the more parameters they share. In this work, we test this idea using the Transformer architecture and show that despite the success in previous work there are problems inherent to training such hierarchical models. We demonstrate that with a carefully chosen training strategy the hierarchical architecture can outperform bilingual models and multilingual models with full parameter sharing.

Submitted 5 March, 2021; originally announced March 2021.

Comments: Accepted to VarDial 2021

arXiv:2102.00324 [pdf, other]

doi 10.1109/TIP.2022.3153140

Video Reenactment as Inductive Bias for Content-Motion Disentanglement

Authors: Juan F. Hernández Albarracín, Adín Ramírez Rivera

Abstract: Independent components within low-dimensional representations are essential inputs in several downstream tasks, and provide explanations over the observed data. Video-based disentangled factors of variation provide low-dimensional representations that can be identified and used to feed task-specific models. We introduce MTC-VAE, a self-supervised motion-transfer VAE model to disentangle motion and content from videos. Unlike previous work on video content-motion disentanglement, we adopt a chunk-wise modeling approach and take advantage of the motion information contained in spatiotemporal neighborhoods. Our model yields independent per-chunk representations that preserve temporal consistency. Hence, we reconstruct whole videos in a single forward-pass. We extend the ELBO's log-likelihood term and include a Blind Reenactment Loss as an inductive bias to leverage motion disentanglement, under the assumption that swapping motion features yields reenactment between two videos. We evaluate our model with recently-proposed disentanglement metrics and show that it outperforms a variety of methods for video motion-content disentanglement. Experiments on video reenactment show the effectiveness of our disentanglement in the input space where our model outperforms the baselines in reconstruction quality and motion alignment.

Submitted 18 February, 2022; v1 submitted 30 January, 2021; originally announced February 2021.

Comments: Pre-print to appear in IEEE Trans. on Image Processing. Project page, high resolution images, and source code at https://mipl.gitlab.io/mtc-vae/

arXiv:2010.05119 [pdf, other]

doi 10.1109/TNNLS.2020.3027667

Anomaly Detection based on Zero-Shot Outlier Synthesis and Hierarchical Feature Distillation

Authors: Adín Ramírez Rivera, Adil Khan, Imad E. I. Bekkouch, Taimoor S. Sheikh

Abstract: Anomaly detection suffers from unbalanced data since anomalies are quite rare. Synthetically generated anomalies are a solution to such ill or not fully defined data. However, synthesis requires an expressive representation to guarantee the quality of the generated data. In this paper, we propose a two-level hierarchical latent space representation that distills inliers' feature-descriptors (through autoencoders) into more robust representations based on a variational family of distributions (through a variational autoencoder) for zero-shot anomaly generation. From the learned latent distributions, we select those that lie on the outskirts of the training data as synthetic-outlier generators. We then synthesize from them, i.e., generate negative samples without having seen them before, to train binary classifiers. We found that the use of the proposed hierarchical structure for feature distillation and fusion creates robust and general representations that allow us to synthesize pseudo outlier samples and, in turn, train robust binary classifiers for true outlier detection (without the need for actual outliers during training). We demonstrate the performance of our proposal on several benchmarks for anomaly detection.

Submitted 10 October, 2020; originally announced October 2020.

Comments: To appear in IEEE Trans. on Neural Networks and Learning Systems

arXiv:2002.09951 [pdf]

Multi-Stream Networks and Ground-Truth Generation for Crowd Counting

Authors: Rodolfo Quispe, Darwin Ttito, Adín Ramírez Rivera, Helio Pedrini

Abstract: Crowd scene analysis has received a lot of attention recently due to the wide variety of applications, for instance, forensic science, urban planning, surveillance and security. In this context, a challenging task is known as crowd counting, whose main purpose is to estimate the number of people present in a single image. A Multi-Stream Convolutional Neural Network is developed and evaluated in this work, which receives an image as input and produces a density map that represents the spatial distribution of people in an end-to-end fashion. In order to address complex crowd counting issues, such as extremely unconstrained scale and perspective changes, the network architecture utilizes receptive fields with different size filters for each stream. In addition, we investigate the influence of the two most common fashions on the generation of ground truths and propose a hybrid method based on tiny face detection and scale interpolation. Experiments conducted on two challenging datasets, UCF-CC-50 and ShanghaiTech, demonstrate that using our ground truth generation methods achieves superior results.

Submitted 11 March, 2020; v1 submitted 23 February, 2020; originally announced February 2020.

Comments: https://github.com/RQuispeC/multi-stream-crowd-counting-extended , The International Journal of Electrical and Computer Engineering Systems 2020

arXiv:1906.09382 [pdf, other]

A Halo Merger Tree Generation and Evaluation Framework

Authors: Sandra Robles, Jonathan S. Gómez, Adín Ramírez Rivera, Jenny A. González, Nelson D. Padilla, Diego Dujovne

Abstract: Semi-analytic models are best suited to compare galaxy formation and evolution theories with observations. These models rely heavily on halo merger trees, and their realistic features (i.e., no drastic changes on halo mass or jumps on physical locations). Our aim is to provide a new framework for halo merger tree generation that takes advantage of the results of large volume simulations, with a modest computational cost. We treat halo merger tree construction as a matrix generation problem, and propose a Generative Adversarial Network that learns to generate realistic halo merger trees. We evaluate our proposal on merger trees from the EAGLE simulation suite, and show the quality of the generated trees.

Submitted 21 June, 2019; originally announced June 2019.

Comments: 11 pages, 7 figures, 2 tables, 3 appendices. Presented at the ICML 2019 Workshop on Theoretical Physics for Deep Learning

arXiv:1905.12665 [pdf, other]

Graph Learning Network: A Structure Learning Algorithm

Authors: Darwin Saire Pilco, Adín Ramírez Rivera

Abstract: Recently, graph neural networks (GNNs) have proved to be suitable in tasks on unstructured data, particularly in tasks such as community detection, node classification, and link prediction. However, most GNN models still operate with static relationships. We propose the Graph Learning Network (GLN), a simple yet effective process to learn node embeddings and structure prediction functions. Our model uses graph convolutions to propose expected node features, and predict the best structure based on them. We repeat these steps recursively to enhance the prediction and the embeddings.

Submitted 5 June, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

Comments: Accepted for publication at ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Data. Code available at https://gitlab.com/mipl/graph-learning-network

arXiv:1904.00365 [pdf, ps, other]

SART - Similarity, Analogies, and Relatedness for Tatar Language: New Benchmark Datasets for Word Embeddings Evaluation

Authors: Albina Khusainova, Adil Khan, Adín Ramírez Rivera

Abstract: There is a huge imbalance between languages currently spoken and corresponding resources to study them. Most of the attention naturally goes to the "big" languages: those which have the largest presence in terms of media and number of speakers. Other less represented languages sometimes do not even have a good quality corpus to study them. In this paper, we tackle this imbalance by presenting a new set of evaluation resources for Tatar, a language of the Turkic language family which is mainly spoken in Tatarstan Republic, Russia. We present three datasets: Similarity and Relatedness datasets that consist of human scored word pairs and can be used to evaluate semantic models; and an Analogies dataset that comprises analogy questions and allows exploring semantic, syntactic, and morphological aspects of language modeling. All three datasets build upon existing datasets for the English language and follow the same structure. However, they are not mere translations. They take into account specifics of the Tatar language and expand beyond the original datasets. We evaluate state-of-the-art word embedding models for two languages using our proposed datasets for Tatar and the original datasets for English and report our findings on performance comparison.

Submitted 31 March, 2019; originally announced April 2019.

Comments: The datasets are available at https://github.com/tat-nlp/SART

arXiv:1901.01172 [pdf, other]

doi 10.1007/978-3-030-00479-8_28

Faster and Smaller Two-Level Index for Network-based Trajectories

Authors: Rodrigo Rivera, Andrea Rodríguez, Diego Seco

Abstract: Two-level indexes have been widely used to handle trajectories of moving objects that are constrained to a network. The top-level of these indexes handles the spatial dimension, whereas the bottom level handles the temporal dimension. The latter turns out to be an instance of the interval-intersection problem, but it has been tackled by non-specialized spatial indexes. In this work, we propose the use of a compact data structure on the bottom level of these indexes. Our experimental evaluation shows that our approach is both faster and smaller than existing solutions.

Submitted 4 January, 2019; originally announced January 2019.

Comments: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941

Journal ref: Proceedings of the 25th International Symposium on String Processing and Information Retrieval (SPIRE 2018)

arXiv:1804.06913 [pdf, other]

doi 10.1088/1748-0221/13/07/P07027

Fast inference of deep neural networks in FPGAs for particle physics

Authors: Javier Duarte, Song Han, Philip Harris, Sergo Jindariani, Edward Kreinar, Benjamin Kreis, Jennifer Ngadiuba, Maurizio Pierini, Ryan Rivera, Nhan Tran, Zhenbin Wu

Abstract: Recent results at the Large Hadron Collider (LHC) have pointed to enhanced physics capabilities through the improvement of the real-time event processing techniques. Machine learning methods are ubiquitous and have proven to be very powerful in LHC physics, and particle physics as a whole. However, exploration of the use of such techniques in low-latency, low-power FPGA hardware has only just begun. FPGA-based trigger and data acquisition (DAQ) systems have extremely low, sub-microsecond latency requirements that are unique to particle physics. We present a case study for neural network inference in FPGAs focusing on a classifier for jet substructure which would enable, among many other physics scenarios, searches for new dark sector particles and novel measurements of the Higgs boson. While we focus on a specific example, the lessons are far-reaching. We develop a package based on High-Level Synthesis (HLS) called hls4ml to build machine learning models in FPGAs. The use of HLS increases accessibility across a broad user community and allows for a drastic decrease in firmware development time. We map out FPGA resource usage and latency versus neural network hyperparameters to identify the problems in particle physics that would benefit from performing neural network inference with FPGAs. For our example jet substructure model, we fit well within the available resources of modern FPGAs with a latency on the scale of 100 ns.

Submitted 28 June, 2018; v1 submitted 16 April, 2018; originally announced April 2018.

Comments: 22 pages, 17 figures, 2 tables, JINST revision

Report number: FERMILAB-PUB-18-089-E

Journal ref: JINST 13 P07027 (2018)

arXiv:1312.0917 [pdf, other]

doi 10.1109/NSSMIC.2013.6829552

Applications of Many-Core Technologies to On-line Event Reconstruction in High Energy Physics Experiments

Authors: A. Gianelle, S. Amerio, D. Bastieri, M. Corvo, W. Ketchum, T. Liu, A. Lonardo, D. Lucchesi, S. Poprocki, R. Rivera, L. Tosoratto, P. Vicini, P. Wittich

Abstract: Interest in many-core architectures applied to real time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of many-core devices when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. We use as benchmark a scaled-up version of the algorithm used at CDF experiment at Tevatron for online track reconstruction - the SVT algorithm - as a realistic test-case for low-latency trigger systems using new computing architectures for LHC experiment. We examine the complexity/performance trade-off in porting existing serial algorithms to many-core devices. We measure performance of different architectures (Intel Xeon Phi and AMD GPUs, in addition to NVidia GPUs) and different software environments (OpenCL, in addition to NVidia CUDA). Measurements of both data processing and data transfer latency are shown, considering different I/O strategies to/from the many-core devices.

Submitted 4 December, 2013; v1 submitted 3 December, 2013; originally announced December 2013.

Comments: Proceedings for 2013 IEEE NSS/MIC conference; fixed author list omission

arXiv:1311.0380 [pdf, other]

doi 10.1088/1742-6596/513/1/012002

Many-core applications to online track reconstruction in HEP experiments

Authors: S. Amerio, D. Bastieri, M. Corvo, A. Gianelle, W. Ketchum, T. Liu, A. Lonardo, D. Lucchesi, S. Poprocki, R. Rivera, L. Tosoratto, P. Vicini, P. Wittich

Abstract: Interest in parallel architectures applied to real time selections is growing in High Energy Physics (HEP) experiments. In this paper we describe performance measurements of Graphic Processing Units (GPUs) and Intel Many Integrated Core architecture (MIC) when applied to a typical HEP online task: the selection of events based on the trajectories of charged particles. We use as benchmark a scaled-up version of the algorithm used at CDF experiment at Tevatron for online track reconstruction - the SVT algorithm - as a realistic test-case for low-latency trigger systems using new computing architectures for LHC experiment. We examine the complexity/performance trade-off in porting existing serial algorithms to many-core devices. Measurements of both data processing and data transfer latency are shown, considering different I/O strategies to/from the parallel devices.

Submitted 11 November, 2013; v1 submitted 2 November, 2013; originally announced November 2013.

Comments: Proceedings for the 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP); missing acks added

Showing 1–25 of 25 results for author: Rivera, R