-
HyperFast: Instant Classification for Tabular Data
Authors:
David Bonet,
Daniel Mas Montserrat,
Xavier Giró-i-Nieto,
Alexander G. Ioannidis
Abstract:
Training deep learning models and performing hyperparameter tuning can be computationally demanding and time-consuming. Meanwhile, traditional machine learning methods like gradient-boosting algorithms remain the preferred choice for most tabular data applications, while neural network alternatives require extensive hyperparameter tuning or work only in toy datasets under limited settings. In this paper, we introduce HyperFast, a meta-trained hypernetwork designed for instant classification of tabular data in a single forward pass. HyperFast generates a task-specific neural network tailored to an unseen dataset that can be directly used for classification inference, removing the need for training a model. We report extensive experiments with OpenML and genomic data, comparing HyperFast to competing tabular data neural networks, traditional ML methods, AutoML systems, and boosting machines. HyperFast shows highly competitive results, while being significantly faster. Additionally, our approach demonstrates robust adaptability across a variety of classification tasks with little to no fine-tuning, positioning HyperFast as a strong solution for numerous applications and rapid model deployment. HyperFast introduces a promising paradigm for fast classification, with the potential to substantially decrease the computational burden of deep learning. Our code, which offers a scikit-learn-like interface, along with the trained HyperFast model, can be found at https://github.com/AI-sandbox/HyperFast.
Submitted 22 February, 2024;
originally announced February 2024.
-
Adversarial Learning for Feature Shift Detection and Correction
Authors:
Miriam Barrabes,
Daniel Mas Montserrat,
Margarita Geleta,
Xavier Giro-i-Nieto,
Alexander G. Ioannidis
Abstract:
Data shift is a phenomenon present in many real-world applications, and while there are multiple methods attempting to detect shifts, the task of localizing and correcting the features originating such shifts has not been studied in depth. Feature shifts can occur in many datasets, including in multi-sensor data, where some sensors are malfunctioning, or in tabular and structured data, including biomedical, financial, and survey data, where faulty standardization and data processing pipelines can lead to erroneous features. In this work, we explore using the principles of adversarial learning, where the information from several discriminators trained to distinguish between two distributions is used to both detect the corrupted features and fix them in order to remove the distribution shift between datasets. We show that mainstream supervised classifiers, such as random forest or gradient boosting trees, combined with simple iterative heuristics, can localize and correct feature shifts, outperforming current statistical and neural network-based techniques. The code is available at https://github.com/AI-sandbox/DataFix.
Submitted 7 December, 2023;
originally announced December 2023.
-
Manipulation Detection in Satellite Images Using Vision Transformer
Authors:
János Horváth,
Sriram Baireddy,
Hanxiang Hao,
Daniel Mas Montserrat,
Edward J. Delp
Abstract:
A growing number of commercial satellite companies provide easily accessible satellite imagery. Overhead imagery is used by numerous industries including agriculture, forestry, natural disaster analysis, and meteorology. Satellite images, just like any other images, can be tampered with using image manipulation tools. Manipulation detection methods created for images captured by "consumer cameras" tend to fail when used on satellite images due to the differences in image sensors, image acquisition, and processing. In this paper, we propose an unsupervised technique that uses a Vision Transformer to detect spliced areas within satellite images. We introduce a new dataset which includes manipulated satellite images that contain spliced objects. We show that our proposed approach performs better than existing unsupervised splicing detection techniques.
Submitted 13 May, 2021;
originally announced May 2021.
-
Saliency-Aware Class-Agnostic Food Image Segmentation
Authors:
Sri Kalyan Yarlagadda,
Daniel Mas Montserrat,
David Guerra,
Carol J. Boushey,
Deborah A. Kerr,
Fengqing Zhu
Abstract:
Advances in image-based dietary assessment methods have allowed nutrition professionals and researchers to improve the accuracy of dietary assessment, where images of food consumed are captured using smartphones or wearable devices. These images are then analyzed using computer vision methods to estimate energy and nutrition content of the foods. Food image segmentation, which determines the regions in an image where foods are located, plays an important role in this process. Current methods are data dependent and thus cannot generalize well to different food types. To address this problem, we propose a class-agnostic food image segmentation method. Our method uses a pair of eating scene images, one taken before eating starts and one after eating is completed. Using information from both the before and after eating images, we can segment food images by finding the salient missing objects without any prior information about the food class. We model a paradigm of top down saliency which guides the attention of the human visual system (HVS) based on a task to find the salient missing objects in a pair of images. Our method is validated on food images collected from a dietary study, with promising results.
Submitted 13 February, 2021;
originally announced February 2021.
-
Generative Autoregressive Ensembles for Satellite Imagery Manipulation Detection
Authors:
Daniel Mas Montserrat,
János Horváth,
S. K. Yarlagadda,
Fengqing Zhu,
Edward J. Delp
Abstract:
Satellite imagery is becoming increasingly accessible due to the growing number of orbiting commercial satellites. Many applications make use of such images: agricultural management, meteorological prediction, damage assessment from natural disasters, or cartography are some of the examples. Unfortunately, these images can be easily tampered with and modified using image manipulation tools, damaging downstream applications. Because the nature of the manipulation applied to the image is typically unknown, unsupervised methods that don't require prior knowledge of the tampering techniques used are preferred. In this paper, we use ensembles of generative autoregressive models to model the distribution of the pixels of the image in order to detect potential manipulations. We evaluate the performance of the presented approach, obtaining accurate localization results compared to previously presented approaches.
Submitted 8 October, 2020;
originally announced October 2020.
-
Manipulation Detection in Satellite Images Using Deep Belief Networks
Authors:
János Horváth,
Daniel Mas Montserrat,
Hanxiang Hao,
Edward J. Delp
Abstract:
Satellite images are more accessible with the increase of commercial satellites being orbited. These images are used in a wide range of applications including agricultural management, meteorological prediction, damage assessment from natural disasters, and cartography. Image manipulation tools including both manual editing tools and automated techniques can be easily used to tamper and modify satellite imagery. One type of manipulation that we examine in this paper is the splice attack where a region from one image (or the same image) is inserted (spliced) into an image. In this paper, we present a one-class detection method based on deep belief networks (DBN) for splicing detection and localization without using any prior knowledge of the manipulations. We evaluate the performance of our approach and show that it provides good detection and localization accuracies in small forgeries compared to other approaches.
Submitted 26 April, 2020;
originally announced April 2020.
-
Addressing Ancestry Disparities in Genomic Medicine: A Geographic-aware Algorithm
Authors:
Daniel Mas Montserrat,
Arvind Kumar,
Carlos Bustamante,
Alexander Ioannidis
Abstract:
With declining sequencing costs, a promising and affordable tool is emerging in cancer diagnostics: genomics. By using association studies, genomic variants that predispose patients to specific cancers can be identified, while by using tumor genomics cancer types can be characterized for targeted treatment. However, a severe disparity is rapidly emerging in this new area of precision cancer diagnosis and treatment planning, one which separates a few genetically well-characterized populations (predominantly European) from all other global populations. Here we discuss the problem of population-specific genetic associations, which is driving this disparity, and present a novel solution--coordinate-based local ancestry--for helping to address it. We demonstrate our boosting-based method on whole genome data from divergent groups across Africa and in the process observe signals that may stem from the transcontinental Bantu-expansion.
Submitted 25 April, 2020;
originally announced April 2020.
-
Deepfakes Detection with Automatic Face Weighting
Authors:
Daniel Mas Montserrat,
Hanxiang Hao,
S. K. Yarlagadda,
Sriram Baireddy,
Ruiting Shao,
János Horváth,
Emily Bartusiak,
Justin Yang,
David Güera,
Fengqing Zhu,
Edward J. Delp
Abstract:
Altered and manipulated multimedia is increasingly present and widely distributed via social media platforms. Advanced video manipulation tools enable the generation of highly realistic-looking altered multimedia. While many methods have been presented to detect manipulations, most of them fail when evaluated with data outside of the datasets used in research environments. In order to address this problem, the Deepfake Detection Challenge (DFDC) provides a large dataset of videos containing realistic manipulations and an evaluation system that ensures that methods work quickly and accurately, even when faced with challenging data. In this paper, we introduce a method based on convolutional neural networks (CNNs) and recurrent neural networks (RNNs) that extracts visual and temporal features from faces present in videos to accurately detect manipulations. The method is evaluated with the DFDC dataset, providing competitive results compared to other techniques.
Submitted 4 May, 2020; v1 submitted 24 April, 2020;
originally announced April 2020.
-
LAI-Net: Local-Ancestry Inference with Neural Networks
Authors:
Daniel Mas Montserrat,
Carlos Bustamante,
Alexander Ioannidis
Abstract:
Local-ancestry inference (LAI), also referred to as ancestry deconvolution, provides high-resolution ancestry estimation along the human genome. In both research and industry, LAI is emerging as a critical step in DNA sequence analysis with applications extending from polygenic risk scores (used to predict traits in embryos and disease risk in adults) to genome-wide association studies, and from pharmacogenomics to inference of human population history. While many LAI methods have been developed, advances in computing hardware (GPUs) combined with machine learning techniques, such as neural networks, are enabling the development of new methods that are fast, robust and easily shared and stored. In this paper we develop the first neural network based LAI method, named LAI-Net, providing competitive accuracy with state-of-the-art methods and robustness to missing or noisy data, while having a small number of layers.
Submitted 21 April, 2020;
originally announced April 2020.
-
Class-Conditional VAE-GAN for Local-Ancestry Simulation
Authors:
Daniel Mas Montserrat,
Carlos Bustamante,
Alexander Ioannidis
Abstract:
Local ancestry inference (LAI) allows identification of the ancestry of all chromosomal segments in admixed individuals, and it is a critical step in the analysis of human genomes with applications from pharmacogenomics and precision medicine to genome-wide association studies. In recent years, many LAI techniques have been developed in both industry and academic research. However, these methods require large training data sets of human genomic sequences from the ancestries of interest. Such reference data sets are usually limited, proprietary, protected by privacy restrictions, or otherwise not accessible to the public. Techniques to generate training samples that resemble real haploid sequences from ancestries of interest can be useful tools in such scenarios, since a generalized model can often be shared, but the unique human sample sequences cannot. In this work we present a class-conditional VAE-GAN to generate new human genomic sequences that can be used to train local ancestry inference (LAI) algorithms. We evaluate the quality of our generated data by comparing the performance of a state-of-the-art LAI method when trained with generated versus real data.
Submitted 27 November, 2019;
originally announced November 2019.
-
Multi-View Matching Network for 6D Pose Estimation
Authors:
Daniel Mas Montserrat,
Jianhang Chen,
Qian Lin,
Jan P. Allebach,
Edward J. Delp
Abstract:
Applications that interact with the real world such as augmented reality or robot manipulation require a good understanding of the location and pose of the surrounding objects. In this paper, we present a new approach to estimate the 6 Degree of Freedom (DoF) or 6D pose of objects from a single RGB image. Our approach can be paired with an object detection and segmentation method to estimate, refine and track the pose of the objects by matching the input image with rendered images.
Submitted 27 November, 2019;
originally announced November 2019.