Search | arXiv e-print repository

Density-based Isometric Map**

Authors: Bardia Yousefi, Mélina Khansari, Ryan Trask, Patrick Tallon, Carina Carino, Arman Afrasiyabi, Vikas Kundra, Lan Ma, Lei Ren, Keyvan Farahani, Michelle Hershman

Abstract: The isometric map** method employs the shortest path algorithm to estimate the Euclidean distance between points on High dimensional (HD) manifolds. This may not be sufficient for weakly uniformed HD data as it could lead to overestimating distances between far neighboring points, resulting in inconsistencies between the intrinsic (local) and extrinsic (global) distances during the projection. T… ▽ More The isometric map** method employs the shortest path algorithm to estimate the Euclidean distance between points on High dimensional (HD) manifolds. This may not be sufficient for weakly uniformed HD data as it could lead to overestimating distances between far neighboring points, resulting in inconsistencies between the intrinsic (local) and extrinsic (global) distances during the projection. To address this issue, we modify the shortest path algorithm by adding a novel constraint inspired by the Parzen-Rosenblatt (PR) window, which helps to maintain the uniformity of the constructed shortest-path graph in Isomap. Multiple imaging datasets overall of 72,236 cases, 70,000 MINST data, 1596 from multiple Chest-XRay pneumonia datasets, and three NSCLC CT/PET datasets with a total of 640 lung cancer patients, were used to benchmark and validate PR-Isomap. 431 imaging biomarkers were extracted from each modality. Our results indicate that PR-Isomap projects HD attributes into a lower-dimensional (LD) space while preserving information, visualized by the MNIST dataset indicating the maintaining local and global distances. PR-Isomap achieved the highest comparative accuracies of 80.9% (STD:5.8) for pneumonia and 78.5% (STD:4.4), 88.4% (STD:1.4), and 61.4% (STD:11.4) for three NSCLC datasets, with a confidence interval of 95% for outcome prediction. Similarly, the multivariate Cox model showed higher overall survival, measured with c-statistics and log-likelihood test, of PR-Isomap compared to other dimensionality reduction methods. Kaplan Meier survival curve also signifies the notable ability of PR-Isomap to distinguish between high-risk and low-risk patients using multimodal imaging biomarkers preserving HD imaging characteristics for precision medicine. △ Less

Submitted 4 March, 2024; originally announced March 2024.

Comments: This is the author's version

arXiv:2305.05023 [pdf, other]

Domain Agnostic Image-to-image Translation using Low-Resolution Conditioning

Authors: Mohamed Abid, Arman Afrasiyabi, Ihsen Hedhli, Jean-François Lalonde, Christian Gagné

Abstract: Generally, image-to-image translation (i2i) methods aim at learning map**s across domains with the assumption that the images used for translation share content (e.g., pose) but have their own domain-specific information (a.k.a. style). Conditioned on a target image, such methods extract the target style and combine it with the source image content, kee** coherence between the domains. In our… ▽ More Generally, image-to-image translation (i2i) methods aim at learning map**s across domains with the assumption that the images used for translation share content (e.g., pose) but have their own domain-specific information (a.k.a. style). Conditioned on a target image, such methods extract the target style and combine it with the source image content, kee** coherence between the domains. In our proposal, we depart from this traditional view and instead consider the scenario where the target domain is represented by a very low-resolution (LR) image, proposing a domain-agnostic i2i method for fine-grained problems, where the domains are related. More specifically, our domain-agnostic approach aims at generating an image that combines visual features from the source image with low-frequency information (e.g. pose, color) of the LR target image. To do so, we present a novel approach that relies on training the generative model to produce images that both share distinctive information of the associated source image and correctly match the LR target image when downscaled. We validate our method on the CelebA-HQ and AFHQ datasets by demonstrating improvements in terms of visual quality. Qualitative and quantitative results show that when dealing with intra-domain image translation, our method generates realistic samples compared to state-of-the-art methods such as StarGAN v2. Ablation studies also reveal that our method is robust to changes in color, it can be applied to out-of-distribution images, and it allows for manual control over the final results. △ Less

Submitted 10 May, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: 19 pages, 23 figures. arXiv admin note: substantial text overlap with arXiv:2107.11262. Under consideration in Computer Vision and Image Understanding

arXiv:2304.09691 [pdf, other]

DarSwin: Distortion Aware Radial Swin Transformer

Authors: Akshaya Athwale, Ichrak Shili, Émile Bergeron, Arman Afrasiyabi, Justin Lagüe, Ola Ahmad, Jean-François Lalonde

Abstract: Wide-angle lenses are commonly used in perception tasks requiring a large field of view. Unfortunately, these lenses produce significant distortions, making conventional models that ignore the distortion effects unable to adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. Our proposed imag… ▽ More Wide-angle lenses are commonly used in perception tasks requiring a large field of view. Unfortunately, these lenses produce significant distortions, making conventional models that ignore the distortion effects unable to adapt to wide-angle images. In this paper, we present a novel transformer-based model that automatically adapts to the distortion produced by wide-angle lenses. Our proposed image encoder architecture, dubbed DarSwin, leverages the physical characteristics of such lenses analytically defined by the radial distortion profile. In contrast to conventional transformer-based architectures, DarSwin comprises a radial patch partitioning, a distortion-based sampling technique for creating token embeddings, and an angular position encoding for radial patch merging. Compared to other baselines, DarSwin achieves the best results on different datasets with significant gains when trained on bounded levels of distortions (very low, low, medium, and high) and tested on all, including out-of-distribution distortions. While the base DarSwin architecture requires knowledge of the radial distortion profile, we show it can be combined with a self-calibration network that estimates such a profile from the input image itself, resulting in a completely uncalibrated pipeline. Finally, we also present DarSwin-Unet, which extends DarSwin, to an encoder-decoder architecture suitable for pixel-level tasks. We demonstrate its performance on depth estimation and show through extensive experiments that DarSwin-Unet can perform zero-shot adaptation to unseen distortions of different wide-angle lenses. The code and models are publicly available at https://lvsn.github.io/darswin/ △ Less

Submitted 7 January, 2024; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: 18 pages, 12 figures

MSC Class: 68T01

arXiv:2204.00949 [pdf, other]

Matching Feature Sets for Few-Shot Image Classification

Authors: Arman Afrasiyabi, Hugo Larochelle, Jean-François Lalonde, Christian Gagné

Abstract: In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classification methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of… ▽ More In image classification, it is common practice to train deep networks to extract a single feature vector per input image. Few-shot classification methods also mostly follow this trend. In this work, we depart from this established direction and instead propose to extract sets of feature vectors for each image. We argue that a set-based representation intrinsically builds a richer representation of images from the base classes, which can subsequently better transfer to the few-shot classes. To do so, we propose to adapt existing feature extractors to instead produce sets of feature vectors from images. Our approach, dubbed SetFeat, embeds shallow self-attention mechanisms inside existing encoder architectures. The attention modules are lightweight, and as such our method results in encoders that have approximately the same number of parameters as their original versions. During training and inference, a set-to-set matching metric is used to perform image classification. The effectiveness of our proposed architecture and metrics is demonstrated via thorough experiments on standard few-shot datasets -- namely miniImageNet, tieredImageNet, and CUB -- in both the 1- and 5-shot scenarios. In all cases but one, our method outperforms the state-of-the-art. △ Less

Submitted 2 April, 2022; originally announced April 2022.

Comments: International Conference on Computer Vision and Pattern Recognition (CVPR), 2022

arXiv:2011.11872 [pdf, other]

Mixture-based Feature Space Learning for Few-shot Image Classification

Authors: Arman Afrasiyabi, Jean-François Lalonde, Christian Gagné

Abstract: We introduce Mixture-based Feature Space Learning (MixtFSL) for obtaining a rich and robust feature representation in the context of few-shot image classification. Previous works have proposed to model each base class either with a single point or with a mixture model by relying on offline clustering algorithms. In contrast, we propose to model base classes with mixture models by simultaneously tr… ▽ More We introduce Mixture-based Feature Space Learning (MixtFSL) for obtaining a rich and robust feature representation in the context of few-shot image classification. Previous works have proposed to model each base class either with a single point or with a mixture model by relying on offline clustering algorithms. In contrast, we propose to model base classes with mixture models by simultaneously training the feature extractor and learning the mixture model parameters in an online manner. This results in a richer and more discriminative feature space which can be employed to classify novel examples from very few samples. Two main stages are proposed to train the MixtFSL model. First, the multimodal mixtures for each base class and the feature extractor parameters are learned using a combination of two loss functions. Second, the resulting network and mixture models are progressively refined through a leader-follower learning procedure, which uses the current estimate as a "target" network. This target network is used to make a consistent assignment of instances to mixture components, which increases performance and stabilizes training. The effectiveness of our end-to-end feature space learning approach is demonstrated with extensive experiments on four standard datasets and four backbones. Notably, we demonstrate that when we combine our robust representation with recent alignment-based approaches, we achieve new state-of-the-art results in the inductive setting, with an absolute accuracy for 5-shot classification of 82.45 on miniImageNet, 88.20 with tieredImageNet, and 60.70 in FC100 using the ResNet-12 backbone. △ Less

Submitted 17 August, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

arXiv:1912.05094 [pdf, other]

Associative Alignment for Few-shot Image Classification

Authors: Arman Afrasiyabi, Jean-François Lalonde, Christian Gagné

Abstract: Few-shot image classification aims at training a model from only a few examples for each of the "novel" classes. This paper proposes the idea of associative alignment for leveraging part of the base data by aligning the novel training instances to the closely related ones in the base training set. This expands the size of the effective novel training set by adding extra "related base" instances to… ▽ More Few-shot image classification aims at training a model from only a few examples for each of the "novel" classes. This paper proposes the idea of associative alignment for leveraging part of the base data by aligning the novel training instances to the closely related ones in the base training set. This expands the size of the effective novel training set by adding extra "related base" instances to the few novel ones, thereby allowing a constructive fine-tuning. We propose two associative alignment strategies: 1) a metric-learning loss for minimizing the distance between related base samples and the centroid of novel instances in the feature space, and 2) a conditional adversarial alignment loss based on the Wasserstein distance. Experiments on four standard datasets and three backbones demonstrate that combining our centroid-based alignment loss results in absolute accuracy improvements of 4.4%, 1.2%, and 6.2% in 5-shot learning over the state of the art for object recognition, fine-grained classification, and cross-domain adaptation, respectively. △ Less

Submitted 4 August, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

arXiv:1702.02676 [pdf, other]

Energy Saving Additive Neural Network

Authors: Arman Afrasiyabi, Ozan Yildiz, Baris Nasir, Fatos T. Yarman Vural, A. Enis Cetin

Abstract: In recent years, machine learning techniques based on neural networks for mobile computing become increasingly popular. Classical multi-layer neural networks require matrix multiplications at each stage. Multiplication operation is not an energy efficient operation and consequently it drains the battery of the mobile device. In this paper, we propose a new energy efficient neural network with the… ▽ More In recent years, machine learning techniques based on neural networks for mobile computing become increasingly popular. Classical multi-layer neural networks require matrix multiplications at each stage. Multiplication operation is not an energy efficient operation and consequently it drains the battery of the mobile device. In this paper, we propose a new energy efficient neural network with the universal approximation property over space of Lebesgue integrable functions. This network, called, additive neural network, is very suitable for mobile computing. The neural structure is based on a novel vector product definition, called ef-operator, that permits a multiplier-free implementation. In ef-operation, the "product" of two real numbers is defined as the sum of their absolute values, with the sign determined by the sign of the product of the numbers. This "product" is used to construct a vector product in $R^N$. The vector product induces the $l_1$ norm. The proposed additive neural network successfully solves the XOR problem. The experiments on MNIST dataset show that the classification performances of the proposed additive neural networks are very similar to the corresponding multi-layer perceptron and convolutional neural networks (LeNet). △ Less

Submitted 8 February, 2017; originally announced February 2017.

Comments: 8 pages (double column), 2 figures, 1 table

Showing 1–7 of 7 results for author: Afrasiyabi, A