MODL: Multilearner Online Deep Learning

Authors: Antonios Valkanas, Boris N. Oreshkin, Mark Coates

Abstract: Online deep learning solves the problem of learning from streams of data, reconciling two opposing objectives: learn fast and learn deep. Existing work focuses almost exclusively on exploring pure deep learning solutions, which are much better suited to handle the "deep" than the "fast" part of the online learning equation. In our work, we propose a different paradigm, based on a hybrid multilearner approach. First, we develop a fast online logistic regression learner. This learner does not rely on backpropagation. Instead, it uses closed form recursive updates of model parameters, handling the fast learning part of the online learning problem. We then analyze the existing online deep learning theory and show that the widespread ODL approach, currently operating at complexity $O(L^2)$ in terms of the number of layers $L$, can be equivalently implemented in $O(L)$ complexity. This further leads us to the cascaded multilearner design, in which multiple shallow and deep learners are co-trained to solve the online learning problem in a cooperative, synergistic fashion. We show that this approach achieves state-of-the-art results on common online learning datasets, while also being able to handle missing features gracefully. Our code is publicly available at https://github.com/AntonValk/MODL.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2404.17451 [pdf, other]

Any-Quantile Probabilistic Forecasting of Short-Term Electricity Demand

Authors: Slawek Smyl, Boris N. Oreshkin, Paweł Pełka, Grzegorz Dudek

Abstract: Power systems operate under uncertainty originating from multiple factors that are impossible to account for deterministically. Distributional forecasting is used to control and mitigate risks associated with this uncertainty. Recent progress in deep learning has helped to significantly improve the accuracy of point forecasts, while accurate distributional forecasting still presents a significant challenge. In this paper, we propose a novel general approach for distributional forecasting capable of predicting arbitrary quantiles. We show that our general approach can be seamlessly applied to two distinct neural architectures, leading to state-of-the-art distributional forecasting results in the context of the short-term electricity demand forecasting task. We empirically validate our method on 35 hourly electricity demand time-series for European countries. Our code is available here: https://github.com/boreshkinai/any-quantile.

Submitted 26 April, 2024; originally announced April 2024.
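
Note on the entry above: the abstract refers to predicting arbitrary quantiles. A common way to score a quantile forecast is the pinball (quantile) loss. The sketch below illustrates that standard loss only; it is an assumption that it is relevant here, since the abstract does not state which loss the authors use, and the function names are hypothetical.

```python
def pinball_loss(y_true, y_pred, q):
    """Pinball (quantile) loss for one observation at quantile level q.

    Under-prediction is penalized with weight q and over-prediction
    with weight (1 - q). Illustrative only; not taken from the paper.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    diff = y_true - y_pred
    return max(q * diff, (q - 1.0) * diff)


def mean_pinball_loss(ys, preds, q):
    """Average pinball loss over paired observations and forecasts."""
    if len(ys) != len(preds) or not ys:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(pinball_loss(y, p, q) for y, p in zip(ys, preds)) / len(ys)
```

For a high quantile such as q = 0.9, forecasting too low costs nine times more than forecasting too high by the same amount, which is what pushes the minimizer toward the 90th percentile.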

arXiv:2302.04774 [pdf, ps, other]

3D Human Pose and Shape Estimation via HybrIK-Transformer

Authors: Boris N. Oreshkin

Abstract: HybrIK relies on a combination of analytical inverse kinematics and deep learning to produce more accurate 3D pose estimation from 2D monocular images. HybrIK has three major components: (1) pretrained convolution backbone, (2) deconvolution to lift 3D pose from 2D convolution features, (3) analytical inverse kinematics pass correcting deep learning prediction using learned distribution of plausible twist and swing angles. In this paper we propose an enhancement of the 2D to 3D lifting module, replacing deconvolution with Transformer, resulting in accuracy and computational efficiency improvement relative to the original HybrIK method. We demonstrate our results on commonly used H36M, PW3D, COCO and HP3D datasets. Our code is publicly available at https://github.com/boreshkinai/hybrik-transformer.

Submitted 22 April, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

arXiv:2208.08274 [pdf, other]

SMPL-IK: Learned Morphology-Aware Inverse Kinematics for AI Driven Artistic Workflows

Authors: Vikram Voleti, Boris N. Oreshkin, Florent Bocquelet, Félix G. Harvey, Louis-Simon Ménard, Christopher Pal

Abstract: Inverse Kinematics (IK) systems are often rigid with respect to their input character, thus requiring user intervention to be adapted to new skeletons. In this paper we aim at creating a flexible, learned IK solver applicable to a wide variety of human morphologies. We extend a state-of-the-art machine learning IK solver to operate on the well known Skinned Multi-Person Linear model (SMPL). We call our model SMPL-IK, and show that when integrated into real-time 3D software, this extended system opens up opportunities for defining novel AI-assisted animation workflows. For example, pose authoring can be made more flexible with SMPL-IK by allowing users to modify gender and body shape while posing a character. Additionally, when chained with existing pose estimation algorithms, SMPL-IK accelerates posing by allowing users to bootstrap 3D scenes from 2D images while allowing for further editing. Finally, we propose a novel SMPL Shape Inversion mechanism (SMPL-SI) to map arbitrary humanoid characters to the SMPL space, allowing artists to leverage SMPL-IK on custom characters. In addition to qualitative demos showing proposed tools, we present quantitative SMPL-IK baselines on the H36M and AMASS datasets.

Submitted 16 August, 2022; originally announced August 2022.

arXiv:2201.12886 [pdf, other]

N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting

Authors: Cristian Challu, Kin G. Olivares, Boris N. Oreshkin, Federico Garza, Max Mergenthaler-Canseco, Artur Dubrawski

Abstract: Recent progress in neural forecasting accelerated improvements in the performance of large-scale forecasting systems. Yet, long-horizon forecasting remains a very difficult task. Two common challenges afflicting the task are the volatility of the predictions and their computational complexity. We introduce N-HiTS, a model which addresses both challenges by incorporating novel hierarchical interpolation and multi-rate data sampling techniques. These techniques enable the proposed method to assemble its predictions sequentially, emphasizing components with different frequencies and scales while decomposing the input signal and synthesizing the forecast. We prove that the hierarchical interpolation technique can efficiently approximate arbitrarily long horizons in the presence of smoothness. Additionally, we conduct extensive large-scale dataset experiments from the long-horizon forecasting literature, demonstrating the advantages of our method over the state-of-the-art methods, where N-HiTS provides an average accuracy improvement of almost 20% over the latest Transformer architectures while reducing the computation time by an order of magnitude (50 times). Our code is available at bit.ly/3VA5DoT

Submitted 29 November, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

Comments: Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI-23)

arXiv:2201.06701 [pdf, other]

Motion Inbetweening via Deep $Δ$-Interpolator

Authors: Boris N. Oreshkin, Antonios Valkanas, Félix G. Harvey, Louis-Simon Ménard, Florent Bocquelet, Mark J. Coates

Abstract: We show that the task of synthesizing human motion conditioned on a set of key frames can be solved more accurately and effectively if a deep learning based interpolator operates in the delta mode using the spherical linear interpolator as a baseline. We empirically demonstrate the strength of our approach on publicly available datasets achieving state-of-the-art performance. We further generalize these results by showing that the $Δ$-regime is viable with respect to the reference of the last known frame (also known as the zero-velocity model). This supports the more general conclusion that operating in the reference frame local to input frames is more accurate and robust than in the global (world) reference frame advocated in previous work. Our code is publicly available at https://github.com/boreshkinai/delta-interpolator.

Submitted 16 August, 2022; v1 submitted 17 January, 2022; originally announced January 2022.
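
Note on the entry above: the abstract describes an interpolator that works in "delta mode" on top of a spherical linear interpolation (slerp) baseline. The sketch below shows standard quaternion slerp plus a hypothetical `delta_mode_prediction` helper that adds a residual to the baseline and renormalizes. How the paper actually combines the baseline and the learned residual is not stated in the abstract, so that helper is an assumption for illustration only.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions q0 and q1."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        # q and -q encode the same rotation; flip to take the shorter arc.
        q1 = [-c for c in q1]
        dot = -dot
    dot = min(1.0, dot)
    theta = math.acos(dot)
    if theta < 1e-6:
        # Nearly identical rotations: fall back to linear interpolation.
        out = [a + t * (b - a) for a, b in zip(q0, q1)]
    else:
        s = math.sin(theta)
        w0 = math.sin((1.0 - t) * theta) / s
        w1 = math.sin(t * theta) / s
        out = [w0 * a + w1 * b for a, b in zip(q0, q1)]
    norm = math.sqrt(sum(c * c for c in out))
    return [c / norm for c in out]


def delta_mode_prediction(q0, q1, t, delta):
    """Hypothetical delta-mode output: slerp baseline plus a residual.

    `delta` stands in for a network output; it is added to the baseline
    and the result is renormalized to stay a unit quaternion.
    """
    base = slerp(q0, q1, t)
    out = [b + d for b, d in zip(base, delta)]
    norm = math.sqrt(sum(c * c for c in out))
    return [c / norm for c in out]
```

With a zero residual the helper returns the slerp baseline unchanged, so a model of this shape only has to learn the correction to the baseline rather than the full pose.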

arXiv:2109.09705 [pdf, other]

Neural forecasting at scale

Authors: Philippe Chatigny, Shengrui Wang, Jean-Marc Patenaude, Boris N. Oreshkin

Abstract: We study the problem of efficiently scaling ensemble-based deep neural networks for multi-step time series (TS) forecasting on a large set of time series. Current state-of-the-art deep ensemble models have high memory and computational requirements, hampering their use to forecast millions of TS in practical scenarios. We propose N-BEATS(P), a global parallel variant of the N-BEATS model designed to allow simultaneous training of multiple univariate TS forecasting models. Our model addresses the practical limitations of related models, reducing the training time by half and memory requirement by a factor of 5, while keeping the same level of accuracy in all TS forecasting settings. We have performed multiple experiments detailing the various ways to train our model and have obtained results that demonstrate its capacity to generalize in various forecasting conditions and setups.

Submitted 28 January, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

arXiv:2106.01981 [pdf, other]

ProtoRes: Proto-Residual Network for Pose Authoring via Learned Inverse Kinematics

Authors: Boris N. Oreshkin, Florent Bocquelet, Félix G. Harvey, Bay Raitt, Dominic Laflamme

Abstract: Our work focuses on the development of a learnable neural representation of human pose for advanced AI assisted animation tooling. Specifically, we tackle the problem of constructing a full static human pose based on sparse and variable user inputs (e.g. locations and/or orientations of a subset of body joints). To solve this problem, we propose a novel neural architecture that combines residual connections with prototype encoding of a partially specified pose to create a new complete pose from the learned latent space. We show that our architecture outperforms a baseline based on Transformer, both in terms of accuracy and computational efficiency. Additionally, we develop a user interface to integrate our neural model in Unity, a real-time 3D development platform. Furthermore, we introduce two new datasets representing the static human pose modeling problem, based on high-quality human motion capture data, which will be released publicly along with model code.

Submitted 16 August, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

arXiv:2012.15440 [pdf, other]

Adaptive filters for the moving target indicator system

Authors: Boris N. Oreshkin

Abstract: Adaptive algorithms belong to an important class of algorithms used in radar target detection to overcome prior uncertainty of interference covariance. The contamination of the empirical covariance matrix by the useful signal leads to significant degradation of performance of this class of adaptive algorithms. Regularization, also known in radar literature as sample covariance loading, can be used to combat both ill conditioning of the original problem and contamination of the empirical covariance by the desired signal for the adaptive algorithms based on sample covariance matrix inversion. However, the optimum value of loading factor cannot be derived unless strong assumptions are made regarding the structure of covariance matrix and useful signal penetration model. Similarly, the least mean square algorithm, with or without a linear constraint, is also sensitive to the contamination of the learning sample with the target signal. We synthesize two approaches to improve the convergence of adaptive algorithms and protect them from the contamination of the learning sample with the signal from the target. The proposed approach is based on the maximization of empirical signal to interference plus noise ratio (SINR). Its effectiveness is demonstrated using simulated data.

Submitted 30 December, 2020; originally announced December 2020.

arXiv:2010.07010 [pdf, ps, other]

doi 10.1109/ICASSP.2008.4518113

Optimization of loading factor preventing target cancellation

Authors: Boris N. Oreshkin, Peter A. Bakulev

Abstract: Adaptive algorithms based on sample matrix inversion belong to an important class of algorithms used in radar target detection to overcome prior uncertainty of interference covariance. The sample matrix inversion problem is generally ill conditioned. Moreover, the contamination of the empirical covariance matrix by the useful signal leads to significant degradation of performance of this class of adaptive algorithms. Regularization, also known in radar literature as sample covariance loading, can be used to combat both ill conditioning of the original problem and contamination of the empirical covariance by the desired signal. However, the optimum value of loading factor cannot be derived unless strong assumptions are made regarding the structure of covariance matrix and useful signal penetration model. In this paper an iterative algorithm for loading factor optimization based on the maximization of empirical signal to interference plus noise ratio (SINR) is proposed. The proposed solution does not rely on any assumptions regarding the structure of empirical covariance matrix and signal penetration model. The paper also presents simulation examples showing the effectiveness of the proposed solution.

Submitted 9 October, 2020; originally announced October 2020.

Journal ref: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2010.02505 [pdf, other]

doi 10.1007/978-3-642-31340-0_16

Optimization over Random and Gradient Probabilistic Pixel Sampling for Fast, Robust Multi-Resolution Image Registration

Authors: Boris N. Oreshkin, Tal Arbel

Abstract: This paper presents an approach to fast image registration through probabilistic pixel sampling. We propose a practical scheme to leverage the benefits of two state-of-the-art pixel sampling approaches: gradient magnitude based pixel sampling and uniformly random sampling. Our framework involves learning the optimal balance between the two sampling schemes off-line during training, based on a small training dataset, using particle swarm optimization. We then test the proposed sampling approach on 3D rigid registration against two state-of-the-art approaches based on the popular, publicly available, Vanderbilt RIRE dataset. Our results indicate that the proposed sampling approach yields much faster, accurate and robust registration results when compared against the state-of-the-art.

Submitted 2 October, 2020; originally announced October 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2010.00988

Journal ref: WBIR 2012. Lecture Notes in Computer Science, vol 7359. Springer, Berlin, Heidelberg

arXiv:2010.00988 [pdf, other]

doi 10.1109/TMI.2013.2264467

Uncertainty driven probabilistic voxel selection for image registration

Authors: Boris N. Oreshkin, Tal Arbel

Abstract: This paper presents a novel probabilistic voxel selection strategy for medical image registration in time-sensitive contexts, where the goal is aggressive voxel sampling (e.g. using less than 1% of the total number) while maintaining registration accuracy and low failure rate. We develop a Bayesian framework whereby, first, a voxel sampling probability field (VSPF) is built based on the uncertainty on the transformation parameters. We then describe a practical, multi-scale registration algorithm, where, at each optimization iteration, different voxel subsets are sampled based on the VSPF. The approach maximizes accuracy without committing to a particular fixed subset of voxels. The probabilistic sampling scheme developed is shown to manage the tradeoff between the robustness of traditional random voxel selection (by permitting more exploration) and the accuracy of fixed voxel selection (by permitting a greater proportion of informative voxels).

Submitted 2 October, 2020; originally announced October 2020.

Journal ref: in IEEE Transactions on Medical Imaging, vol. 32, no. 10, pp. 1777-1790, Oct. 2013

arXiv:2009.11961 [pdf, ps, other]

N-BEATS neural network for mid-term electricity load forecasting

Authors: Boris N. Oreshkin, Grzegorz Dudek, Paweł Pełka, Ekaterina Turkina

Abstract: This paper addresses the mid-term electricity load forecasting problem. Solving this problem is necessary for power system operation and planning as well as for negotiating forward contracts in deregulated energy markets. We show that our proposed deep neural network modeling approach based on the deep neural architecture is effective at solving the mid-term electricity load forecasting problem. The proposed neural network has high expressive power to solve non-linear stochastic forecasting problems with time series including trends, seasonality and significant random fluctuations. At the same time, it is simple to implement and train, it does not require signal preprocessing, and it is equipped with a forecast bias reduction mechanism. We compare our approach against ten baseline methods, including classical statistical methods, machine learning and hybrid approaches, on 35 monthly electricity demand time series for European countries. The empirical study shows that the proposed neural network clearly outperforms all competitors in terms of both accuracy and forecast bias. Code is available here: https://github.com/boreshkinai/nbeats-midterm.

Submitted 2 April, 2021; v1 submitted 24 September, 2020; originally announced September 2020.

arXiv:2007.15531 [pdf, other]

FC-GAGA: Fully Connected Gated Graph Architecture for Spatio-Temporal Traffic Forecasting

Authors: Boris N. Oreshkin, Arezou Amini, Lucy Coyle, Mark J. Coates

Abstract: Forecasting of multivariate time-series is an important problem that has applications in traffic management, cellular network configuration, and quantitative finance. A special case of the problem arises when there is a graph available that captures the relationships between the time-series. In this paper we propose a novel learning architecture that achieves performance competitive with or better than the best existing algorithms, without requiring knowledge of the graph. The key element of our proposed architecture is the learnable fully connected hard graph gating mechanism that enables the use of the state-of-the-art and highly computationally efficient fully connected time-series forecasting architecture in traffic forecasting applications. Experimental results for two public traffic network datasets illustrate the value of our approach, and ablation studies confirm the importance of each element of the architecture. The code is available here: https://github.com/boreshkinai/fc-gaga.

Submitted 14 December, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

arXiv:2002.02887 [pdf, other]

Meta-learning framework with applications to zero-shot time-series forecasting

Authors: Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio

Abstract: Can meta-learning discover generic ways of processing time series (TS) from a diverse dataset so as to greatly improve generalization on new TS coming from different datasets? This work provides positive evidence to this using a broad meta-learning framework which we show subsumes many existing meta-learning algorithms. Our theoretical analysis suggests that residual connections act as a meta-learning adaptation mechanism, generating a subset of task-specific parameters based on a given TS input, thus gradually expanding the expressive power of the architecture on-the-fly. The same mechanism is shown via linearization analysis to have the interpretation of a sequential update of the final linear layer. Our empirical results on a wide range of data emphasize the importance of the identified meta-learning mechanisms for successful zero-shot univariate forecasting, suggesting that it is viable to train a neural network on a source TS dataset and deploy it on a different target TS dataset without retraining, resulting in performance that is at least as good as that of state-of-practice univariate forecasting models.

Submitted 14 December, 2020; v1 submitted 7 February, 2020; originally announced February 2020.

arXiv:2001.09540 [pdf, other]

Weakly Supervised Few-shot Object Segmentation using Co-Attention with Visual and Semantic Embeddings

Authors: Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand

Abstract: Significant progress has been made recently in developing few-shot object segmentation methods. Learning is shown to be successful in few-shot segmentation settings, using pixel-level, scribbles and bounding box supervision. This paper takes another approach, i.e., only requiring image-level label for few-shot object segmentation. We propose a novel multi-modal interaction module for few-shot object segmentation that utilizes a co-attention mechanism using both visual and word embedding. Our model using image-level labels achieves 4.8% improvement over previously proposed image-level few-shot object segmentation. It also outperforms state-of-the-art methods that use weak bounding box supervision on PASCAL-5i. Our results show that few-shot segmentation benefits from utilizing word embeddings, and that we are able to perform few-shot segmentation using stacked joint visual semantic processing with weak image-level labels. We further propose a novel setup, Temporal Object Segmentation for Few-shot Learning (TOSFL) for videos. TOSFL can be used on a variety of public video data such as Youtube-VOS, as demonstrated in both instance-level and category-level TOSFL experiments.

Submitted 17 May, 2020; v1 submitted 26 January, 2020; originally announced January 2020.

Comments: Accepted to IJCAI'20. The first three authors listed contributed equally

arXiv:1912.08936 [pdf, other]

One-Shot Weakly Supervised Video Object Segmentation

Authors: Mennatullah Siam, Naren Doraiswamy, Boris N. Oreshkin, Hengshuai Yao, Martin Jagersand

Abstract: Conventional few-shot object segmentation methods learn object segmentation from a few labelled support images with strongly labelled segmentation masks. Recent work has shown to perform on par with weaker levels of supervision in terms of scribbles and bounding boxes. However, there has been limited attention given to the problem of few-shot object segmentation with image-level supervision. We propose a novel multi-modal interaction module for few-shot object segmentation that utilizes a co-attention mechanism using both visual and word embeddings. It enables our model to achieve 5.1% improvement over previously proposed image-level few-shot object segmentation. Our method compares relatively close to the state of the art methods that use strong supervision, while ours uses the least possible supervision. We further propose a novel setup for few-shot weakly supervised video object segmentation (VOS) that relies on image-level labels for the first frame. The proposed setup uses weak annotation unlike semi-supervised VOS setting that utilizes strongly labelled segmentation masks. The setup evaluates the effectiveness of generalizing to novel classes in the VOS setting. The setup splits the VOS data into multiple folds with different categories per fold. It provides a potential setup to evaluate how few-shot object segmentation methods can benefit from additional object poses, or object interactions that are not available in static frames as in PASCAL-5i benchmark.

Submitted 18 December, 2019; originally announced December 2019.

arXiv:1906.11892 [pdf, other]

CLAREL: Classification via retrieval loss for zero-shot learning

Authors: Boris N. Oreshkin, Negar Rostamzadeh, Pedro O. Pinheiro, Christopher Pal

Abstract: We address the problem of learning fine-grained cross-modal representations. We propose an instance-based deep metric learning approach in joint visual and textual space. The key novelty of this paper is that it shows that using per-image semantic supervision leads to substantial improvement in zero-shot performance over using class-only supervision. On top of that, we provide a probabilistic justification for a metric rescaling approach that solves a very common problem in the generalized zero-shot learning setting, i.e., classifying test images from unseen classes as one of the classes seen during training. We evaluate our approach on two fine-grained zero-shot learning datasets: CUB and FLOWERS. We find that on the generalized zero-shot classification task CLAREL consistently outperforms the existing approaches on both datasets.

Submitted 5 April, 2020; v1 submitted 31 May, 2019; originally announced June 2019.

arXiv:1905.10437 [pdf, other]

N-BEATS: Neural basis expansion analysis for interpretable time series forecasting

Authors: Boris N. Oreshkin, Dmitri Carpov, Nicolas Chapados, Yoshua Bengio

Abstract: We focus on solving the univariate time series point forecasting problem using deep learning. We propose a deep neural architecture based on backward and forward residual links and a very deep stack of fully-connected layers. The architecture has a number of desirable properties, being interpretable, applicable without modification to a wide array of target domains, and fast to train. We test the proposed architecture on several well-known datasets, including M3, M4 and TOURISM competition datasets containing time series from diverse domains. We demonstrate state-of-the-art performance for two configurations of N-BEATS for all the datasets, improving forecast accuracy by 11% over a statistical benchmark and by 3% over last year's winner of the M4 competition, a domain-adjusted hand-crafted hybrid between neural network and statistical time series models. The first configuration of our model does not employ any time-series-specific components and its performance on heterogeneous datasets strongly suggests that, contrary to received wisdom, deep learning primitives such as residual blocks are by themselves sufficient to solve a wide range of forecasting problems. Finally, we demonstrate how the proposed architecture can be augmented to provide outputs that are interpretable without considerable loss in accuracy.

Submitted 20 February, 2020; v1 submitted 24 May, 2019; originally announced May 2019.
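
The "backward and forward residual links" mentioned in the N-BEATS abstract above can be illustrated with a minimal sketch. Each block returns a backcast (the part of its input it explains) and a partial forecast; the backcast is subtracted from the running input and the partial forecasts are summed. The two hand-written toy blocks below (`level_block`, `trend_block`) and the function name `doubly_residual_forecast` are hypothetical stand-ins chosen for illustration, not the trained fully-connected stacks of the paper.

```python
from statistics import fmean


def level_block(x, horizon):
    """Toy block (assumption): explains the mean level of its input."""
    m = fmean(x)
    return [m] * len(x), [m] * horizon


def trend_block(x, horizon):
    """Toy block (assumption): a straight line through the first and last points."""
    n = len(x)
    slope = (x[-1] - x[0]) / (n - 1)
    backcast = [x[0] + slope * i for i in range(n)]
    forecast = [x[0] + slope * (n + h) for h in range(horizon)]
    return backcast, forecast


def doubly_residual_forecast(x, blocks, horizon):
    """Chain blocks with backward (residual) and forward (forecast) links."""
    residual = list(x)
    forecast = [0.0] * horizon
    for block in blocks:
        backcast, partial = block(residual, horizon)
        # Backward link: the next block only sees what is still unexplained.
        residual = [r - b for r, b in zip(residual, backcast)]
        # Forward link: the final forecast is the sum of partial forecasts.
        forecast = [f + p for f, p in zip(forecast, partial)]
    return forecast, residual


forecast, residual = doubly_residual_forecast(
    [1.0, 2.0, 3.0, 4.0], [level_block, trend_block], horizon=2
)
print(forecast)  # [5.0, 6.0]
print(residual)  # [0.0, 0.0, 0.0, 0.0]
```

Because the forecast is a sum of per-block contributions, each block's share can be inspected separately, which is one plausible reading of the interpretability property the abstract describes.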

arXiv:1902.07104 [pdf, other]

Adaptive Cross-Modal Few-Shot Learning

Authors: Chen Xing, Negar Rostamzadeh, Boris N. Oreshkin, Pedro O. Pinheiro

Abstract: Metric-based meta-learning techniques have successfully been applied to few-shot classification problems. In this paper, we propose to leverage cross-modal information to enhance metric-based few-shot learning methods. Visual and semantic feature spaces have different structures by definition. For certain concepts, visual features might be richer and more discriminative than text ones, while for others the inverse might be true. Moreover, when the support from visual information is limited in image classification, semantic representations (learned from unsupervised text corpora) can provide strong prior knowledge and context to help learning. Based on these two intuitions, we propose a mechanism that can adaptively combine information from both modalities according to new image categories to be learned. Through a series of experiments, we show that by this adaptive combination of the two modalities, our model outperforms current uni-modality few-shot learning methods and modality-alignment methods by a large margin on all benchmarks and few-shot scenarios tested. Experiments also show that our model can effectively adjust its focus on the two modalities. The improvement in performance is particularly large when the number of shots is very small.

Submitted 17 February, 2020; v1 submitted 19 February, 2019; originally announced February 2019.
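
One way to picture the adaptive combination of modalities described in the abstract above is a per-class convex mixture of a visual prototype and a semantic embedding, with a mixing coefficient computed from the semantic embedding itself. The sketch below is an assumption-laden illustration: the abstract does not spell out the formula, the linear scoring function (`weight`, `bias`) is hypothetical, and the semantic embedding is assumed to be already mapped into the visual feature space.

```python
import math


def adaptive_prototype(visual_proto, semantic_emb, weight, bias):
    """Mix two modalities with a coefficient predicted from the semantic side.

    weight and bias define a hypothetical linear scorer; a real model would
    learn them (or a small network) jointly with the feature extractor.
    """
    score = sum(w * s for w, s in zip(weight, semantic_emb)) + bias
    lam = 1.0 / (1.0 + math.exp(-score))  # sigmoid keeps lam in (0, 1)
    mixed = [lam * v + (1.0 - lam) * s for v, s in zip(visual_proto, semantic_emb)]
    return mixed, lam


# With a zero scorer both modalities are trusted equally (lam = 0.5).
mixed, lam = adaptive_prototype([1.0, 3.0], [3.0, 1.0], weight=[0.0, 0.0], bias=0.0)
print(lam, mixed)  # 0.5 [2.0, 2.0]

# A large positive score shifts the prototype towards the visual features.
mixed_visual, lam_visual = adaptive_prototype(
    [1.0, 3.0], [3.0, 1.0], weight=[0.0, 0.0], bias=10.0
)
```

Under this reading, a class with few or uninformative support images could receive a small coefficient and lean on the semantic prior, which matches the abstract's remark that gains are largest when the number of shots is very small.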

arXiv:1805.10123 [pdf, other]

TADAM: Task dependent adaptive metric for improved few-shot learning

Authors: Boris N. Oreshkin, Pau Rodriguez, Alexandre Lacoste

Abstract: Few-shot learning has become essential for producing models that generalize from few examples. In this work, we identify that metric scaling and metric task conditioning are important to improve the performance of few-shot algorithms. Our analysis reveals that simple metric scaling completely changes the nature of few-shot algorithm parameter updates. Metric scaling provides improvements up to 14% in accuracy for certain metrics on the mini-Imagenet 5-way 5-shot classification task. We further propose a simple and effective way of conditioning a learner on the task sample set, resulting in learning a task-dependent metric space. Moreover, we propose and empirically test a practical end-to-end optimization procedure based on auxiliary task co-training to learn a task-dependent metric space. The resulting few-shot learning model based on the task-dependent scaled metric achieves state of the art on mini-Imagenet. We confirm these results on another few-shot dataset that we introduce in this paper based on CIFAR100. Our code is publicly available at https://github.com/ElementAI/TADAM.

Submitted 25 January, 2019; v1 submitted 23 May, 2018; originally announced May 2018.

Journal ref: Advances in Neural Information Processing Systems 31, 2018
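
The metric scaling idea in the TADAM abstract above can be sketched as a softmax over negative distances to class prototypes, multiplied by a scaling factor before normalization. The squared Euclidean distance, the prototype values, and the name `alpha` for the scaling factor are illustrative assumptions; the sketch only shows how the scale changes the sharpness of the class distribution.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_probabilities(query, prototypes, alpha):
    """Softmax over scaled negative distances to each class prototype."""
    logits = [-alpha * squared_distance(query, p) for p in prototypes]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


query = [0.0, 0.0]
prototypes = [[1.0, 0.0], [2.0, 0.0]]  # hypothetical class centres

soft = class_probabilities(query, prototypes, alpha=0.1)
sharp = class_probabilities(query, prototypes, alpha=1.0)
print(soft)   # nearly uniform: the nearest class is only mildly preferred
print(sharp)  # peaked: the nearest class dominates
```

A small scale keeps the distribution close to uniform, while a large scale concentrates it on the nearest prototype; this difference also changes which samples dominate the gradient during training.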

arXiv:1009.4409 [pdf, ps, other]

doi 10.1109/TSP.2011.2140110

Efficient delay-tolerant particle filtering

Authors: Boris N. Oreshkin, Xuan Liu, Mark J. Coates

Abstract: This paper proposes a novel framework for delay-tolerant particle filtering that is computationally efficient and has limited memory requirements. Within this framework the informativeness of a delayed (out-of-sequence) measurement (OOSM) is estimated using a lightweight procedure and uninformative measurements are immediately discarded. The framework requires the identification of a threshold that separates informative from uninformative; this threshold selection task is formulated as a constrained optimization problem, where the goal is to minimize tracking error whilst controlling the computational requirements. We develop an algorithm that provides an approximate solution for the optimization problem. Simulation experiments provide an example where the proposed framework processes less than 40% of all OOSMs with only a small reduction in tracking accuracy.

Submitted 22 September, 2010; originally announced September 2010.

arXiv:0903.3537 [pdf, ps, other]

doi 10.1109/TSP.2010.2043127

Optimization and Analysis of Distributed Averaging with Short Node Memory

Authors: Boris N. Oreshkin, Mark J. Coates, Michael G. Rabbat

Abstract: In this paper, we demonstrate, both theoretically and by numerical examples, that adding a local prediction component to the update rule can significantly improve the convergence rate of distributed averaging algorithms. We focus on the case where the local predictor is a linear combination of the node's two previous values (i.e., two memory taps), and our update rule computes a combination of the predictor and the usual weighted linear combination of values received from neighbouring nodes. We derive the optimal mixing parameter for combining the predictor with the neighbors' values, and carry out a theoretical analysis of the improvement in convergence rate that can be obtained using this acceleration methodology. For a chain topology on $n$ nodes, this leads to a factor of $n$ improvement over the one-step algorithm, and for a two-dimensional grid, our approach achieves a factor of $n^{1/2}$ improvement, in terms of the number of iterations required to reach a prescribed level of accuracy.

Submitted 5 February, 2010; v1 submitted 20 March, 2009; originally announced March 2009.
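
The structure of the update rule described in the abstract above (a mix of a local two-tap predictor and the usual neighbour-weighted combination) can be sketched on a small chain. Everything numeric here is an assumption made for illustration: the predictor taps (2, -1), i.e. linear extrapolation from the node's two previous values, the mixing parameter `alpha = 0.3`, and the edge weight of 1/3 are not the optimal values derived in the paper. The sketch only shows the form of the iteration and that it still converges to the network average.

```python
def chain_weights(n):
    """Symmetric, doubly stochastic weights for a chain (assumed 1/3 per edge)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                w[i][j] = 1.0 / 3.0
        w[i][i] = 1.0 - sum(w[i])  # remaining mass stays at the node
    return w


def averaging_with_memory(x0, w, alpha, steps):
    """Mix a two-tap local predictor with the neighbour-weighted average."""
    n = len(x0)
    prev, cur = list(x0), list(x0)
    for _ in range(steps):
        neighbour_mix = [sum(w[i][j] * cur[j] for j in range(n)) for i in range(n)]
        # Assumed predictor: linear extrapolation from the two previous values.
        predictor = [2.0 * cur[i] - prev[i] for i in range(n)]
        nxt = [alpha * predictor[i] + (1.0 - alpha) * neighbour_mix[i] for i in range(n)]
        prev, cur = cur, nxt
    return cur


x0 = [0.0, 0.0, 0.0, 0.0, 10.0]  # network average is 2.0
result = averaging_with_memory(x0, chain_weights(5), alpha=0.3, steps=400)
print(result)  # every entry is close to 2.0
```

The predictor taps sum to one and the weight matrix is doubly stochastic, so each step preserves the sum of the node values; that is why the iteration settles on the average rather than some other consensus value.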

Showing 1–23 of 23 results for author: Oreshkin, B N