Search | arXiv e-print repository

Marginalization Consistent Mixture of Separable Flows for Probabilistic Irregular Time Series Forecasting

Authors: Vijaya Krishna Yalavarthi, Randolf Scholz, Kiran Madhusudhanan, Stefan Born, Lars Schmidt-Thieme

Abstract: Probabilistic forecasting models for joint distributions of targets in irregular time series are a heavily under-researched area in machine learning with, to the best of our knowledge, only three models researched so far: GPR, the Gaussian Process Regression model~\citep{Durichen2015.Multitask}, TACTiS, the Transformer-Attentional Copulas for Time Series~\cite{Drouin2022.Tactis, ashok2024tactis} a… ▽ More Probabilistic forecasting models for joint distributions of targets in irregular time series are a heavily under-researched area in machine learning with, to the best of our knowledge, only three models researched so far: GPR, the Gaussian Process Regression model~\citep{Durichen2015.Multitask}, TACTiS, the Transformer-Attentional Copulas for Time Series~\cite{Drouin2022.Tactis, ashok2024tactis} and ProFITi \citep{Yalavarthi2024.Probabilistica}, a multivariate normalizing flow model based on invertible attention layers. While ProFITi, thanks to using multivariate normalizing flows, is the more expressive model with better predictive performance, we will show that it suffers from marginalization inconsistency: it does not guarantee that the marginal distributions of a subset of variables in its predictive distributions coincide with the directly predicted distributions of these variables. Also, TACTiS does not provide any guarantees for marginalization consistency. We develop a novel probabilistic irregular time series forecasting model, Marginalization Consistent Mixtures of Separable Flows (moses), that mixes several normalizing flows with (i) Gaussian Processes with full covariance matrix as source distributions and (ii) a separable invertible transformation, aiming to combine the expressivity of normalizing flows with the marginalization consistency of Gaussians. In experiments on four different datasets we show that moses outperforms other state-of-the-art marginalization consistent models, performs on par with ProFITi, but different from ProFITi, guarantee marginalization consistency. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2405.09638 [pdf, other]

doi 10.1007/978-981-97-2262-4_11

HMAR: Hierarchical Masked Attention for Multi-Behaviour Recommendation

Authors: Shereen Elsayed, Ahmed Rashed, Lars Schmidt-Thieme

Abstract: In the context of recommendation systems, addressing multi-behavioral user interactions has become vital for understanding the evolving user behavior. Recent models utilize techniques like graph neural networks and attention mechanisms for modeling diverse behaviors, but capturing sequential patterns in historical interactions remains challenging. To tackle this, we introduce Hierarchical Masked A… ▽ More In the context of recommendation systems, addressing multi-behavioral user interactions has become vital for understanding the evolving user behavior. Recent models utilize techniques like graph neural networks and attention mechanisms for modeling diverse behaviors, but capturing sequential patterns in historical interactions remains challenging. To tackle this, we introduce Hierarchical Masked Attention for multi-behavior recommendation (HMAR). Specifically, our approach applies masked self-attention to items of the same behavior, followed by self-attention across all behaviors. Additionally, we propose historical behavior indicators to encode the historical frequency of each items behavior in the input sequence. Furthermore, the HMAR model operates in a multi-task setting, allowing it to learn item behaviors and their associated ranking scores concurrently. Extensive experimental results on four real-world datasets demonstrate that our proposed model outperforms state-of-the-art methods. Our code and datasets are available here (https://github.com/Shereen-Elsayed/HMAR). △ Less

Submitted 29 April, 2024; originally announced May 2024.

arXiv:2405.03582 [pdf, other]

Functional Latent Dynamics for Irregularly Sampled Time Series Forecasting

Authors: Christian Klötergens, Vijaya Krishna Yalavarthi, Maximilian Stubbemann, Lars Schmidt-Thieme

Abstract: Irregularly sampled time series with missing values are often observed in multiple real-world applications such as healthcare, climate and astronomy. They pose a significant challenge to standard deep learn- ing models that operate only on fully observed and regularly sampled time series. In order to capture the continuous dynamics of the irreg- ular time series, many models rely on solving an Ord… ▽ More Irregularly sampled time series with missing values are often observed in multiple real-world applications such as healthcare, climate and astronomy. They pose a significant challenge to standard deep learn- ing models that operate only on fully observed and regularly sampled time series. In order to capture the continuous dynamics of the irreg- ular time series, many models rely on solving an Ordinary Differential Equation (ODE) in the hidden state. These ODE-based models tend to perform slow and require large memory due to sequential operations and a complex ODE solver. As an alternative to complex ODE-based mod- els, we propose a family of models called Functional Latent Dynamics (FLD). Instead of solving the ODE, we use simple curves which exist at all time points to specify the continuous latent state in the model. The coefficients of these curves are learned only from the observed values in the time series ignoring the missing values. Through extensive experi- ments, we demonstrate that FLD achieves better performance compared to the best ODE-based model while reducing the runtime and memory overhead. Specifically, FLD requires an order of magnitude less time to infer the forecasts compared to the best performing forecasting model. △ Less

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.06966 [pdf, other]

Are EEG Sequences Time Series? EEG Classification with Time Series Models and Joint Subject Training

Authors: Johannes Burchert, Thorben Werner, Vijaya Krishna Yalavarthi, Diego Coello de Portugal, Maximilian Stubbemann, Lars Schmidt-Thieme

Abstract: As with most other data domains, EEG data analysis relies on rich domain-specific preprocessing. Beyond such preprocessing, machine learners would hope to deal with such data as with any other time series data. For EEG classification many models have been developed with layer types and architectures we typically do not see in time series classification. Furthermore, typically separate models for e… ▽ More As with most other data domains, EEG data analysis relies on rich domain-specific preprocessing. Beyond such preprocessing, machine learners would hope to deal with such data as with any other time series data. For EEG classification many models have been developed with layer types and architectures we typically do not see in time series classification. Furthermore, typically separate models for each individual subject are learned, not one model for all of them. In this paper, we systematically study the differences between EEG classification models and generic time series classification models. We describe three different model setups to deal with EEG data from different subjects, subject-specific models (most EEG literature), subject-agnostic models and subject-conditional models. In experiments on three datasets, we demonstrate that off-the-shelf time series classification models trained per subject perform close to EEG classification models, but that do not quite reach the performance of domain-specific modeling. Additionally, we combine time-series models with subject embeddings to train one joint subject-conditional classifier on all subjects. The resulting models are competitive with dedicated EEG models in 2 out of 3 datasets, even outperforming all EEG methods on one of them. △ Less

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2403.04477 [pdf, other]

Hyperparameter Tuning MLPs for Probabilistic Time Series Forecasting

Authors: Kiran Madhusudhanan, Shayan Jawed, Lars Schmidt-Thieme

Abstract: Time series forecasting attempts to predict future events by analyzing past trends and patterns. Although well researched, certain critical aspects pertaining to the use of deep learning in time series forecasting remain ambiguous. Our research primarily focuses on examining the impact of specific hyperparameters related to time series, such as context length and validation strategy, on the perfor… ▽ More Time series forecasting attempts to predict future events by analyzing past trends and patterns. Although well researched, certain critical aspects pertaining to the use of deep learning in time series forecasting remain ambiguous. Our research primarily focuses on examining the impact of specific hyperparameters related to time series, such as context length and validation strategy, on the performance of the state-of-the-art MLP model in time series forecasting. We have conducted a comprehensive series of experiments involving 4800 configurations per dataset across 20 time series forecasting datasets, and our findings demonstrate the importance of tuning these parameters. Furthermore, in this work, we introduce the largest metadataset for timeseries forecasting to date, named TSBench, comprising 97200 evaluations, which is a twentyfold increase compared to previous works in the field. Finally, we demonstrate the utility of the created metadataset on multi-fidelity hyperparameter optimization tasks. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: 14 pages, 5 figures, Accepted at PAKDD24

arXiv:2403.03812 [pdf, other]

ProbSAINT: Probabilistic Tabular Regression for Used Car Pricing

Authors: Kiran Madhusudhanan, Gunnar Behrens, Maximilian Stubbemann, Lars Schmidt-Thieme

Abstract: Used car pricing is a critical aspect of the automotive industry, influenced by many economic factors and market dynamics. With the recent surge in online marketplaces and increased demand for used cars, accurate pricing would benefit both buyers and sellers by ensuring fair transactions. However, the transition towards automated pricing algorithms using machine learning necessitates the comprehen… ▽ More Used car pricing is a critical aspect of the automotive industry, influenced by many economic factors and market dynamics. With the recent surge in online marketplaces and increased demand for used cars, accurate pricing would benefit both buyers and sellers by ensuring fair transactions. However, the transition towards automated pricing algorithms using machine learning necessitates the comprehension of model uncertainties, specifically the ability to flag predictions that the model is unsure about. Although recent literature proposes the use of boosting algorithms or nearest neighbor-based approaches for swift and precise price predictions, encapsulating model uncertainties with such algorithms presents a complex challenge. We introduce ProbSAINT, a model that offers a principled approach for uncertainty quantification of its price predictions, along with accurate point predictions that are comparable to state-of-the-art boosting techniques. Furthermore, acknowledging that the business prefers pricing used cars based on the number of days the vehicle was listed for sale, we show how ProbSAINT can be used as a dynamic forecasting model for predicting price probabilities for different expected offer duration. Our experiments further indicate that ProbSAINT is especially accurate on instances where it is highly certain. This proves the applicability of its probabilistic predictions in real-world scenarios where trustworthiness is crucial. △ Less

Submitted 6 March, 2024; originally announced March 2024.

Comments: 9 pages, 4 figures

arXiv:2402.06293 [pdf, other]

Probabilistic Forecasting of Irregular Time Series via Conditional Flows

Authors: Vijaya Krishna Yalavarthi, Randolf Scholz, Stefan Born, Lars Schmidt-Thieme

Abstract: Probabilistic forecasting of irregularly sampled multivariate time series with missing values is an important problem in many fields, including health care, astronomy, and climate. State-of-the-art methods for the task estimate only marginal distributions of observations in single channels and at single timepoints, assuming a fixed-shape parametric distribution. In this work, we propose a novel mo… ▽ More Probabilistic forecasting of irregularly sampled multivariate time series with missing values is an important problem in many fields, including health care, astronomy, and climate. State-of-the-art methods for the task estimate only marginal distributions of observations in single channels and at single timepoints, assuming a fixed-shape parametric distribution. In this work, we propose a novel model, ProFITi, for probabilistic forecasting of irregularly sampled time series with missing values using conditional normalizing flows. The model learns joint distributions over the future values of the time series conditioned on past observations and queried channels and times, without assuming any fixed shape of the underlying distribution. As model components, we introduce a novel invertible triangular attention layer and an invertible non-linear activation function on and onto the whole real line. We conduct extensive experiments on four datasets and demonstrate that the proposed model provides $4$ times higher likelihood over the previously best model. △ Less

Submitted 21 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

arXiv:2402.04915 [pdf, other]

Moco: A Learnable Meta Optimizer for Combinatorial Optimization

Authors: Tim Dernedde, Daniela Thyssens, Sören Dittrich, Maximilian Stubbemann, Lars Schmidt-Thieme

Abstract: Relevant combinatorial optimization problems (COPs) are often NP-hard. While they have been tackled mainly via handcrafted heuristics in the past, advances in neural networks have motivated the development of general methods to learn heuristics from data. Many approaches utilize a neural network to directly construct a solution, but are limited in further improving based on already constructed sol… ▽ More Relevant combinatorial optimization problems (COPs) are often NP-hard. While they have been tackled mainly via handcrafted heuristics in the past, advances in neural networks have motivated the development of general methods to learn heuristics from data. Many approaches utilize a neural network to directly construct a solution, but are limited in further improving based on already constructed solutions at inference time. Our approach, Moco, learns a graph neural network that updates the solution construction procedure based on features extracted from the current search state. This meta training procedure targets the overall best solution found during the search procedure given information such as the search budget. This allows Moco to adapt to varying circumstances such as different computational budgets. Moco is a fully learnable meta optimizer that does not utilize any problem specific local search or decomposition. We test Moco on the Traveling Salesman Problem (TSP) and Maximum Independent Set (MIS) and show that it outperforms other approaches on MIS and is overall competitive on the TSP, especially outperforming related approaches, partially even if they use additional local search. △ Less

Submitted 9 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

Comments: 13 pages, 3 figures

arXiv:2312.09684 [pdf, other]

Context-Aware Sequential Model for Multi-Behaviour Recommendation

Authors: Shereen Elsayed, Ahmed Rashed, Lars Schmidt-Thieme

Abstract: Sequential recommendation models are crucial for next-item recommendations in online platforms, capturing complex patterns in user interactions. However, many focus on a single behavior, overlooking valuable implicit interactions like clicks and favorites. Existing multi-behavioral models often fail to simultaneously capture sequential patterns. We propose CASM, a Context-Aware Sequential Model, l… ▽ More Sequential recommendation models are crucial for next-item recommendations in online platforms, capturing complex patterns in user interactions. However, many focus on a single behavior, overlooking valuable implicit interactions like clicks and favorites. Existing multi-behavioral models often fail to simultaneously capture sequential patterns. We propose CASM, a Context-Aware Sequential Model, leveraging sequential models to seamlessly handle multiple behaviors. CASM employs context-aware multi-head self-attention for heterogeneous historical interactions and a weighted binary cross-entropy loss for precise control over behavior contributions. Experimental results on four datasets demonstrate CASM's superiority over state-of-the-art approaches. △ Less

Submitted 15 December, 2023; originally announced December 2023.

arXiv:2311.18356 [pdf, ps, other]

Towards Comparable Active Learning

Authors: Thorben Werner, Johannes Burchert, Lars Schmidt-Thieme

Abstract: Active Learning has received significant attention in the field of machine learning for its potential in selecting the most informative samples for labeling, thereby reducing data annotation costs. However, we show that the reported lifts in recent literature generalize poorly to other domains leading to an inconclusive landscape in Active Learning research. Furthermore, we highlight overlooked pr… ▽ More Active Learning has received significant attention in the field of machine learning for its potential in selecting the most informative samples for labeling, thereby reducing data annotation costs. However, we show that the reported lifts in recent literature generalize poorly to other domains leading to an inconclusive landscape in Active Learning research. Furthermore, we highlight overlooked problems for reproducing AL experiments that can lead to unfair comparisons and increased variance in the results. This paper addresses these issues by providing an Active Learning framework for a fair comparison of algorithms across different tasks and domains, as well as a fast and performant oracle algorithm for evaluation. To the best of our knowledge, we propose the first AL benchmark that tests algorithms in 3 major domains: Tabular, Image, and Text. We report empirical results for 6 widely used algorithms on 7 real-world and 2 synthetic datasets and aggregate them into a domain-specific ranking of AL algorithms. △ Less

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2310.04140 [pdf, other]

Routing Arena: A Benchmark Suite for Neural Routing Solvers

Authors: Daniela Thyssens, Tim Dernedde, Jonas K. Falkner, Lars Schmidt-Thieme

Abstract: Neural Combinatorial Optimization has been researched actively in the last eight years. Even though many of the proposed Machine Learning based approaches are compared on the same datasets, the evaluation protocol exhibits essential flaws and the selection of baselines often neglects State-of-the-Art Operations Research approaches. To improve on both of these shortcomings, we propose the Routing A… ▽ More Neural Combinatorial Optimization has been researched actively in the last eight years. Even though many of the proposed Machine Learning based approaches are compared on the same datasets, the evaluation protocol exhibits essential flaws and the selection of baselines often neglects State-of-the-Art Operations Research approaches. To improve on both of these shortcomings, we propose the Routing Arena, a benchmark suite for Routing Problems that provides a seamless integration of consistent evaluation and the provision of baselines and benchmarks prevalent in the Machine Learning- and Operations Research field. The proposed evaluation protocol considers the two most important evaluation cases for different applications: First, the solution quality for an a priori fixed time budget and secondly the anytime performance of the respective methods. By setting the solution trajectory in perspective to a Best Known Solution and a Base Solver's solutions trajectory, we furthermore propose the Weighted Relative Average Performance (WRAP), a novel evaluation metric that quantifies the often claimed runtime efficiency of Neural Routing Solvers. A comprehensive first experimental evaluation demonstrates that the most recent Operations Research solvers generate state-of-the-art results in terms of solution quality and runtime efficiency when it comes to the vehicle routing problem. Nevertheless, some findings highlight the advantages of neural approaches and motivate a shift in how neural solvers should be conceptualized. △ Less

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.17089 [pdf, other]

Too Big, so Fail? -- Enabling Neural Construction Methods to Solve Large-Scale Routing Problems

Authors: Jonas K. Falkner, Lars Schmidt-Thieme

Abstract: In recent years new deep learning approaches to solve combinatorial optimization problems, in particular NP-hard Vehicle Routing Problems (VRP), have been proposed. The most impactful of these methods are sequential neural construction approaches which are usually trained via reinforcement learning. Due to the high training costs of these models, they usually are trained on limited instance sizes… ▽ More In recent years new deep learning approaches to solve combinatorial optimization problems, in particular NP-hard Vehicle Routing Problems (VRP), have been proposed. The most impactful of these methods are sequential neural construction approaches which are usually trained via reinforcement learning. Due to the high training costs of these models, they usually are trained on limited instance sizes (e.g. serving 100 customers) and later applied to vastly larger instance size (e.g. 2000 customers). By means of a systematic scale-up study we show that even state-of-the-art neural construction methods are outperformed by simple heuristics, failing to generalize to larger problem instances. We propose to use the ruin recreate principle that alternates between completely destroying a localized part of the solution and then recreating an improved variant. In this way, neural construction methods like POMO are never applied to the global problem but just in the reconstruction step, which only involves partial problems much closer in size to their original training instances. In thorough experiments on four datasets of varying distributions and modalities we show that our neural ruin recreate approach outperforms alternative forms of improving construction methods such as sampling and beam search and in several experiments also advanced local search approaches. △ Less

Submitted 29 September, 2023; originally announced September 2023.

arXiv:2307.09796 [pdf, other]

Forecasting Early with Meta Learning

Authors: Shayan Jawed, Kiran Madhusudhanan, Vijaya Krishna Yalavarthi, Lars Schmidt-Thieme

Abstract: In the early observation period of a time series, there might be only a few historic observations available to learn a model. However, in cases where an existing prior set of datasets is available, Meta learning methods can be applicable. In this paper, we devise a Meta learning method that exploits samples from additional datasets and learns to augment time series through adversarial learning as… ▽ More In the early observation period of a time series, there might be only a few historic observations available to learn a model. However, in cases where an existing prior set of datasets is available, Meta learning methods can be applicable. In this paper, we devise a Meta learning method that exploits samples from additional datasets and learns to augment time series through adversarial learning as an auxiliary task for the target dataset. Our model (FEML), is equipped with a shared Convolutional backbone that learns features for varying length inputs from different datasets and has dataset specific heads to forecast for different output lengths. We show that FEML can meta learn across datasets and by additionally learning on adversarial generated samples as auxiliary samples for the target dataset, it can improve the forecasting performance compared to single task learning, and various solutions adapted from Joint learning, Multi-task learning and classic forecasting baselines. △ Less

Submitted 19 July, 2023; originally announced July 2023.

Comments: IJCNN 2023

arXiv:2306.06068 [pdf, other]

DeepStay: Stay Region Extraction from Location Trajectories using Weak Supervision

Authors: Christian Löwens, Daniela Thyssens, Emma Andersson, Christina Jenkins, Lars Schmidt-Thieme

Abstract: Nowadays, mobile devices enable constant tracking of the user's position and location trajectories can be used to infer personal points of interest (POIs) like homes, workplaces, or stores. A common way to extract POIs is to first identify spatio-temporal regions where a user spends a significant amount of time, known as stay regions (SRs). Common approaches to SR extraction are evaluated either… ▽ More Nowadays, mobile devices enable constant tracking of the user's position and location trajectories can be used to infer personal points of interest (POIs) like homes, workplaces, or stores. A common way to extract POIs is to first identify spatio-temporal regions where a user spends a significant amount of time, known as stay regions (SRs). Common approaches to SR extraction are evaluated either solely unsupervised or on a small-scale private dataset, as popular public datasets are unlabeled. Most of these methods rely on hand-crafted features or thresholds and do not learn beyond hyperparameter optimization. Therefore, we propose a weakly and self-supervised transformer-based model called DeepStay, which is trained on location trajectories to predict stay regions. To the best of our knowledge, this is the first approach based on deep learning and the first approach that is evaluated on a public, labeled dataset. Our SR extraction method outperforms state-of-the-art methods. In addition, we conducted a limited experiment on the task of transportation mode detection from GPS trajectories using the same architecture and achieved significantly higher scores than the state-of-the-art. Our code is available at https://github.com/christianll9/deepstay. △ Less

Submitted 5 June, 2023; originally announced June 2023.

Comments: Paper under peer review

arXiv:2305.12932 [pdf, ps, other]

Forecasting Irregularly Sampled Time Series using Graphs

Authors: Vijaya Krishna Yalavarthi, Kiran Madhusudhanan, Randolf Sholz, Nourhan Ahmed, Johannes Burchert, Shayan Jawed, Stefan Born, Lars Schmidt-Thieme

Abstract: Forecasting irregularly sampled time series with missing values is a crucial task for numerous real-world applications such as healthcare, astronomy, and climate sciences. State-of-the-art approaches to this problem rely on Ordinary Differential Equations (ODEs) which are known to be slow and often require additional features to handle missing values. To address this issue, we propose a novel mode… ▽ More Forecasting irregularly sampled time series with missing values is a crucial task for numerous real-world applications such as healthcare, astronomy, and climate sciences. State-of-the-art approaches to this problem rely on Ordinary Differential Equations (ODEs) which are known to be slow and often require additional features to handle missing values. To address this issue, we propose a novel model using Graphs for Forecasting Irregularly Sampled Time Series with missing values which we call GraFITi. GraFITi first converts the time series to a Sparsity Structure Graph which is a sparse bipartite graph, and then reformulates the forecasting problem as the edge weight prediction task in the graph. It uses the power of Graph Neural Networks to learn the graph and predict the target edge weights. GraFITi has been tested on 3 real-world and 1 synthetic irregularly sampled time series dataset with missing values and compared with various state-of-the-art models. The experimental results demonstrate that GraFITi improves the forecasting accuracy by up to 17% and reduces the run time up to 5 times compared to the state-of-the-art forecasting models. △ Less

Submitted 10 August, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

arXiv:2304.07262 [pdf, other]

Phantom Embeddings: Using Embedding Space for Model Regularization in Deep Neural Networks

Authors: Mofassir ul Islam Arif, Mohsan Jameel, Josif Grabocka, Lars Schmidt-Thieme

Abstract: The strength of machine learning models stems from their ability to learn complex function approximations from data; however, this strength also makes training deep neural networks challenging. Notably, the complex models tend to memorize the training data, which results in poor regularization performance on test data. The regularization techniques such as L1, L2, dropout, etc. are proposed to red… ▽ More The strength of machine learning models stems from their ability to learn complex function approximations from data; however, this strength also makes training deep neural networks challenging. Notably, the complex models tend to memorize the training data, which results in poor regularization performance on test data. The regularization techniques such as L1, L2, dropout, etc. are proposed to reduce the overfitting effect; however, they bring in additional hyperparameters tuning complexity. These methods also fall short when the inter-class similarity is high due to the underlying data distribution, leading to a less accurate model. In this paper, we present a novel approach to regularize the models by leveraging the information-rich latent embeddings and their high intra-class correlation. We create phantom embeddings from a subset of homogenous samples and use these phantom embeddings to decrease the inter-class similarity of instances in their latent embedding space. The resulting models generalize better as a combination of their embedding and regularize them without requiring an expensive hyperparameter search. We evaluate our method on two popular and challenging image classification datasets (CIFAR and FashionMNIST) and show how our approach outperforms the standard baselines while displaying better training behavior. △ Less

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2304.07256 [pdf, other]

Directly Optimizing IoU for Bounding Box Localization

Authors: Mofassir ul Islam Arif, Mohsan Jameel, Lars Schmidt-Thieme

Abstract: Object detection has seen remarkable progress in recent years with the introduction of Convolutional Neural Networks (CNN). Object detection is a multi-task learning problem where both the position of the objects in the images as well as their classes needs to be correctly identified. The idea here is to maximize the overlap between the ground-truth bounding boxes and the predictions i.e. the Inte… ▽ More Object detection has seen remarkable progress in recent years with the introduction of Convolutional Neural Networks (CNN). Object detection is a multi-task learning problem where both the position of the objects in the images as well as their classes needs to be correctly identified. The idea here is to maximize the overlap between the ground-truth bounding boxes and the predictions i.e. the Intersection over Union (IoU). In the scope of work seen currently in this domain, IoU is approximated by using the Huber loss as a proxy but this indirect method does not leverage the IoU information and treats the bounding box as four independent, unrelated terms of regression. This is not true for a bounding box where the four coordinates are highly correlated and hold a semantic meaning when taken together. The direct optimization of the IoU is not possible due to its non-convex and non-differentiable nature. In this paper, we have formulated a novel loss namely, the Smooth IoU, which directly optimizes the IoUs for the bounding boxes. This loss has been evaluated on the Oxford IIIT Pets, Udacity self-driving car, PASCAL VOC, and VWFS Car Damage datasets and has shown performance gains over the standard Huber loss. △ Less

Submitted 14 April, 2023; originally announced April 2023.

arXiv:2302.05134 [pdf, other]

Neural Capacitated Clustering

Authors: Jonas K. Falkner, Lars Schmidt-Thieme

Abstract: Recent work on deep clustering has found new promising methods also for constrained clustering problems. Their typically pairwise constraints often can be used to guide the partitioning of the data. Many problems however, feature cluster-level constraints, e.g. the Capacitated Clustering Problem (CCP), where each point has a weight and the total weight sum of all points in each cluster is bounded… ▽ More Recent work on deep clustering has found new promising methods also for constrained clustering problems. Their typically pairwise constraints often can be used to guide the partitioning of the data. Many problems however, feature cluster-level constraints, e.g. the Capacitated Clustering Problem (CCP), where each point has a weight and the total weight sum of all points in each cluster is bounded by a prescribed capacity. In this paper we propose a new method for the CCP, Neural Capacited Clustering, that learns a neural network to predict the assignment probabilities of points to cluster centers from a data set of optimal or near optimal past solutions of other problem instances. During inference, the resulting scores are then used in an iterative k-means like procedure to refine the assignment under capacity constraints. In our experiments on artificial data and two real world datasets our approach outperforms several state-of-the-art mathematical and heuristic solvers from the literature. Moreover, we apply our method in the context of a cluster-first-route-second approach to the Capacitated Vehicle Routing Problem (CVRP) and show competitive results on the well-known Uchoa benchmark. △ Less

Submitted 19 May, 2023; v1 submitted 10 February, 2023; originally announced February 2023.

Comments: Accepted at the 32nd International Joint Conference on Artificial Intelligence (IJCAI) 2023

arXiv:2212.11771 [pdf, other]

Few-shot human motion prediction for heterogeneous sensors

Authors: Rafael Rego Drumond, Lukas Brinkmeyer, Lars Schmidt-Thieme

Abstract: Human motion prediction is a complex task as it involves forecasting variables over time on a graph of connected sensors. This is especially true in the case of few-shot learning, where we strive to forecast motion sequences for previously unseen actions based on only a few examples. Despite this, almost all related approaches for few-shot motion prediction do not incorporate the underlying graph,… ▽ More Human motion prediction is a complex task as it involves forecasting variables over time on a graph of connected sensors. This is especially true in the case of few-shot learning, where we strive to forecast motion sequences for previously unseen actions based on only a few examples. Despite this, almost all related approaches for few-shot motion prediction do not incorporate the underlying graph, while it is a common component in classical motion prediction. Furthermore, state-of-the-art methods for few-shot motion prediction are restricted to motion tasks with a fixed output space meaning these tasks are all limited to the same sensor graph. In this work, we propose to extend recent works on few-shot time-series forecasting with heterogeneous attributes with graph neural networks to introduce the first few-shot motion approach that explicitly incorporates the spatial graph while also generalizing across motion tasks with heterogeneous sensors. In our experiments on motion tasks with heterogeneous sensors, we demonstrate significant performance improvements with lifts from 10.4% up to 39.3% compared to best state-of-the-art models. Moreover, we show that our model can perform on par with the best approach so far when evaluating on tasks with a fixed output space while maintaining two magnitudes fewer parameters. △ Less

Submitted 20 March, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

MSC Class: 68 ACM Class: I.2.6

arXiv:2212.02578 [pdf, ps, other]

Auxiliary Quantile Forecasting with Linear Networks

Authors: Shayan Jawed, Lars Schmidt-Thieme

Abstract: We propose a novel multi-task method for quantile forecasting with shared Linear layers. Our method is based on the Implicit quantile learning approach, where samples from the Uniform distribution $\mathcal{U}(0, 1)$ are reparameterized to quantile values of the target distribution. We combine the implicit quantile and input time series representations to directly forecast multiple quantile estima… ▽ More We propose a novel multi-task method for quantile forecasting with shared Linear layers. Our method is based on the Implicit quantile learning approach, where samples from the Uniform distribution $\mathcal{U}(0, 1)$ are reparameterized to quantile values of the target distribution. We combine the implicit quantile and input time series representations to directly forecast multiple quantile estimations for multiple horizons jointly. Prior works have adopted a Linear layer for the direct estimation of all forecasting horizons in a multi-task learning setup. We show that following similar intuition from multi-task learning to exploit correlations among forecast horizons, we can model multiple quantile estimates as auxiliary tasks for each of the forecast horizon to improve forecast accuracy across the quantile estimates compared to modeling only a single quantile estimate. We show learning auxiliary quantile tasks leads to state-of-the-art performance on deterministic forecasting benchmarks concerning the main-task of forecasting the 50$^{th}$ percentile estimate. △ Less

Submitted 5 December, 2022; originally announced December 2022.

Comments: Under submission

arXiv:2210.10664 [pdf, other]

Deep Multi-Representation Model for Click-Through Rate Prediction

Authors: Shereen Elsayed, Lars Schmidt-Thieme

Abstract: Click-Through Rate prediction (CTR) is a crucial task in recommender systems, and it gained considerable attention in the past few years. The primary purpose of recent research emphasizes obtaining meaningful and powerful representations through mining low and high feature interactions using various components such as Deep Neural Networks (DNN), CrossNets, or transformer blocks. In this work, we p… ▽ More Click-Through Rate prediction (CTR) is a crucial task in recommender systems, and it gained considerable attention in the past few years. The primary purpose of recent research emphasizes obtaining meaningful and powerful representations through mining low and high feature interactions using various components such as Deep Neural Networks (DNN), CrossNets, or transformer blocks. In this work, we propose the Deep Multi-Representation model (DeepMR) that jointly trains a mixture of two powerful feature representation learning components, namely DNNs and multi-head self-attentions. Furthermore, DeepMR integrates the novel residual with zero initialization (ReZero) connections to the DNN and the multi-head self-attention components for learning superior input representations. Experiments on three real-world datasets show that the proposed model significantly outperforms all state-of-the-art models in the task of click-through rate prediction. △ Less

Submitted 25 October, 2022; v1 submitted 18 October, 2022; originally announced October 2022.

arXiv:2210.02091 [pdf, ps, other]

Tripletformer for Probabilistic Interpolation of Irregularly sampled Time Series

Authors: Vijaya Krishna Yalavarthi, Johannes Burchert, Lars Schmidt-thieme

Abstract: Irregularly sampled time series data with missing values is observed in many fields like healthcare, astronomy, and climate science. Interpolation of these types of time series is crucial for tasks such as root cause analysis and medical diagnosis, as well as for smoothing out irregular or noisy data. To address this challenge, we present a novel encoder-decoder architecture called "Tripletformer"… ▽ More Irregularly sampled time series data with missing values is observed in many fields like healthcare, astronomy, and climate science. Interpolation of these types of time series is crucial for tasks such as root cause analysis and medical diagnosis, as well as for smoothing out irregular or noisy data. To address this challenge, we present a novel encoder-decoder architecture called "Tripletformer" for probabilistic interpolation of irregularly sampled time series with missing values. This attention-based model operates on sets of observations, where each element is composed of a triple of time, channel, and value. The encoder and decoder of the Tripletformer are designed with attention layers and fully connected layers, enabling the model to effectively process the presented set elements. We evaluate the Tripletformer against a range of baselines on multiple real-world and synthetic datasets and show that it produces more accurate and certain interpolations. Results indicate an improvement in negative loglikelihood error by up to 32% on real-world datasets and 85% on synthetic datasets when using the Tripletformer compared to the next best model. △ Less

Submitted 12 January, 2024; v1 submitted 5 October, 2022; originally announced October 2022.

Journal ref: IEEE International Conference on BigData, 2023

arXiv:2210.00275 [pdf, other]

Offline Handwritten Amharic Character Recognition Using Few-shot Learning

Authors: Mesay Samuel, Lars Schmidt-Thieme, DP Sharma, Abiot Sinamo, Abey Bruck

Abstract: Few-shot learning is an important, but challenging problem of machine learning aimed at learning from only fewer labeled training examples. It has become an active area of research due to deep learning requiring huge amounts of labeled dataset, which is not feasible in the real world. Learning from a few examples is also an important attempt towards learning like humans. Few-shot learning has prov… ▽ More Few-shot learning is an important, but challenging problem of machine learning aimed at learning from only fewer labeled training examples. It has become an active area of research due to deep learning requiring huge amounts of labeled dataset, which is not feasible in the real world. Learning from a few examples is also an important attempt towards learning like humans. Few-shot learning has proven a very good promise in different areas of machine learning applications, particularly in image classification. As it is a recent technique, most researchers focus on understanding and solving the issues related to its concept by focusing only on common image datasets like Mini-ImageNet and Omniglot. Few-shot learning also opens an opportunity to address low resource languages like Amharic. In this study, offline handwritten Amharic character recognition using few-shot learning is addressed. Particularly, prototypical networks, the popular and simpler type of few-shot learning, is implemented as a baseline. Using the opportunities explored in the nature of Amharic alphabet having row-wise and column-wise similarities, a novel way of augmenting the training episodes is proposed. The experimental results show that the proposed method outperformed the baseline method. This study has implemented few-shot learning for Amharic characters for the first time. More importantly, the findings of the study open new ways of examining the influence of training episodes in few-shot learning, which is one of the important issues that needs exploration. The datasets used for this study are collected from native Amharic language writers using an Android App developed as a part of this study. △ Less

Submitted 1 October, 2022; originally announced October 2022.

Comments: PanAfriCon AI 2022 virtual conference paper

arXiv:2209.01083 [pdf, other]

When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development

Authors: Nghia Duong-Trung, Stefan Born, Jong Woo Kim, Marie-Therese Schermeyer, Katharina Paulick, Maxim Borisyak, Mariano Nicolas Cruz-Bournazou, Thorben Werner, Randolf Scholz, Lars Schmidt-Thieme, Peter Neubauer, Ernesto Martinez

Abstract: Machine learning (ML) is becoming increasingly crucial in many fields of engineering but has not yet played out its full potential in bioprocess engineering. While experimentation has been accelerated by increasing levels of lab automation, experimental planning and data modeling are still largerly depend on human intervention. ML can be seen as a set of tools that contribute to the automation of… ▽ More Machine learning (ML) is becoming increasingly crucial in many fields of engineering but has not yet played out its full potential in bioprocess engineering. While experimentation has been accelerated by increasing levels of lab automation, experimental planning and data modeling are still largerly depend on human intervention. ML can be seen as a set of tools that contribute to the automation of the whole experimental cycle, including model building and practical planning, thus allowing human experts to focus on the more demanding and overarching cognitive tasks. First, probabilistic programming is used for the autonomous building of predictive models. Second, machine learning automatically assesses alternative decisions by planning experiments to test hypotheses and conducting investigations to gather informative data that focus on model selection based on the uncertainty of model predictions. This review provides a comprehensive overview of ML-based automation in bioprocess development. On the one hand, the biotech and bioengineering community should be aware of the potential and, most importantly, the limitation of existing ML solutions for their application in biotechnology and biopharma. On the other hand, it is essential to identify the missing links to enable the easy implementation of ML and Artificial Intelligence (AI) tools in valuable solutions for the bio-community. △ Less

Submitted 1 November, 2022; v1 submitted 2 September, 2022; originally announced September 2022.

arXiv:2208.11374 [pdf, ps, other]

DCSF: Deep Convolutional Set Functions for Classification of Asynchronous Time Series

Authors: Vijaya Krishna Yalavarthi, Johannes Burchert, Lars Schmidt-Thieme

Abstract: Asynchronous Time Series is a multivariate time series where all the channels are observed asynchronously-independently, making the time series extremely sparse when aligning them. We often observe this effect in applications with complex observation processes, such as health care, climate science, and astronomy, to name a few. Because of the asynchronous nature, they pose a significant challenge… ▽ More Asynchronous Time Series is a multivariate time series where all the channels are observed asynchronously-independently, making the time series extremely sparse when aligning them. We often observe this effect in applications with complex observation processes, such as health care, climate science, and astronomy, to name a few. Because of the asynchronous nature, they pose a significant challenge to deep learning architectures, which presume that the time series presented to them are regularly sampled, fully observed, and aligned with respect to time. This paper proposes a novel framework, that we call Deep Convolutional Set Functions (DCSF), which is highly scalable and memory efficient, for the asynchronous time series classification task. With the recent advancements in deep set learning architectures, we introduce a model that is invariant to the order in which time series' channels are presented to it. We explore convolutional neural networks, which are well researched for the closely related problem-classification of regularly sampled and fully observed time series, for encoding the set elements. We evaluate DCSF for AsTS classification, and online (per time point) AsTS classification. Our extensive experiments on multiple real-world and synthetic datasets verify that the suggested model performs substantially better than a range of state-of-the-art models in terms of accuracy and run time. △ Less

Submitted 24 August, 2022; originally announced August 2022.

arXiv:2207.07212 [pdf, other]

Attention, Filling in The Gaps for Generalization in Routing Problems

Authors: Ahmad Bdeir, Jonas K. Falkner, Lars Schmidt-Thieme

Abstract: Machine Learning (ML) methods have become a useful tool for tackling vehicle routing problems, either in combination with popular heuristics or as standalone models. However, current methods suffer from poor generalization when tackling problems of different sizes or different distributions. As a result, ML in vehicle routing has witnessed an expansion phase with new methodologies being created fo… ▽ More Machine Learning (ML) methods have become a useful tool for tackling vehicle routing problems, either in combination with popular heuristics or as standalone models. However, current methods suffer from poor generalization when tackling problems of different sizes or different distributions. As a result, ML in vehicle routing has witnessed an expansion phase with new methodologies being created for particular problem instances that become infeasible at larger problem sizes. This paper aims at encouraging the consolidation of the field through understanding and improving current existing models, namely the attention model by Kool et al. We identify two discrepancy categories for VRP generalization. The first is based on the differences that are inherent to the problems themselves, and the second relates to architectural weaknesses that limit the model's ability to generalize. Our contribution becomes threefold: We first target model discrepancies by adapting the Kool et al. method and its loss function for Sparse Dynamic Attention based on the alpha-entmax activation. We then target inherent differences through the use of a mixed instance training method that has been shown to outperform single instance training in certain scenarios. Finally, we introduce a framework for inference level data augmentation that improves performance by leveraging the model's lack of invariance to rotation and dilation changes. △ Less

Submitted 14 July, 2022; originally announced July 2022.

Comments: Accepted at ECML-PKDD 2022

arXiv:2207.01443 [pdf, ps, other]

doi 10.1007/978-3-031-15791-2_14

Solving the Traveling Salesperson Problem with Precedence Constraints by Deep Reinforcement Learning

Authors: Christian Löwens, Inaam Ashraf, Alexander Gembus, Genesis Cuizon, Jonas K. Falkner, Lars Schmidt-Thieme

Abstract: This work presents solutions to the Traveling Salesperson Problem with precedence constraints (TSPPC) using Deep Reinforcement Learning (DRL) by adapting recent approaches that work well for regular TSPs. Common to these approaches is the use of graph models based on multi-head attention (MHA) layers. One idea for solving the pickup and delivery problem (PDP) is using heterogeneous attentions to e… ▽ More This work presents solutions to the Traveling Salesperson Problem with precedence constraints (TSPPC) using Deep Reinforcement Learning (DRL) by adapting recent approaches that work well for regular TSPs. Common to these approaches is the use of graph models based on multi-head attention (MHA) layers. One idea for solving the pickup and delivery problem (PDP) is using heterogeneous attentions to embed the different possible roles each node can take. In this work, we generalize this concept of heterogeneous attentions to the TSPPC. Furthermore, we adapt recent ideas to sparsify attentions for better scalability. Overall, we contribute to the research community through the application and evaluation of recent DRL methods in solving the TSPPC. △ Less

Submitted 19 September, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution is published in KI 2022: Advances in Artificial Intelligence, and is available online at https://doi.org/10.1007/978-3-031-15791-2_14

Journal ref: KI 2022: Advances in Artificial Intelligence pp 160-172

arXiv:2206.13181 [pdf, other]

doi 10.1007/978-3-031-26419-1_22

Learning to Control Local Search for Combinatorial Optimization

Authors: Jonas K. Falkner, Daniela Thyssens, Ahmad Bdeir, Lars Schmidt-Thieme

Abstract: Combinatorial optimization problems are encountered in many practical contexts such as logistics and production, but exact solutions are particularly difficult to find and usually NP-hard for considerable problem sizes. To compute approximate solutions, a zoo of generic as well as problem-specific variants of local search is commonly used. However, which variant to apply to which particular proble… ▽ More Combinatorial optimization problems are encountered in many practical contexts such as logistics and production, but exact solutions are particularly difficult to find and usually NP-hard for considerable problem sizes. To compute approximate solutions, a zoo of generic as well as problem-specific variants of local search is commonly used. However, which variant to apply to which particular problem is difficult to decide even for experts. In this paper we identify three independent algorithmic aspects of such local search algorithms and formalize their sequential selection over an optimization process as Markov Decision Process (MDP). We design a deep graph neural network as policy model for this MDP, yielding a learned controller for local search called NeuroLS. Ample experimental evidence shows that NeuroLS is able to outperform both, well-known general purpose local search controllers from Operations Research as well as latest machine learning-based approaches. △ Less

Submitted 13 July, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: Accepted at ECML-PKDD 2022

Journal ref: In: Amini, MR., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2022. Lecture Notes in Computer Science(), vol 13717. Springer, Cham

arXiv:2206.08476 [pdf, other]

Zero-Shot AutoML with Pretrained Models

Authors: Ekrem Öztürk, Fabio Ferreira, Hadi S. Jomaa, Lars Schmidt-Thieme, Josif Grabocka, Frank Hutter

Abstract: Given a new dataset D and a low compute budget, how should we choose a pre-trained model to fine-tune to D, and set the fine-tuning hyperparameters without risking overfitting, particularly if D is small? Here, we extend automated machine learning (AutoML) to best make these choices. Our domain-independent meta-learning approach learns a zero-shot surrogate model which, at test time, allows to sel… ▽ More Given a new dataset D and a low compute budget, how should we choose a pre-trained model to fine-tune to D, and set the fine-tuning hyperparameters without risking overfitting, particularly if D is small? Here, we extend automated machine learning (AutoML) to best make these choices. Our domain-independent meta-learning approach learns a zero-shot surrogate model which, at test time, allows to select the right deep learning (DL) pipeline (including the pre-trained model and fine-tuning hyperparameters) for a new dataset D given only trivial meta-features describing D such as image resolution or the number of classes. To train this zero-shot model, we collect performance data for many DL pipelines on a large collection of datasets and meta-train on this data to minimize a pairwise ranking objective. We evaluate our approach under the strict time limit of the vision track of the ChaLearn AutoDL challenge benchmark, clearly outperforming all challenge contenders. △ Less

Submitted 25 June, 2022; v1 submitted 16 June, 2022; originally announced June 2022.

Journal ref: International Conference on Machine Learning 2022

arXiv:2205.02923 [pdf, other]

End-to-End Image-Based Fashion Recommendation

Authors: Shereen Elsayed, Lukas Brinkmeyer, Lars Schmidt-Thieme

Abstract: In fashion-based recommendation settings, incorporating the item image features is considered a crucial factor, and it has shown significant improvements to many traditional models, including but not limited to matrix factorization, auto-encoders, and nearest neighbor models. While there are numerous image-based recommender approaches that utilize dedicated deep neural networks, comparisons to att… ▽ More In fashion-based recommendation settings, incorporating the item image features is considered a crucial factor, and it has shown significant improvements to many traditional models, including but not limited to matrix factorization, auto-encoders, and nearest neighbor models. While there are numerous image-based recommender approaches that utilize dedicated deep neural networks, comparisons to attribute-aware models are often disregarded despite their ability to be easily extended to leverage items' image features. In this paper, we propose a simple yet effective attribute-aware model that incorporates image features for better item representation learning in item recommendation tasks. The proposed model utilizes items' image features extracted by a calibrated ResNet50 component. We present an ablation study to compare incorporating the image features using three different techniques into the recommender system component that can seamlessly leverage any available items' attributes. Experiments on two image-based real-world recommender systems datasets show that the proposed model significantly outperforms all state-of-the-art image-based models. △ Less

Submitted 5 May, 2022; originally announced May 2022.

Comments: Accepted in FashionXRecsys 2021 workshop

arXiv:2205.00772 [pdf, ps, other]

Large Neighborhood Search based on Neural Construction Heuristics

Authors: Jonas K. Falkner, Daniela Thyssens, Lars Schmidt-Thieme

Abstract: We propose a Large Neighborhood Search (LNS) approach utilizing a learned construction heuristic based on neural networks as repair operator to solve the vehicle routing problem with time windows (VRPTW). Our method uses graph neural networks to encode the problem and auto-regressively decodes a solution and is trained with reinforcement learning on the construction task without requiring any labe… ▽ More We propose a Large Neighborhood Search (LNS) approach utilizing a learned construction heuristic based on neural networks as repair operator to solve the vehicle routing problem with time windows (VRPTW). Our method uses graph neural networks to encode the problem and auto-regressively decodes a solution and is trained with reinforcement learning on the construction task without requiring any labels for supervision. The neural repair operator is combined with a local search routine, heuristic destruction operators and a selection procedure applied to a small population to arrive at a sophisticated solution approach. The key idea is to use the learned model to re-construct the partially destructed solution and to introduce randomness via the destruction heuristics (or the stochastic policy itself) to effectively explore a large neighborhood. △ Less

Submitted 10 May, 2022; v1 submitted 2 May, 2022; originally announced May 2022.

arXiv:2204.06519 [pdf, other]

doi 10.1145/3523227.3546777

CARCA: Context and Attribute-Aware Next-Item Recommendation via Cross-Attention

Authors: Ahmed Rashed, Shereen Elsayed, Lars Schmidt-Thieme

Abstract: In sparse recommender settings, users' context and item attributes play a crucial role in deciding which items to recommend next. Despite that, recent works in sequential and time-aware recommendations usually either ignore both aspects or only consider one of them, limiting their predictive performance. In this paper, we address these limitations by proposing a context and attribute-aware recomme… ▽ More In sparse recommender settings, users' context and item attributes play a crucial role in deciding which items to recommend next. Despite that, recent works in sequential and time-aware recommendations usually either ignore both aspects or only consider one of them, limiting their predictive performance. In this paper, we address these limitations by proposing a context and attribute-aware recommender model (CARCA) that can capture the dynamic nature of the user profiles in terms of contextual features and item attributes via dedicated multi-head self-attention blocks that extract profile-level features and predicting item scores. Also, unlike many of the current state-of-the-art sequential item recommendation approaches that use a simple dot-product between the most recent item's latent features and the target items embeddings for scoring, CARCA uses cross-attention between all profile items and the target items to predict their final scores. This cross-attention allows CARCA to harness the correlation between old and recent items in the user profile and their influence on deciding which item to recommend next. Experiments on four real-world recommender system datasets show that the proposed model significantly outperforms all state-of-the-art models in the task of item recommendation and achieving improvements of up to 53% in Normalized Discounted Cumulative Gain (NDCG) and Hit-Ratio. Results also show that CARCA outperformed several state-of-the-art dedicated image-based recommender systems by merely utilizing image attributes extracted from a pre-trained ResNet50 in a black-box fashion. △ Less

Submitted 4 April, 2022; originally announced April 2022.

Journal ref: RecSys (2022) 71-80

arXiv:2204.03456 [pdf, other]

Few-Shot Forecasting of Time-Series with Heterogeneous Channels

Authors: Lukas Brinkmeyer, Rafael Rego Drumond, Johannes Burchert, Lars Schmidt-Thieme

Abstract: Learning complex time series forecasting models usually requires a large amount of data, as each model is trained from scratch for each task/data set. Leveraging learning experience with similar datasets is a well-established technique for classification problems called few-shot classification. However, existing approaches cannot be applied to time-series forecasting because i) multivariate time-s… ▽ More Learning complex time series forecasting models usually requires a large amount of data, as each model is trained from scratch for each task/data set. Leveraging learning experience with similar datasets is a well-established technique for classification problems called few-shot classification. However, existing approaches cannot be applied to time-series forecasting because i) multivariate time-series datasets have different channels and ii) forecasting is principally different from classification. In this paper we formalize the problem of few-shot forecasting of time-series with heterogeneous channels for the first time. Extending recent work on heterogeneous attributes in vector data, we develop a model composed of permutation-invariant deep set-blocks which incorporate a temporal embedding. We assemble the first meta-dataset of 40 multivariate time-series datasets and show through experiments that our model provides a good generalization, outperforming baselines carried over from simpler scenarios that either fail to learn across tasks or miss temporal information. △ Less

Submitted 18 August, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: Under review. Equal contribution (Brinkmeyer and Rego Drumond)

MSC Class: 68

arXiv:2202.12687 [pdf, other]

Improving Amharic Handwritten Word Recognition Using Auxiliary Task

Authors: Mesay Samuel Gondere, Lars Schmidt-Thieme, Durga Prasad Sharma, Abiot Sinamo Boltena

Abstract: Amharic is one of the official languages of the Federal Democratic Republic of Ethiopia. It is one of the languages that use an Ethiopic script which is derived from Gee'z, ancient and currently a liturgical language. Amharic is also one of the most widely used literature-rich languages of Ethiopia. There are very limited innovative and customized research works in Amharic optical character recogn… ▽ More Amharic is one of the official languages of the Federal Democratic Republic of Ethiopia. It is one of the languages that use an Ethiopic script which is derived from Gee'z, ancient and currently a liturgical language. Amharic is also one of the most widely used literature-rich languages of Ethiopia. There are very limited innovative and customized research works in Amharic optical character recognition (OCR) in general and Amharic handwritten text recognition in particular. In this study, Amharic handwritten word recognition will be investigated. State-of-the-art deep learning techniques including convolutional neural networks together with recurrent neural networks and connectionist temporal classification (CTC) loss were used to make the recognition in an end-to-end fashion. More importantly, an innovative way of complementing the loss function using the auxiliary task from the row-wise similarities of the Amharic alphabet was tested to show a significant recognition improvement over a baseline method. Such findings will promote innovative problem-specific solutions as well as will open insight to a generalized solution that emerges from problem-specific domains. △ Less

Submitted 25 February, 2022; originally announced February 2022.

arXiv:2202.05695 [pdf, other]

Positive-Unlabeled Domain Adaptation

Authors: Jonas Sonntag, Gunnar Behrens, Lars Schmidt-Thieme

Abstract: Domain Adaptation methodologies have shown to effectively generalize from a labeled source domain to a label scarce target domain. Previous research has either focused on unlabeled domain adaptation without any target supervision or semi-supervised domain adaptation with few labeled target examples per class. On the other hand Positive-Unlabeled (PU-) Learning has attracted increasing interest in… ▽ More Domain Adaptation methodologies have shown to effectively generalize from a labeled source domain to a label scarce target domain. Previous research has either focused on unlabeled domain adaptation without any target supervision or semi-supervised domain adaptation with few labeled target examples per class. On the other hand Positive-Unlabeled (PU-) Learning has attracted increasing interest in the weakly supervised learning literature since in quite some real world applications positive labels are much easier to obtain than negative ones. In this work we are the first to introduce the challenge of Positive-Unlabeled Domain Adaptation where we aim to generalise from a fully labeled source domain to a target domain where only positive and unlabeled data is available. We present a novel two-step learning approach to this problem by firstly identifying reliable positive and negative pseudo-labels in the target domain guided by source domain labels and a positive-unlabeled risk estimator. This enables us to use a standard classifier on the target domain in a second step. We validate our approach by running experiments on benchmark datasets for visual object recognition. Furthermore we propose real world examples for our setting and validate our superior performance on parking occupancy data. △ Less

Submitted 11 February, 2022; originally announced February 2022.

arXiv:2202.04411 [pdf, other]

A.I. and Data-Driven Mobility at Volkswagen Financial Services AG

Authors: Shayan Jawed, Mofassir ul Islam Arif, Ahmed Rashed, Kiran Madhusudhanan, Shereen Elsayed, Mohsan Jameel, Alexei Volk, Andre Hintsches, Marlies Kornfeld, Katrin Lange, Lars Schmidt-Thieme

Abstract: Machine learning is being widely adapted in industrial applications owing to the capabilities of commercially available hardware and rapidly advancing research. Volkswagen Financial Services (VWFS), as a market leader in vehicle leasing services, aims to leverage existing proprietary data and the latest research to enhance existing and derive new business processes. The collaboration between Infor… ▽ More Machine learning is being widely adapted in industrial applications owing to the capabilities of commercially available hardware and rapidly advancing research. Volkswagen Financial Services (VWFS), as a market leader in vehicle leasing services, aims to leverage existing proprietary data and the latest research to enhance existing and derive new business processes. The collaboration between Information Systems and Machine Learning Lab (ISMLL) and VWFS serves to realize this goal. In this paper, we propose methods in the fields of recommender systems, object detection, and forecasting that enable data-driven decisions for the vehicle life-cycle at VWFS. △ Less

Submitted 9 February, 2022; originally announced February 2022.

arXiv:2201.01529 [pdf, other]

Supervised Permutation Invariant Networks for Solving the CVRP with Bounded Fleet Size

Authors: Daniela Thyssens, Jonas Falkner, Lars Schmidt-Thieme

Abstract: Learning to solve combinatorial optimization problems, such as the vehicle routing problem, offers great computational advantages over classical operations research solvers and heuristics. The recently developed deep reinforcement learning approaches either improve an initially given solution iteratively or sequentially construct a set of individual tours. However, most of the existing learning-ba… ▽ More Learning to solve combinatorial optimization problems, such as the vehicle routing problem, offers great computational advantages over classical operations research solvers and heuristics. The recently developed deep reinforcement learning approaches either improve an initially given solution iteratively or sequentially construct a set of individual tours. However, most of the existing learning-based approaches are not able to work for a fixed number of vehicles and thus bypass the complex assignment problem of the customers onto an apriori given number of available vehicles. On the other hand, this makes them less suitable for real applications, as many logistic service providers rely on solutions provided for a specific bounded fleet size and cannot accommodate short term changes to the number of vehicles. In contrast we propose a powerful supervised deep learning framework that constructs a complete tour plan from scratch while respecting an apriori fixed number of available vehicles. In combination with an efficient post-processing scheme, our supervised approach is not only much faster and easier to train but also achieves competitive results that incorporate the practical aspect of vehicle costs. In thorough controlled experiments we compare our method to multiple state-of-the-art approaches where we demonstrate stable performance, while utilizing less vehicles and shed some light on existent inconsistencies in the experimentation protocols of the related work. △ Less

Submitted 5 January, 2022; originally announced January 2022.

Comments: 9 pages, 5 figures

arXiv:2110.08255 [pdf, other]

Yformer: U-Net Inspired Transformer Architecture for Far Horizon Time Series Forecasting

Authors: Kiran Madhusudhanan, Johannes Burchert, Nghia Duong-Trung, Stefan Born, Lars Schmidt-Thieme

Abstract: Time series data is ubiquitous in research as well as in a wide variety of industrial applications. Effectively analyzing the available historical data and providing insights into the far future allows us to make effective decisions. Recent research has witnessed the superior performance of transformer-based architectures, especially in the regime of far horizon time series forecasting. However, t… ▽ More Time series data is ubiquitous in research as well as in a wide variety of industrial applications. Effectively analyzing the available historical data and providing insights into the far future allows us to make effective decisions. Recent research has witnessed the superior performance of transformer-based architectures, especially in the regime of far horizon time series forecasting. However, the current state of the art sparse Transformer architectures fail to couple down- and upsampling procedures to produce outputs in a similar resolution as the input. We propose the Yformer model, based on a novel Y-shaped encoder-decoder architecture that (1) uses direct connection from the downscaled encoder layer to the corresponding upsampled decoder layer in a U-Net inspired architecture, (2) Combines the downscaling/upsampling with sparse attention to capture long-range effects, and (3) stabilizes the encoder-decoder stacks with the addition of an auxiliary reconstruction loss. Extensive experiments have been conducted with relevant baselines on four benchmark datasets, demonstrating an average improvement of 19.82, 18.41 percentage MSE and 13.62, 11.85 percentage MAE in comparison to the current state of the art for the univariate and the multivariate settings respectively. △ Less

Submitted 25 August, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

Comments: Accepted by the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD 2022)

Journal ref: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (2022)

arXiv:2110.08028 [pdf, other]

Improving Hyperparameter Optimization by Planning Ahead

Authors: Hadi S. Jomaa, Jonas Falkner, Lars Schmidt-Thieme

Abstract: Hyperparameter optimization (HPO) is generally treated as a bi-level optimization problem that involves fitting a (probabilistic) surrogate model to a set of observed hyperparameter responses, e.g. validation loss, and consequently maximizing an acquisition function using a surrogate model to identify good hyperparameter candidates for evaluation. The choice of a surrogate and/or acquisition funct… ▽ More Hyperparameter optimization (HPO) is generally treated as a bi-level optimization problem that involves fitting a (probabilistic) surrogate model to a set of observed hyperparameter responses, e.g. validation loss, and consequently maximizing an acquisition function using a surrogate model to identify good hyperparameter candidates for evaluation. The choice of a surrogate and/or acquisition function can be further improved via knowledge transfer across related tasks. In this paper, we propose a novel transfer learning approach, defined within the context of model-based reinforcement learning, where we represent the surrogate as an ensemble of probabilistic models that allows trajectory sampling. We further propose a new variant of model predictive control which employs a simple look-ahead strategy as a policy that optimizes a sequence of actions, representing hyperparameter candidates to expedite HPO. Our experiments on three meta-datasets comparing to state-of-the-art HPO algorithms including a model-free reinforcement learning approach show that the proposed method can outperform all baselines by exploiting a simple planning-based policy. △ Less

Submitted 15 October, 2021; originally announced October 2021.

arXiv:2109.01569 [pdf, other]

Deep Metric Learning for Ground Images

Authors: Raaghav Radhakrishnan, Jan Fabian Schmid, Randolf Scholz, Lars Schmidt-Thieme

Abstract: Ground texture based localization methods are potential prospects for low-cost, high-accuracy self-localization solutions for robots. These methods estimate the pose of a given query image, i.e. the current observation of the ground from a downward-facing camera, in respect to a set of reference images whose poses are known in the application area. In this work, we deal with the initial localizati… ▽ More Ground texture based localization methods are potential prospects for low-cost, high-accuracy self-localization solutions for robots. These methods estimate the pose of a given query image, i.e. the current observation of the ground from a downward-facing camera, in respect to a set of reference images whose poses are known in the application area. In this work, we deal with the initial localization task, in which we have no prior knowledge about the current robot positioning. In this situation, the localization method would have to consider all available reference images. However, in order to reduce computational effort and the risk of receiving a wrong result, we would like to consider only those reference images that are actually overlap** with the query image. For this purpose, we propose a deep metric learning approach that retrieves the most similar reference images to the query image. In contrast to existing approaches to image retrieval for ground images, our approach achieves significantly better recall performance and improves the localization performance of a state-of-the-art ground texture based localization method. △ Less

Submitted 3 September, 2021; originally announced September 2021.

arXiv:2108.02842 [pdf, other]

Multimodal Meta-Learning for Time Series Regression

Authors: Sebastian Pineda Arango, Felix Heinrich, Kiran Madhusudhanan, Lars Schmidt-Thieme

Abstract: Recent work has shown the efficiency of deep learning models such as Fully Convolutional Networks (FCN) or Recurrent Neural Networks (RNN) to deal with Time Series Regression (TSR) problems. These models sometimes need a lot of data to be able to generalize, yet the time series are sometimes not long enough to be able to learn patterns. Therefore, it is important to make use of information across… ▽ More Recent work has shown the efficiency of deep learning models such as Fully Convolutional Networks (FCN) or Recurrent Neural Networks (RNN) to deal with Time Series Regression (TSR) problems. These models sometimes need a lot of data to be able to generalize, yet the time series are sometimes not long enough to be able to learn patterns. Therefore, it is important to make use of information across time series to improve learning. In this paper, we will explore the idea of using meta-learning for quickly adapting model parameters to new short-history time series by modifying the original idea of Model Agnostic Meta-Learning (MAML) \cite{finn2017model}. Moreover, based on prior work on multimodal MAML \cite{vuorio2019multimodal}, we propose a method for conditioning parameters of the model through an auxiliary network that encodes global information of the time series to extract meta-features. Finally, we apply the data to time series of different domains, such as pollution measurements, heart-rate sensors, and electrical battery data. We show empirically that our proposed meta-learning method learns TSR with few data fast and outperforms the baselines in 9 of 12 experiments. △ Less

Submitted 2 November, 2021; v1 submitted 5 August, 2021; originally announced August 2021.

Comments: 16 pages

arXiv:2106.08267 [pdf, other]

doi 10.3233/JIFS-212233

Multi-script Handwritten Digit Recognition Using Multi-task Learning

Authors: Mesay Samuel Gondere, Lars Schmidt-Thieme, Durga Prasad Sharma, Randolf Scholz

Abstract: Handwritten digit recognition is one of the extensively studied area in machine learning. Apart from the wider research on handwritten digit recognition on MNIST dataset, there are many other research works on various script recognition. However, it is not very common for multi-script digit recognition which encourage the development of robust and multipurpose systems. Additionally working on mult… ▽ More Handwritten digit recognition is one of the extensively studied area in machine learning. Apart from the wider research on handwritten digit recognition on MNIST dataset, there are many other research works on various script recognition. However, it is not very common for multi-script digit recognition which encourage the development of robust and multipurpose systems. Additionally working on multi-script digit recognition enables multi-task learning, considering the script classification as a related task for instance. It is evident that multi-task learning improves model performance through inductive transfer using the information contained in related tasks. Therefore, in this study multi-script handwritten digit recognition using multi-task learning will be investigated. As a specific case of demonstrating the solution to the problem, Amharic handwritten character recognition will also be experimented. The handwritten digits of three scripts including Latin, Arabic and Kannada are studied to show that multi-task models with reformulation of the individual tasks have shown promising results. In this study a novel way of using the individual tasks predictions was proposed to help classification performance and regularize the different loss for the purpose of the main task. This finding has outperformed the baseline and the conventional multi-task learning models. More importantly, it avoided the need for weighting the different losses of the tasks, which is one of the challenges in multi-task learning. △ Less

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2104.12226 [pdf, other]

RP-DQN: An application of Q-Learning to Vehicle Routing Problems

Authors: Ahmad Bdeir, Simon Boeder, Tim Dernedde, Kirill Tkachuk, Jonas K. Falkner, Lars Schmidt-Thieme

Abstract: In this paper we present a new approach to tackle complex routing problems with an improved state representation that utilizes the model complexity better than previous methods. We enable this by training from temporal differences. Specifically Q-Learning is employed. We show that our approach achieves state-of-the-art performance for autoregressive policies that sequentially insert nodes to const… ▽ More In this paper we present a new approach to tackle complex routing problems with an improved state representation that utilizes the model complexity better than previous methods. We enable this by training from temporal differences. Specifically Q-Learning is employed. We show that our approach achieves state-of-the-art performance for autoregressive policies that sequentially insert nodes to construct solutions on the CVRP. Additionally, we are the first to tackle the MDVRP with machine learning methods and demonstrate that this problem type greatly benefits from our approach over other ML methods. △ Less

Submitted 25 April, 2021; originally announced April 2021.

Comments: 14 pages, 4 figures

arXiv:2102.03776 [pdf, other]

Hyperparameter Optimization with Differentiable Metafeatures

Authors: Hadi S. Jomaa, Lars Schmidt-Thieme, Josif Grabocka

Abstract: Metafeatures, or dataset characteristics, have been shown to improve the performance of hyperparameter optimization (HPO). Conventionally, metafeatures are precomputed and used to measure the similarity between datasets, leading to a better initialization of HPO models. In this paper, we propose a cross dataset surrogate model called Differentiable Metafeature-based Surrogate (DMFBS), that predict… ▽ More Metafeatures, or dataset characteristics, have been shown to improve the performance of hyperparameter optimization (HPO). Conventionally, metafeatures are precomputed and used to measure the similarity between datasets, leading to a better initialization of HPO models. In this paper, we propose a cross dataset surrogate model called Differentiable Metafeature-based Surrogate (DMFBS), that predicts the hyperparameter response, i.e. validation loss, of a model trained on the dataset at hand. In contrast to existing models, DMFBS i) integrates a differentiable metafeature extractor and ii) is optimized using a novel multi-task loss, linking manifold regularization with a dataset similarity measure learned via an auxiliary dataset identification meta-task, effectively enforcing the response approximation for similar datasets to be similar. We compare DMFBS against several recent models for HPO on three large meta-datasets and show that it consistently outperforms all of them with an average 10% improvement. Finally, we provide an extensive ablation study that examines the different components of our approach. △ Less

Submitted 7 February, 2021; originally announced February 2021.

arXiv:2101.02118 [pdf, other]

Do We Really Need Deep Learning Models for Time Series Forecasting?

Authors: Shereen Elsayed, Daniela Thyssens, Ahmed Rashed, Hadi Samer Jomaa, Lars Schmidt-Thieme

Abstract: Time series forecasting is a crucial task in machine learning, as it has a wide range of applications including but not limited to forecasting electricity consumption, traffic, and air quality. Traditional forecasting models rely on rolling averages, vector auto-regression and auto-regressive integrated moving averages. On the other hand, deep learning and matrix factorization models have been rec… ▽ More Time series forecasting is a crucial task in machine learning, as it has a wide range of applications including but not limited to forecasting electricity consumption, traffic, and air quality. Traditional forecasting models rely on rolling averages, vector auto-regression and auto-regressive integrated moving averages. On the other hand, deep learning and matrix factorization models have been recently proposed to tackle the same problem with more competitive performance. However, one major drawback of such models is that they tend to be overly complex in comparison to traditional techniques. In this paper, we report the results of prominent deep learning models with respect to a well-known machine learning baseline, a Gradient Boosting Regression Tree (GBRT) model. Similar to the deep neural network (DNN) models, we transform the time series forecasting task into a window-based regression problem. Furthermore, we feature-engineered the input and output structure of the GBRT model, such that, for each training window, the target values are concatenated with external features, and then flattened to form one input instance for a multi-output GBRT model. We conducted a comparative study on nine datasets for eight state-of-the-art deep-learning models that were presented at top-level conferences in the last years. The results demonstrate that the window-based input transformation boosts the performance of a simple GBRT model to levels that outperform all state-of-the-art DNN models evaluated in this paper. △ Less

Submitted 20 October, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

Comments: 14 pages with appendix, 1 figure

arXiv:2006.09100 [pdf, other]

Learning to Solve Vehicle Routing Problems with Time Windows through Joint Attention

Authors: Jonas K. Falkner, Lars Schmidt-Thieme

Abstract: Many real-world vehicle routing problems involve rich sets of constraints with respect to the capacities of the vehicles, time windows for customers etc. While in recent years first machine learning models have been developed to solve basic vehicle routing problems faster than optimization heuristics, complex constraints rarely are taken into consideration. Due to their general procedure to constr… ▽ More Many real-world vehicle routing problems involve rich sets of constraints with respect to the capacities of the vehicles, time windows for customers etc. While in recent years first machine learning models have been developed to solve basic vehicle routing problems faster than optimization heuristics, complex constraints rarely are taken into consideration. Due to their general procedure to construct solutions sequentially route by route, these methods generalize unfavorably to such problems. In this paper, we develop a policy model that is able to start and extend multiple routes concurrently by using attention on the joint action space of several tours. In that way the model is able to select routes and customers and thus learns to make difficult trade-offs between routes. In comprehensive experiments on three variants of the vehicle routing problem with time windows we show that our model called JAMPR works well for different problem sizes and outperforms the existing state-of-the-art constructive model. For two of the three variants it also creates significantly better solutions than a comparable meta-heuristic solver. △ Less

Submitted 16 June, 2020; originally announced June 2020.

arXiv:1910.12749 [pdf, other]

HIDRA: Head Initialization across Dynamic targets for Robust Architectures

Authors: Rafael Rego Drumond, Lukas Brinkmeyer, Josif Grabocka, Lars Schmidt-Thieme

Abstract: The performance of gradient-based optimization strategies depends heavily on the initial weights of the parametric model. Recent works show that there exist weight initializations from which optimization procedures can find the task-specific parameters faster than from uniformly random initializations and that such a weight initialization can be learned by optimizing a specific model architecture… ▽ More The performance of gradient-based optimization strategies depends heavily on the initial weights of the parametric model. Recent works show that there exist weight initializations from which optimization procedures can find the task-specific parameters faster than from uniformly random initializations and that such a weight initialization can be learned by optimizing a specific model architecture across similar tasks via MAML (Model-Agnostic Meta-Learning). Current methods are limited to populations of classification tasks that share the same number of classes due to the static model architectures used during meta-learning. In this paper, we present HIDRA, a meta-learning approach that enables training and evaluating across tasks with any number of target variables. We show that Model-Agnostic Meta-Learning trains a distribution for all the neurons in the output layer and a specific weight initialization for the ones in the hidden layers. HIDRA explores this by learning one master neuron, which is used to initialize any number of output neurons for a new task. Extensive experiments on the Miniimagenet and Omniglot data sets demonstrate that HIDRA improves over standard approaches while generalizing to tasks with any number of target variables. Moreover, our approach is shown to robustify low-capacity models in learning across complex tasks with a high number of classes for which regular MAML fails to learn any feasible initialization. △ Less

Submitted 22 January, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

MSC Class: 68

arXiv:1909.13576 [pdf, other]

Chameleon: Learning Model Initializations Across Tasks With Different Schemas

Authors: Lukas Brinkmeyer, Rafael Rego Drumond, Randolf Scholz, Josif Grabocka, Lars Schmidt-Thieme

Abstract: Parametric models, and particularly neural networks, require weight initialization as a starting point for gradient-based optimization. Recent work shows that a specific initial parameter set can be learned from a population of supervised learning tasks. Using this initial parameter set enables a fast convergence for unseen classes even when only a handful of instances is available (model-agnostic… ▽ More Parametric models, and particularly neural networks, require weight initialization as a starting point for gradient-based optimization. Recent work shows that a specific initial parameter set can be learned from a population of supervised learning tasks. Using this initial parameter set enables a fast convergence for unseen classes even when only a handful of instances is available (model-agnostic meta-learning). Currently, methods for learning model initializations are limited to a population of tasks sharing the same schema, i.e., the same number, order, type, and semantics of predictor and target variables. In this paper, we address the problem of meta-learning parameter initialization across tasks with different schemas, i.e., if the number of predictors varies across tasks, while they still share some variables. We propose Chameleon, a model that learns to align different predictor schemas to a common representation. In experiments on 23 datasets of the OpenML-CC18 benchmark, we show that Chameleon can successfully learn parameter initializations across tasks with different schemas, presenting, to the best of our knowledge, the first cross-dataset few-shot classification approach for unstructured data. △ Less

Submitted 11 June, 2020; v1 submitted 30 September, 2019; originally announced September 2019.

Comments: 18 pages, 7 figures

MSC Class: 68

arXiv:1909.12943 [pdf, other]

doi 10.5445/KSP/1000098011/09

Handwritten Amharic Character Recognition Using a Convolutional Neural Network

Authors: Mesay Samuel Gondere, Lars Schmidt-Thieme, Abiot Sinamo Boltena, Hadi Samer Jomaa

Abstract: Amharic is the official language of the Federal Democratic Republic of Ethiopia. There are lots of historic Amharic and Ethiopic handwritten documents addressing various relevant issues including governance, science, religious, social rules, cultures and art works which are very reach indigenous knowledge. The Amharic language has its own alphabet derived from Ge'ez which is currently the liturgic… ▽ More Amharic is the official language of the Federal Democratic Republic of Ethiopia. There are lots of historic Amharic and Ethiopic handwritten documents addressing various relevant issues including governance, science, religious, social rules, cultures and art works which are very reach indigenous knowledge. The Amharic language has its own alphabet derived from Ge'ez which is currently the liturgical language in Ethiopia. Handwritten character recognition for non Latin scripts like Amharic is not addressed especially using the advantages of the state of the art techniques. This research work designs for the first time a model for Amharic handwritten character recognition using a convolutional neural network. The dataset was organized from collected sample handwritten documents and data augmentation was applied for machine learning. The model was further enhanced using multi-task learning from the relationships of the characters. Promising results are observed from the later model which can further be applied to word prediction. △ Less

Submitted 23 September, 2019; originally announced September 2019.

Comments: ECDA2019 Conference Oral Presentation

arXiv:1906.11527 [pdf, other]

Hyp-RL : Hyperparameter Optimization by Reinforcement Learning

Authors: Hadi S. Jomaa, Josif Grabocka, Lars Schmidt-Thieme

Abstract: Hyperparameter tuning is an omnipresent problem in machine learning as it is an integral aspect of obtaining the state-of-the-art performance for any model. Most often, hyperparameters are optimized just by training a model on a grid of possible hyperparameter values and taking the one that performs best on a validation sample (grid search). More recently, methods have been introduced that build a… ▽ More Hyperparameter tuning is an omnipresent problem in machine learning as it is an integral aspect of obtaining the state-of-the-art performance for any model. Most often, hyperparameters are optimized just by training a model on a grid of possible hyperparameter values and taking the one that performs best on a validation sample (grid search). More recently, methods have been introduced that build a so-called surrogate model that predicts the validation loss for a specific hyperparameter setting, model and dataset and then sequentially select the next hyperparameter to test, based on a heuristic function of the expected value and the uncertainty of the surrogate model called acquisition function (sequential model-based Bayesian optimization, SMBO). In this paper we model the hyperparameter optimization problem as a sequential decision problem, which hyperparameter to test next, and address it with reinforcement learning. This way our model does not have to rely on a heuristic acquisition function like SMBO, but can learn which hyperparameters to test next based on the subsequent reduction in validation loss they will eventually lead to, either because they yield good models themselves or because they allow the hyperparameter selection policy to build a better surrogate model that is able to choose better hyperparameters later on. Experiments on a large battery of 50 data sets demonstrate that our method outperforms the state-of-the-art approaches for hyperparameter learning. △ Less

Submitted 27 June, 2019; originally announced June 2019.

Showing 1–50 of 66 results for author: Schmidt-thieme, L