Search | arXiv e-print repository

arXiv:2402.06763

Scalable Kernel Logistic Regression with Nyström Approximation: Theoretical Analysis and Application to Discrete Choice Modelling

Authors: José Ángel Martín-Baos, Ricardo García-Ródenas, Luis Rodriguez-Benitez, Michel Bierlaire

Abstract: The application of kernel-based Machine Learning (ML) techniques to discrete choice modelling using large datasets often faces challenges due to memory requirements and the considerable number of parameters involved in these models. This complexity hampers the efficient training of large-scale models. This paper addresses these problems of scalability by introducing the Nyström approximation for Kernel Logistic Regression (KLR) on large datasets. The study begins by presenting a theoretical analysis in which: i) the set of KLR solutions is characterised, ii) an upper bound to the solution of KLR with Nyström approximation is provided, and finally iii) a specialisation of the optimisation algorithms to Nyström KLR is described. After this, the Nyström KLR is computationally validated. Four landmark selection methods are tested, including basic uniform sampling, a k-means sampling strategy, and two non-uniform methods grounded in leverage scores. The performance of these strategies is evaluated using large-scale transport mode choice datasets and is compared with traditional methods such as Multinomial Logit (MNL) and contemporary ML techniques. The study also assesses the efficiency of various optimisation techniques for the proposed Nyström KLR model. The performance of gradient descent, Momentum, Adam, and L-BFGS-B optimisation methods is examined on these datasets. Among these strategies, the k-means Nyström KLR approach emerges as a successful solution for applying KLR to large datasets, particularly when combined with the L-BFGS-B and Adam optimisation methods. The results highlight the ability of this strategy to handle datasets exceeding 200,000 observations while maintaining robust performance.

Submitted 9 February, 2024; originally announced February 2024.

Comments: 32 pages, 5 figures
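The Nyström idea in this abstract can be illustrated with a small NumPy sketch. It assumes an RBF kernel, a binary (rather than multinomial) outcome, uniform landmark sampling (the simplest of the four strategies listed) and plain gradient descent; `gamma`, `lam`, `lr`, the jitter and the synthetic data are illustrative choices, not the authors' settings. Each observation is mapped to `Phi = K_nm K_mm^(-1/2)`, so that `Phi Phi^T` approximates the full n x n kernel matrix while only n x m kernel entries are ever formed.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Squared Euclidean distances between rows of A and rows of B
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_features(X, landmarks, gamma, jitter=1e-8):
    # Phi = K_nm @ K_mm^{-1/2}, so Phi @ Phi.T approximates the full kernel matrix
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    K_nm = rbf_kernel(X, landmarks, gamma)
    eigval, eigvec = np.linalg.eigh(K_mm + jitter * np.eye(len(landmarks)))
    eigval = np.maximum(eigval, jitter)  # guard against tiny negative eigenvalues
    inv_sqrt = eigvec @ np.diag(1.0 / np.sqrt(eigval)) @ eigvec.T
    return K_nm @ inv_sqrt

def fit_logistic(Phi, y, lam=1e-3, lr=0.5, n_iter=2000):
    # L2-regularised binary logistic regression fitted by full-batch gradient descent
    n, m = Phi.shape
    w, b = np.zeros(m), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Phi @ w + b)))
        w -= lr * (Phi.T @ (p - y) / n + lam * w)
        b -= lr * np.mean(p - y)
    return w, b

rng = np.random.default_rng(0)
n, m = 2000, 50
X = rng.normal(size=(n, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 1.5).astype(float)  # non-linear (circular) boundary

landmarks = X[rng.choice(n, size=m, replace=False)]  # uniform landmark sampling
Phi = nystrom_features(X, landmarks, gamma=0.5)
w, b = fit_logistic(Phi, y)
accuracy = np.mean(((Phi @ w + b) > 0) == (y == 1))
print(f"training accuracy: {accuracy:.3f}")
```

Because `nystrom_features` accepts any landmark matrix, replacing the `rng.choice` line with k-means centroids would give the k-means variant the abstract favours; that substitution is not shown here.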

arXiv:2301.04982

An instance-based learning approach for evaluating the perception of ride-hailing waiting time variability

Authors: Nejc Geržinič, Oded Cats, Niels van Oort, Sascha Hoogendoorn-Lanser, Michel Bierlaire, Serge Hoogendoorn

Abstract: Understanding users' perception of service variability is essential to discern their overall perception of any type of (transport) service. We study the perception of waiting time variability for ride-hailing services. We carried out a stated preference survey in August 2021, yielding 936 valid responses. The respondents were faced with static pre-trip information on the expected waiting time, followed by the actually experienced waiting time for their selected alternative. We analyse this data by means of an instance-based learning (IBL) approach to evaluate how individuals respond to service performance variation and how this impacts their future decisions. Different novel specifications of memory fading, captured by the IBL approach, are tested to uncover which describes the user behaviour best. Additionally, existing and new specifications of inertia (habit) are tested. Our model outcomes reveal that the perception of unexpected waiting time is within the expected range of 2-3 times the value-of-time. Travellers seem to place a higher reward on an early departure compared to a penalty for a late departure of equal magnitude. A cancelled service, after having made a booking, results in significant disutility for the passenger and a strong motivation to shift to a different provider. Considering memory decay, our results show that the most recent experience is by far the most relevant for the next decision, with memories fading quickly in importance. The role of inertia seems to gain importance with each additional consecutive choice for the same option, but then resets to zero following a shift in behaviour.

Submitted 12 January, 2023; originally announced January 2023.
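To make "memory fading" concrete, the sketch below forms a perceived waiting time as a weighted average of past experiences, with weights decaying as a power of how many choices ago each experience occurred (loosely inspired by the power-law decay common in instance-based learning models). The function name, the decay parameter `d` and the weighting scheme are hypothetical; the paper tests several specifications that may differ from this one.

```python
import numpy as np

def perceived_wait(experienced, d=1.5):
    """Hypothetical memory-weighted perception of waiting time.

    experienced: waiting times ordered from oldest to most recent.
    d: assumed decay parameter; larger d makes older memories fade faster.
    """
    experienced = np.asarray(experienced, dtype=float)
    # Age 1 = most recent experience, age k = k choices ago
    ages = np.arange(len(experienced), 0, -1)
    weights = ages ** (-d)
    weights /= weights.sum()  # normalise so the result is a weighted average
    return float(weights @ experienced), weights

est, w = perceived_wait([5.0, 5.0, 12.0], d=1.5)
print(round(est, 2), np.round(w, 3))
```

With this form the single most recent (long) wait dominates the estimate, which is the qualitative pattern the abstract reports; raising `d` concentrates even more weight on it.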

arXiv:2212.02178

A Data Fusion Approach for Ride-sourcing Demand Estimation: A Discrete Choice Model with Sampling and Endogeneity Corrections

Authors: Rico Krueger, Michel Bierlaire, Prateek Bansal

Abstract: Ride-sourcing services offered by companies like Uber and Didi have grown rapidly in the last decade. Understanding the demand for these services is essential for planning and managing modern transportation systems. Existing studies develop statistical models for ride-sourcing demand estimation at an aggregate level due to limited data availability. These models lack foundations in microeconomic theory, ignore the competition between ride-sourcing and other travel modes, and cannot be seamlessly integrated into existing individual-level (disaggregate) activity-based models to evaluate system-level impacts of ride-sourcing services. In this paper, we present and apply an approach for estimating ride-sourcing demand at a disaggregate level using discrete choice models and multiple data sources. We first construct a sample of trip-based mode choices in Chicago, USA by enriching a household travel survey with publicly available ride-sourcing and taxi trip records. We then formulate a multivariate extreme value-based discrete choice model with sampling and endogeneity corrections to account for the construction of the estimation sample from multiple data sources and endogeneity biases arising from supply-side constraints and surge pricing mechanisms in ride-sourcing systems. Our analysis of the constructed dataset reveals insights into the influence of various socio-economic, land use and built environment features on ride-sourcing demand. We also derive elasticities of ride-sourcing demand relative to travel cost and time. Finally, we illustrate how the developed model can be employed to quantify the welfare implications of ride-sourcing policies and regulations such as terminating certain types of services and introducing ride-sourcing taxes.

Submitted 5 December, 2022; originally announced December 2022.
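The abstract mentions elasticities of demand with respect to travel cost. The paper's model is a multivariate extreme value model with sampling and endogeneity corrections; as a simpler illustration only, this sketch uses a plain multinomial logit (MNL), for which the point elasticity of P_i with respect to the cost of alternative j is beta_cost * cost_j * (delta_ij - P_j), and checks one entry against a finite difference. The alternatives, constants, coefficient and costs are made up.

```python
import numpy as np

def mnl_probs(asc, beta_cost, cost):
    # Systematic utility V_k = ASC_k + beta_cost * cost_k; softmax gives MNL probabilities
    v = asc + beta_cost * cost
    e = np.exp(v - v.max())  # subtract the max for numerical stability
    return e / e.sum()

def mnl_cost_elasticities(asc, beta_cost, cost):
    # E[i, j] = d ln P_i / d ln cost_j = beta_cost * cost_j * (delta_ij - P_j)
    p = mnl_probs(asc, beta_cost, cost)
    return beta_cost * cost[None, :] * (np.eye(len(cost)) - p[None, :])

# Hypothetical alternatives: car, transit, ride-sourcing
asc = np.array([0.0, -0.5, -1.0])
cost = np.array([6.0, 2.5, 12.0])
beta_cost = -0.15

E = mnl_cost_elasticities(asc, beta_cost, cost)

# Finite-difference check of the own-cost elasticity of ride-sourcing (index 2)
h = 1e-6
bumped = cost.copy()
bumped[2] *= 1 + h
p0 = mnl_probs(asc, beta_cost, cost)
p1 = mnl_probs(asc, beta_cost, bumped)
fd = (np.log(p1[2]) - np.log(p0[2])) / np.log1p(h)
print(np.round(E, 3))
print(f"analytic {E[2, 2]:.4f} vs finite difference {fd:.4f}")
```

In the MNL the cross-elasticities in each column are identical (the IIA property); relaxing that restriction is one motivation for MEV structures such as the one the paper uses.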

arXiv:2210.02404

ciDATGAN: Conditional Inputs for Tabular GANs

Authors: Gael Lederrey, Tim Hillel, Michel Bierlaire

Abstract: Conditionality has become a core component of Generative Adversarial Networks (GANs) for generating synthetic images. GANs usually use latent conditionality to control the generation process. However, tabular data only contains manifest variables. Thus, latent conditionality either restricts the generated data or does not produce sufficiently good results. Therefore, we propose a new methodology to include conditionality in tabular GANs inspired by image completion methods. This article presents ciDATGAN, an evolution of the Directed Acyclic Tabular GAN (DATGAN) that has already been shown to outperform state-of-the-art tabular GAN models. First, we show that the addition of conditional inputs does hinder the model's performance compared to its predecessor. Then, we demonstrate that ciDATGAN can be used to unbias datasets with the help of well-chosen conditional inputs. Finally, we show that ciDATGAN can learn the logic behind the data and, thus, be used to complete large synthetic datasets using data from a smaller feeder dataset.

Submitted 5 October, 2022; originally announced October 2022.

Comments: Technical report, 21 pages

arXiv:2203.03489

DATGAN: Integrating expert knowledge into deep learning for synthetic tabular data

Authors: Gael Lederrey, Tim Hillel, Michel Bierlaire

Abstract: Synthetic data can be used in various applications, such as correcting biased datasets or replacing scarce original data for simulation purposes. Generative Adversarial Networks (GANs) are considered state-of-the-art for developing generative models. However, these deep learning models are data-driven, and it is, thus, difficult to control the generation process. This lack of control can lead to the following issues: lack of representativity in the generated data, the introduction of bias, and the possibility of overfitting the sample's noise. This article presents the Directed Acyclic Tabular GAN (DATGAN) to address these limitations by integrating expert knowledge into deep learning models for synthetic tabular data generation. This approach allows the interactions between variables to be specified explicitly using a Directed Acyclic Graph (DAG). The DAG is then converted to a network of modified Long Short-Term Memory (LSTM) cells that accept multiple inputs. Multiple DATGAN versions are systematically tested on multiple assessment metrics. We show that the best versions of the DATGAN outperform state-of-the-art generative models on multiple case studies. Finally, we show how the DAG can create hypothetical synthetic datasets.

Submitted 7 March, 2022; originally announced March 2022.

Comments: 43 pages for the article and 32 pages of supplementary materials. This preprint will soon be submitted
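Generating variables along an expert-specified DAG requires an order in which every variable comes after its parents (a topological order), with each variable receiving its parents as inputs. A minimal sketch using Python's standard `graphlib` module; the survey variables and edges are hypothetical, and how DATGAN actually wires its LSTM cells is not reproduced here.

```python
from graphlib import TopologicalSorter

# Hypothetical expert DAG for a travel survey: each variable maps to its parents
dag = {
    "age": set(),
    "income": {"age"},
    "car_ownership": {"income", "age"},
    "trip_distance": set(),
    "mode": {"car_ownership", "trip_distance", "income"},
}

# static_order() yields every node after all of its predecessors;
# it raises graphlib.CycleError if the graph is not acyclic
order = list(TopologicalSorter(dag).static_order())
for var in order:
    parents = sorted(dag[var])
    print(f"{var:15s} <- {parents if parents else '(no parents)'}")
```

Rejecting cycles up front is also what makes the "acyclic" requirement in DATGAN's name enforceable: an inconsistent expert graph fails before any model is built.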

arXiv:2012.12155

doi 10.1016/j.jocm.2020.100226

Estimation of discrete choice models with hybrid stochastic adaptive batch size algorithms

Authors: Gael Lederrey, Virginie Lurkin, Tim Hillel, Michel Bierlaire

Abstract: The emergence of Big Data has enabled new research perspectives in the discrete choice community. While the techniques to estimate Machine Learning models on a massive amount of data are well established, these have not yet been fully explored for the estimation of statistical Discrete Choice Models based on the random utility framework. In this article, we provide new ways of dealing with large datasets in the context of Discrete Choice Models. We achieve this by proposing new efficient stochastic optimization algorithms and extensively testing them alongside existing approaches. We develop these algorithms based on three main contributions: the use of a stochastic Hessian, the modification of the batch size, and a change of optimization algorithm depending on the batch size. A comprehensive experimental comparison of fifteen optimization algorithms is conducted across ten benchmark Discrete Choice Model cases. The results indicate that the HAMABS algorithm, a hybrid adaptive batch size stochastic method, is the best performing algorithm across the optimization benchmarks. This algorithm speeds up the optimization time by a factor of 23 on the largest model compared to existing algorithms used in practice. The integration of the new algorithms in Discrete Choice Models estimation software will significantly reduce the time required for model estimation and therefore enable researchers and practitioners to explore new approaches for the specification of choice models.

Submitted 22 December, 2020; originally announced December 2020.

Comments: 43 pages

Journal ref: Journal of Choice Modelling 38 (2019): 100226
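One of the three ingredients named above is changing the batch size during optimisation. The sketch below illustrates only that generic idea, mini-batch gradient descent on a binary logit log-likelihood with the batch size doubling after every epoch; it is not HAMABS (there is no stochastic Hessian and no switch of algorithm), and the schedule, learning rate and synthetic data are assumptions.

```python
import numpy as np

def logit_grad(beta, X, y):
    # Gradient of the mean negative log-likelihood of a binary logit model
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (p - y) / len(y)

def sgd_growing_batch(X, y, batch0=32, growth=2, epochs=12, lr=0.5, seed=0):
    """Mini-batch gradient descent whose batch size grows after every epoch."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta, batch = np.zeros(k), batch0
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch):
            rows = order[start:start + batch]
            beta -= lr * logit_grad(beta, X[rows], y[rows])
        # Small batches give cheap, noisy steps early on; larger batches later
        # reduce gradient noise so the iterates can settle near the optimum.
        batch = min(batch * growth, n)
    return beta

rng = np.random.default_rng(2)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
true_beta = np.array([-0.3, 1.0, -0.5])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)
print(np.round(sgd_growing_batch(X, y), 3))
```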

arXiv:2009.06383

doi 10.1007/s11222-022-10182-3

Robust discrete choice models with t-distributed kernel errors

Authors: Rico Krueger, Michel Bierlaire, Thomas Gasos, Prateek Bansal

Abstract: Outliers in discrete choice response data may result from misclassification and misreporting of the response variable and from choice behaviour that is inconsistent with modelling assumptions (e.g. random utility maximisation). In the presence of outliers, standard discrete choice models produce biased estimates and suffer from compromised predictive accuracy. Robust statistical models are less sensitive to outliers than standard non-robust models. This paper analyses two robust alternatives to the multinomial probit (MNP) model. The two models are robit models whose kernel error distributions are heavy-tailed t-distributions to moderate the influence of outliers. The first model is the multinomial robit (MNR) model, in which a generic degrees of freedom parameter controls the heavy-tailedness of the kernel error distribution. The second model, the generalised multinomial robit (Gen-MNR) model, is more flexible than MNR, as it allows for distinct heavy-tailedness in each dimension of the kernel error distribution. For both models, we derive Gibbs samplers for posterior inference. In a simulation study, we illustrate the excellent finite sample properties of the proposed Bayes estimators and show that MNR and Gen-MNR produce more accurate estimates if the choice data contain outliers through the lens of the non-robust MNP model. In a case study on transport mode choice behaviour, MNR and Gen-MNR outperform MNP by substantial margins in terms of in-sample fit and out-of-sample predictive accuracy. The case study also highlights differences in elasticity estimates across models.

Submitted 5 December, 2022; v1 submitted 14 September, 2020; originally announced September 2020.

Journal ref: Statistics and Computing, 33 (2), 2023
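The robit models replace the Gaussian kernel error of the probit with a multivariate t-distribution. A quick way to see the effect of heavier tails is a frequency simulator: draw t errors as `z / sqrt(w / nu)` with `z ~ N(0, Sigma)` and `w ~ chi-square(nu)`, add them to the systematic utilities, and count how often each alternative has the highest utility. The utilities, scale matrix, degrees of freedom and number of draws are assumptions; the paper itself uses Gibbs samplers for posterior inference, not this simulator.

```python
import numpy as np

def robit_probs(v, sigma, nu, n_draws=200_000, seed=0):
    """Frequency-simulator choice probabilities with multivariate-t errors.

    v: systematic utilities (J,); sigma: error scale matrix (J, J); nu: degrees of freedom.
    """
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(v)), sigma, size=n_draws)
    w = rng.chisquare(nu, size=n_draws)
    eps = z / np.sqrt(w / nu)[:, None]  # one multivariate-t draw per row
    choices = np.argmax(v + eps, axis=1)
    return np.bincount(choices, minlength=len(v)) / n_draws

v = np.array([0.5, 0.0, -0.3])
sigma = np.eye(3)
for nu in (2.0, 30.0):
    print(nu, np.round(robit_probs(v, sigma, nu), 3))
```

With small `nu` the occasional very large error makes the probabilities flatter, which is how a robit model absorbs choices that look inconsistent with the utilities.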

arXiv:1906.03855

Bayesian Automatic Relevance Determination for Utility Function Specification in Discrete Choice Models

Authors: Filipe Rodrigues, Nicola Ortelli, Michel Bierlaire, Francisco Pereira

Abstract: Specifying utility functions is a key step towards applying the discrete choice framework for understanding the behaviour processes that govern user choices. However, identifying the utility function specifications that best model and explain the observed choices can be a very challenging and time-consuming task. This paper seeks to help modellers by leveraging the Bayesian framework and the concept of automatic relevance determination (ARD), in order to automatically determine an optimal utility function specification from an exponentially large set of possible specifications in a purely data-driven manner. Based on recent advances in approximate Bayesian inference, a doubly stochastic variational inference is developed, which allows the proposed DCM-ARD model to scale to very large and high-dimensional datasets. Using semi-artificial choice data, the proposed approach is shown to very accurately recover the true utility function specifications that govern the observed choices. Moreover, when applied to real choice data, DCM-ARD is shown to be able to discover high-quality specifications that can outperform previous ones from the literature according to multiple criteria, thereby demonstrating its practical applicability.

Submitted 10 June, 2019; originally announced June 2019.

Comments: 21 pages, 2 figures, 11 tables

arXiv:1905.00419

Variational Bayesian Inference for Mixed Logit Models with Unobserved Inter- and Intra-Individual Heterogeneity

Authors: Rico Krueger, Prateek Bansal, Michel Bierlaire, Ricardo A. Daziano, Taha H. Rashidi

Abstract: Variational Bayes (VB), a method originating from machine learning, enables fast and scalable estimation of complex probabilistic models. Thus far, applications of VB in discrete choice analysis have been limited to mixed logit models with unobserved inter-individual taste heterogeneity. However, such a model formulation may be too restrictive in panel data settings, since tastes may vary both between individuals and across choice tasks encountered by the same individual. In this paper, we derive a VB method for posterior inference in mixed logit models with unobserved inter- and intra-individual heterogeneity. In a simulation study, we benchmark the performance of the proposed VB method against maximum simulated likelihood (MSL) and Markov chain Monte Carlo (MCMC) methods in terms of parameter recovery, predictive accuracy and computational efficiency. The simulation study shows that VB can be a fast, scalable and accurate alternative to MSL and MCMC estimation, especially in applications in which fast predictions are paramount. VB is observed to be between 2.8 and 17.7 times faster than the two competing methods, while affording comparable or superior accuracy. In addition, the simulation study demonstrates that a parallelised implementation of the MSL estimator with analytical gradients is a viable alternative to MCMC in terms of both estimation accuracy and computational efficiency, as the MSL estimator is observed to be between 0.9 and 2.1 times faster than MCMC.

Submitted 16 January, 2020; v1 submitted 1 May, 2019; originally announced May 2019.

arXiv:1904.07688

Pólygamma Data Augmentation to address Non-conjugacy in the Bayesian Estimation of Mixed Multinomial Logit Models

Authors: Prateek Bansal, Rico Krueger, Michel Bierlaire, Ricardo A. Daziano, Taha H. Rashidi

Abstract: The standard Gibbs sampler of Mixed Multinomial Logit (MMNL) models involves sampling from conditional densities of utility parameters using the Metropolis-Hastings (MH) algorithm due to the unavailability of a conjugate prior for the logit kernel. To address this non-conjugacy concern, we propose the application of the Pólygamma data augmentation (PG-DA) technique for MMNL estimation. The posterior estimates of the augmented and the default Gibbs samplers are similar for the two-alternative scenario (binary choice), but we encounter empirical identification issues in the case of more alternatives ($J \geq 3$).

Submitted 13 April, 2019; originally announced April 2019.

Comments: arXiv admin note: text overlap with arXiv:1904.03647
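The abstract reports that the augmented sampler works in the binary case. As a sketch of the underlying idea (the Pólya-Gamma augmentation of Polson, Scott and Windle, spelled "Pólygamma" in the title), here is a Gibbs sampler for a plain Bayesian binary logit with a Gaussian prior: given latent `omega_i ~ PG(1, x_i' beta)`, the conditional of `beta` is Gaussian, so no Metropolis-Hastings step is needed. Each `omega_i` is drawn by truncating the infinite sum-of-gammas representation of PG(1, c), an approximation chosen for brevity; the data, prior variance, truncation level `K` and chain length are assumptions, and this is a fixed-coefficient logit, not the paper's MMNL sampler.

```python
import numpy as np

def sample_pg1(c, rng, K=100):
    # Approximate PG(1, c) draws: (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2))
    k = np.arange(1, K + 1)
    g = rng.gamma(1.0, 1.0, size=(len(c), K))
    denom = (k - 0.5) ** 2 + (c[:, None] ** 2) / (4 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)

def pg_gibbs_logit(X, y, n_iter=1000, burn=200, prior_var=10.0, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    B_inv = np.eye(p) / prior_var  # prior: beta ~ N(0, prior_var * I)
    kappa = y - 0.5
    beta = np.zeros(p)
    draws = []
    for it in range(n_iter):
        omega = sample_pg1(X @ beta, rng)  # omega_i | beta ~ PG(1, x_i' beta)
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + B_inv)
        V = 0.5 * (V + V.T)  # keep the covariance exactly symmetric
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)  # beta | omega, y is Gaussian
        if it >= burn:
            draws.append(beta)
    return np.array(draws)

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.5, -1.0])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)

draws = pg_gibbs_logit(X, y)
print("posterior mean:", np.round(draws.mean(axis=0), 3))
```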

arXiv:1904.03647

doi 10.1016/j.trb.2019.12.001

Bayesian Estimation of Mixed Multinomial Logit Models: Advances and Simulation-Based Evaluations

Authors: Prateek Bansal, Rico Krueger, Michel Bierlaire, Ricardo A. Daziano, Taha H. Rashidi

Abstract: Variational Bayes (VB) methods have emerged as a fast and computationally-efficient alternative to Markov chain Monte Carlo (MCMC) methods for scalable Bayesian estimation of mixed multinomial logit (MMNL) models. It has been established that VB is substantially faster than MCMC with practically no compromise in predictive accuracy. In this paper, we address two critical gaps concerning the usage and understanding of VB for MMNL. First, extant VB methods are limited to utility specifications involving only individual-specific taste parameters. Second, the finite-sample properties of VB estimators and the relative performance of VB, MCMC and maximum simulated likelihood estimation (MSLE) are not known. To address the former, this study extends several VB methods for MMNL to admit utility specifications including both fixed and random utility parameters. To address the latter, we conduct an extensive simulation-based evaluation to benchmark the extended VB methods against MCMC and MSLE in terms of estimation times, parameter recovery and predictive accuracy. The results suggest that all VB variants, with the exception of those relying on an alternative variational lower bound constructed with the help of the modified Jensen's inequality, perform as well as MCMC and MSLE at prediction and parameter recovery. In particular, VB with nonconjugate variational message passing and the delta-method (VB-NCVMP-Delta) is up to 16 times faster than MCMC and MSLE. Thus, VB-NCVMP-Delta can be an attractive alternative to MCMC and MSLE for fast, scalable and accurate estimation of MMNL models.

Submitted 12 December, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

Journal ref: Transportation Research Part B: Methodological, Volume 131, January 2020, Pages 124-142

arXiv:1810.06077

Dimension Reduction for Origin-Destination Flow Estimation: Blind Estimation Made Possible

Authors: **gyuan Xia, Wei Dai, John Polak, Michel Bierlaire

Abstract: This paper studies the problem of estimating origin-destination (OD) flows from link flows. As the number of link flows is typically much smaller than that of OD flows, the inverse problem is severely ill-posed and hence prior information is required to recover the ground truth. The basic approach in the literature relies on a forward model where the so-called traffic assignment matrix maps OD flows to link flows. Due to the ill-posedness of the problem, prior information on the assignment matrix and OD flows is typically needed. The main contributions of this paper include a dimension reduction of the inquired flows from $O(n^2)$ to $O(n)$, and a demonstration, for the first time, that the ground truth OD flows can be uniquely identified with no or little prior information. To cope with the ill-posedness due to the large number of unknowns, a new forward model is developed which does not involve OD flows directly but is built upon the flows characterized only by their origins, henceforth referred to as O-flows. The new model preserves all the OD information and, more importantly, reduces the dimension of the inverse problem substantially. A Gauss-Seidel method is deployed to solve the inverse problem, and a necessary condition for the uniqueness of the solution is proved. Simulations demonstrate that blind estimation, where no prior information is available, is possible for some network settings. Some challenging network settings are identified and discussed, where a remedy based on temporal patterns of the O-flows is developed and numerically shown to be effective.

Submitted 14 October, 2018; originally announced October 2018.

arXiv:1411.7101

The robust single machine scheduling problem with uncertain release and processing times

Authors: Nitish Umang, Alan L. Erera, Michel Bierlaire

Abstract: In this work, we study the single machine scheduling problem with uncertain release times and processing times of jobs. We adopt a robust scheduling approach, in which the measure of robustness to be minimized for a given sequence of jobs is the worst-case objective function value from the set of all possible realizations of release and processing times. The objective function value is the total flow time of all jobs. We discuss some important properties of robust schedules for zero and non-zero release times, and illustrate the added complexity in robust scheduling given non-zero release times. We propose heuristics based on variable neighborhood search and iterated local search to solve the problem and generate robust schedules. The algorithms are tested and their solution performance is compared with optimal solutions or lower bounds through numerical experiments based on synthetic data.

Submitted 25 November, 2014; originally announced November 2014.
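For a fixed job sequence and one realization of release times r_j and processing times p_j, completion times follow the recursion C_j = max(C_prev, r_j) + p_j and the flow time of job j is C_j - r_j. The sketch below evaluates a sequence's worst case over a finite, hypothetical list of scenarios and brute-forces the robust sequence for a tiny instance. The paper's uncertainty set and its variable neighborhood search and iterated local search heuristics are not reproduced; enumerating all permutations is only viable for a handful of jobs, which is why heuristics are needed in practice.

```python
from itertools import permutations

def total_flow_time(sequence, release, processing):
    """Total flow time sum_j (C_j - r_j) for jobs processed in the given order."""
    t, total = 0.0, 0.0
    for j in sequence:
        start = max(t, release[j])  # the machine idles until job j is released
        t = start + processing[j]   # completion time C_j
        total += t - release[j]     # flow time of job j
    return total

def worst_case(sequence, scenarios):
    # Robustness measure: the largest total flow time over all scenarios
    return max(total_flow_time(sequence, r, p) for r, p in scenarios)

def robust_sequence(n_jobs, scenarios):
    # Exhaustive search over all job orders; only practical for very small n_jobs
    return min(permutations(range(n_jobs)), key=lambda s: worst_case(s, scenarios))

# Hypothetical scenarios: (release times, processing times) for 4 jobs
scenarios = [
    ([0, 1, 2, 3], [4, 2, 3, 1]),
    ([0, 2, 1, 5], [3, 3, 2, 2]),
    ([1, 0, 3, 2], [5, 1, 4, 2]),
]
best = robust_sequence(4, scenarios)
print(best, worst_case(best, scenarios))
```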

Showing 1–13 of 13 results for author: Bierlaire, M