Search | arXiv e-print repository

arXiv:2402.11973 [pdf, other]

Bayesian Active Learning for Censored Regression

Authors: Frederik Boe Hüttel, Christoffer Riis, Filipe Rodrigues, Francisco Câmara Pereira

Abstract: Bayesian active learning is based on information theoretical approaches that focus on maximising the information that new observations provide to the model parameters. This is commonly done by maximising the Bayesian Active Learning by Disagreement (BALD) acquisition function. However, we highlight that it is challenging to estimate BALD when the new data points are subject to censorship, where only clipped values of the targets are observed. To address this, we derive the entropy and the mutual information for censored distributions and derive the BALD objective for active learning in censored regression ($\mathcal{C}$-BALD). We propose a novel modelling approach to estimate the $\mathcal{C}$-BALD objective and use it for active learning in the censored setting. Across a wide range of datasets and models, we demonstrate that $\mathcal{C}$-BALD outperforms other Bayesian active learning methods in censored regression.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2308.10650 [pdf, other]

Deep Evidential Learning for Bayesian Quantile Regression

Authors: Frederik Boe Hüttel, Filipe Rodrigues, Francisco Câmara Pereira

Abstract: It is desirable to have accurate uncertainty estimation from a single deterministic forward-pass model, as traditional methods for uncertainty quantification are computationally expensive. However, this is difficult because single forward-pass models do not sample weights during inference and often make assumptions about the target distribution, such as assuming it is Gaussian. This can be restrictive in regression tasks, where the mean and standard deviation are inadequate to model the target distribution accurately. This paper proposes a deep Bayesian quantile regression model that can estimate the quantiles of a continuous target distribution without the Gaussian assumption. The proposed method is based on evidential learning, which allows the model to capture aleatoric and epistemic uncertainty with a single deterministic forward-pass model. This makes the method efficient and scalable to large models and datasets. We demonstrate that the proposed method achieves calibrated uncertainties on non-Gaussian distributions, disentanglement of aleatoric and epistemic uncertainty, and robustness to out-of-distribution samples.

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.03404 [pdf, other]

Applied metamodelling for ATM performance simulations

Authors: Christoffer Riis, Francisco N. Antunes, Tatjana Bolić, Gérald Gurtner, Andrew Cook, Carlos Lima Azevedo, Francisco Câmara Pereira

Abstract: The use of air traffic management (ATM) simulators for planning and operations can be challenging due to their modelling complexity. This paper presents XALM (eXplainable Active Learning Metamodel), a three-step framework integrating active learning and SHAP (SHapley Additive exPlanations) values into simulation metamodels for supporting ATM decision-making. XALM efficiently uncovers hidden relationships among input and output variables in ATM simulators, those usually of interest in policy analysis. Our experiments show XALM's predictive performance comparable to the XGBoost metamodel with fewer simulations. Additionally, XALM exhibits superior explanatory capabilities compared to non-active learning metamodels. Using the 'Mercury' (flight and passenger) ATM simulator, XALM is applied to a real-world scenario in Paris Charles de Gaulle airport, extending an arrival manager's range and scope by analysing six variables. This case study illustrates XALM's effectiveness in enhancing simulation interpretability and understanding variable interactions. By addressing computational challenges and improving explainability, XALM complements traditional simulation-based analyses. Lastly, we discuss two practical approaches for reducing the computational burden of the metamodelling further: we introduce a stopping criterion for active learning based on the inherent uncertainty of the metamodel, and we show how the simulations used for the metamodel can be reused across key performance indicators, thus decreasing the overall number of simulations needed.

Submitted 7 August, 2023; originally announced August 2023.

arXiv:2307.10892 [pdf, other]

Learning and Generalizing Polynomials in Simulation Metamodeling

Authors: Jesper Hauch, Christoffer Riis, Francisco C. Pereira

Abstract: The ability to learn polynomials and generalize out-of-distribution is essential for simulation metamodels in many disciplines of engineering, where the time step updates are described by polynomials. While feed forward neural networks can fit any function, they cannot generalize out-of-distribution for higher-order polynomials. Therefore, this paper collects and proposes multiplicative neural network (MNN) architectures that are used as recursive building blocks for approximating higher-order polynomials. Our experiments show that MNNs are better than baseline models at generalizing, and their performance in validation is true to their performance in out-of-distribution tests. In addition to MNN architectures, a simulation metamodeling approach is proposed for simulations with polynomial time step updates. For these simulations, simulating a time interval can be performed in fewer steps by increasing the step size, which entails approximating higher-order polynomials. While our approach is compatible with any simulation with polynomial time step updates, a demonstration is shown for an epidemiology simulation model, which also shows the inductive bias in MNNs for learning and generalizing higher-order polynomials.

Submitted 20 July, 2023; originally announced July 2023.

arXiv:2305.09129 [pdf, other]

Graph Reinforcement Learning for Network Control via Bi-Level Optimization

Authors: Daniele Gammelli, James Harrison, Kaidi Yang, Marco Pavone, Filipe Rodrigues, Francisco C. Pereira

Abstract: Optimization problems over dynamic networks have been extensively studied and widely used in the past decades to formulate numerous real-world problems. However, (1) traditional optimization-based approaches do not scale to large networks, and (2) the design of good heuristics or approximation algorithms often requires significant manual trial-and-error. In this work, we argue that data-driven strategies can automate this process and learn efficient algorithms without compromising optimality. To do so, we present network control problems through the lens of reinforcement learning and propose a graph network-based framework to handle a broad class of problems. Instead of naively computing actions over high-dimensional graph elements, e.g., edges, we propose a bi-level formulation where we (1) specify a desired next state via RL, and (2) solve a convex program to best achieve it, leading to drastically improved scalability and performance. We further highlight a collection of desirable features to system designers, investigate design decisions, and present experiments on real-world control problems showing the utility, scalability, and flexibility of our framework.

Submitted 15 May, 2023; originally announced May 2023.

Comments: 9 pages, 4 figures

arXiv:2302.14833 [pdf, other]

Learning to Control Autonomous Fleets from Observation via Offline Reinforcement Learning

Authors: Carolin Schmidt, Daniele Gammelli, Francisco Camara Pereira, Filipe Rodrigues

Abstract: Autonomous Mobility-on-Demand (AMoD) systems are an evolving mode of transportation in which a centrally coordinated fleet of self-driving vehicles dynamically serves travel requests. The control of these systems is typically formulated as a large network optimization problem, and reinforcement learning (RL) has recently emerged as a promising approach to solve the open challenges in this space. Recent centralized RL approaches focus on learning from online data, ignoring the per-sample cost of interactions within real-world transportation systems. To address these limitations, we propose to formalize the control of AMoD systems through the lens of offline reinforcement learning and learn effective control strategies using solely offline data, which is readily available to current mobility operators. We further investigate design decisions and provide empirical evidence based on data from real-world mobility systems showing how offline learning makes it possible to recover AMoD control policies that (i) exhibit performance on par with online methods, (ii) allow for sample-efficient online fine-tuning and (iii) eliminate the need for complex simulation environments. Crucially, this paper demonstrates that offline RL is a promising paradigm for the application of RL-based solutions within economically-critical systems, such as mobility systems.

Submitted 25 August, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

arXiv:2302.09871 [pdf, other]

Attitudes and Latent Class Choice Models using Machine learning

Authors: Lorena Torres Lahoz, Francisco Camara Pereira, Georges Sfeir, Ioanna Arkoudi, Mayara Moraes Monteiro, Carlos Lima Azevedo

Abstract: Latent Class Choice Models (LCCM) are extensions of discrete choice models (DCMs) that capture unobserved heterogeneity in the choice process by segmenting the population based on the assumption of preference similarities. We present a method of efficiently incorporating attitudinal indicators in the specification of LCCM, by introducing Artificial Neural Networks (ANN) to formulate latent variable constructs. This formulation overcomes structural equations in its capability of exploring the relationship between the attitudinal indicators and the decision choice, given the Machine Learning (ML) flexibility and power in capturing unobserved and complex behavioural features, such as attitudes and beliefs. All of this is achieved while still maintaining the consistency of the theoretical assumptions presented in the Generalized Random Utility model and the interpretability of the estimated parameters. We test our proposed framework for estimating a Car-Sharing (CS) service subscription choice with stated preference data from Copenhagen, Denmark. The results show that our proposed approach provides a complete and realistic segmentation, which helps design better policies.

Submitted 20 February, 2023; originally announced February 2023.

Comments: 25 pages, 8 figures

arXiv:2301.06418 [pdf, other]

Mind the Gap: Modelling Difference Between Censored and Uncensored Electric Vehicle Charging Demand

Authors: Frederik Boe Hüttel, Filipe Rodrigues, Francisco Câmara Pereira

Abstract: Electric vehicle charging demand models, with charging records as input, will inherently be biased toward the supply of available chargers. These models often fail to account for demand lost from occupied charging stations and competitors. The lost demand suggests that the actual demand is likely higher than the charging records reflect, i.e., the true demand is latent (unobserved), and the observations are censored. As a result, machine learning models that rely on these observed records for forecasting charging demand may be limited in their application in future infrastructure expansion and supply management, as they do not estimate the true demand for charging. We propose using censorship-aware models to model charging demand to address this limitation. These models incorporate censorship in their loss functions and learn the true latent demand distribution from observed charging records. We study how occupied charging stations and competing services censor demand using GPS trajectories from cars in Copenhagen, Denmark. We find that censorship occurs up to $61\%$ of the time in some areas of the city. We use the observed charging demand from our study to estimate the true demand and find that censorship-aware models provide better prediction and uncertainty estimation of actual demand than censorship-unaware models. We suggest that future charging models based on charging records should account for censoring to expand the application areas of machine learning models in supply management and infrastructure expansion.

Submitted 30 May, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

arXiv:2205.10186 [pdf, other]

Bayesian Active Learning with Fully Bayesian Gaussian Processes

Authors: Christoffer Riis, Francisco Antunes, Frederik Boe Hüttel, Carlos Lima Azevedo, Francisco Câmara Pereira

Abstract: The bias-variance trade-off is a well-known problem in machine learning that only gets more pronounced the less available data there is. In active learning, where labeled data is scarce or difficult to obtain, neglecting this trade-off can cause inefficient and non-optimal querying, leading to unnecessary data labeling. In this paper, we focus on active learning with Gaussian Processes (GPs). For the GP, the bias-variance trade-off is made by optimization of the two hyperparameters: the length scale and noise-term. Considering that the optimal mode of the joint posterior of the hyperparameters is equivalent to the optimal bias-variance trade-off, we approximate this joint posterior and utilize it to design two new acquisition functions. The first one is a Bayesian variant of Query-by-Committee (B-QBC), and the second is an extension that explicitly minimizes the predictive variance through a Query by Mixture of Gaussian Processes (QB-MGP) formulation. Across six simulators, we empirically show that B-QBC, on average, achieves the best marginal likelihood, whereas QB-MGP achieves the best predictive performance. We show that incorporating the bias-variance trade-off in the acquisition functions mitigates unnecessary and expensive data labeling.

Submitted 14 January, 2023; v1 submitted 20 May, 2022; originally announced May 2022.

Comments: In Proceedings of Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

arXiv:2205.01317 [pdf]

doi 10.1016/j.trc.2022.103589

Open vs Closed-ended questions in attitudinal surveys -- comparing, combining, and interpreting using natural language processing

Authors: Vishnu Baburajan, João de Abreu e Silva, Francisco Camara Pereira

Abstract: To improve the traveling experience, researchers have been analyzing the role of attitudes in travel behavior modeling. Although most researchers use closed-ended surveys, the appropriate method to measure attitudes is debatable. Topic Modeling could significantly reduce the time to extract information from open-ended responses and eliminate subjective bias, thereby alleviating analyst concerns. Our research uses Topic Modeling to extract information from open-ended questions and compare its performance with closed-ended responses. Furthermore, some respondents might prefer answering questions using their preferred questionnaire type. So, we propose a modeling framework that allows respondents to use their preferred questionnaire type to answer the survey and enable analysts to use the modeling frameworks of their choice to predict behavior. We demonstrate this using a dataset collected from the USA that measures the intention to use Autonomous Vehicles for commute trips. Respondents were presented with alternative questionnaire versions (open- and closed-ended). Since our objective was also to compare the performance of alternative questionnaire versions, the survey was designed to eliminate influences resulting from statements, behavioral framework, and the choice experiment. Results indicate the suitability of using Topic Modeling to extract information from open-ended responses; however, the models estimated using the closed-ended questions perform better compared to them. Besides, the proposed model performs better compared to the models used currently. Furthermore, our proposed framework will allow respondents to choose the questionnaire type to answer, which could be particularly beneficial to them when using voice-based surveys.

Submitted 3 May, 2022; originally announced May 2022.

arXiv:2203.09279 [pdf]

Transfer learning for cross-modal demand prediction of bike-share and public transit

Authors: Mingzhuang Hua, Francisco Camara Pereira, Yu Jiang, Xuewu Chen

Abstract: The urban transportation system is a combination of multiple transport modes, and interdependencies exist across those modes. This means that the travel demand across different travel modes could be correlated as one mode may receive demand from or create demand for another mode, not to mention natural correlations between different demand time series due to general demand flow patterns across the network. It is expected that cross-modal ripple effects will become more prevalent with Mobility as a Service. Therefore, by propagating demand data across modes, a better demand prediction could be obtained. To this end, this study explores various machine learning models and transfer learning strategies for cross-modal demand prediction. The trip data of bike-share, metro, and taxi are processed as the station-level passenger flows, and then the proposed prediction method is tested in the large-scale case studies of Nanjing and Chicago. The results suggest that prediction models with transfer learning perform better than unimodal prediction models. Furthermore, the stacked Long Short-Term Memory model performs particularly well in cross-modal demand prediction. These results verify our combined method's forecasting improvement over existing benchmarks and demonstrate the good transferability for cross-modal demand prediction in multiple cities.

Submitted 17 March, 2022; originally announced March 2022.

Comments: 27 pages, 4 figures

arXiv:2202.11962 [pdf, other]

Large Scale Passenger Detection with Smartphone/Bus Implicit Interaction and Multisensory Unsupervised Cause-effect Learning

Authors: Valentino Servizi, Dan R. Persson, Francisco C. Pereira, Hannah Villadsen, Per Bækgaard, Jeppe Rich, Otto A. Nielsen

Abstract: Intelligent Transportation Systems (ITS) underpin the concept of Mobility as a Service (MaaS), which requires universal and seamless users' access across multiple public and private transportation systems while allowing operators' proportional revenue sharing. Current user sensing technologies such as Walk-in/Walk-out (WIWO) and Check-in/Check-out (CICO) have limited scalability for large-scale deployments. These limitations prevent ITS from supporting analysis, optimization, calculation of revenue sharing, and control of MaaS comfort, safety, and efficiency. We focus on the concept of implicit Be-in/Be-out (BIBO) smartphone-sensing and classification. To close the gap and enhance smartphones towards MaaS, we developed a proprietary smartphone-sensing platform collecting contemporary Bluetooth Low Energy (BLE) signals from BLE devices installed on buses and Global Positioning System (GPS) locations of both buses and smartphones. To enable the training of a model based on GPS features against the BLE pseudo-label, we propose the Cause-Effect Multitask Wasserstein Autoencoder (CEMWA). CEMWA combines and extends several frameworks around Wasserstein autoencoders and neural networks. As a dimensionality reduction tool, CEMWA obtains an auto-validated representation of a latent space describing users' smartphones within the transport system. This representation allows BIBO clustering via DBSCAN. We perform an ablation study of CEMWA's alternative architectures and benchmark against the best available supervised methods. We analyze performance's sensitivity to label quality. Under the naïve assumption of accurate ground truth, XGBoost outperforms CEMWA. Although XGBoost and Random Forest prove to be tolerant to label noise, CEMWA is agnostic to label noise by design and provides the best performance with an 88% F1 score.

Submitted 24 February, 2022; originally announced February 2022.

Comments: 20 pages, 13 figures, 3 tables

arXiv:2202.11961 [pdf, other]

doi 10.1109/TITS.2023.3291493

"Is not the truth the truth?": Analyzing the Impact of User Validations for Bus In/Out Detection in Smartphone-based Surveys

Authors: Valentino Servizi, Dan R. Persson, Francisco C. Pereira, Hannah Villadsen, Per Bækgaard, Inon Peled, Otto A. Nielsen

Abstract: Passenger flow allows the study of users' behavior through the public network and assists in designing new facilities and services. This flow is observed through interactions between passengers and infrastructure. For this task, Bluetooth technology and smartphones represent the ideal solution. The latter component allows users' identification, authentication, and billing, while the former allows short-range implicit interactions, device-to-device. To assess the potential of such a use case, we need to verify how robust Bluetooth signal and related machine learning (ML) classifiers are against the noise of realistic contexts. Therefore, we model binary passenger states with respect to a public vehicle, where one can either be-in or be-out (BIBO). The BIBO label identifies a fundamental building block of continuously-valued passenger flow. This paper describes the Human-Computer interaction experimental setting in a semi-controlled environment, which involves: two autonomous vehicles operating on two routes, serving three bus stops and eighteen users, as well as a proprietary smartphone-Bluetooth sensing platform. The resulting dataset includes multiple sensors' measurements of the same event and two ground-truth levels, the first being validation by participants, the second by three video-cameras surveilling buses and track. We performed a Monte-Carlo simulation of labels-flip to emulate human errors in the labeling process, as is known to happen in smartphone surveys; next we used such flipped labels for supervised training of ML classifiers. The impact of errors on model performance bias can be large. Results show ML tolerance to label flips caused by human or machine errors up to 30%.

Submitted 24 February, 2022; originally announced February 2022.

Comments: 22 pages, 11 figures, 4 tables, 3 algorithms

arXiv:2201.10307 [pdf, other]

Unboxing the graph: Neural Relational Inference for Mobility Prediction

Authors: Mathias Niemann Tygesen, Francisco C. Pereira, Filipe Rodrigues

Abstract: Predicting the supply and demand of transport systems is vital for efficient traffic management, control, optimization, and planning. For example, predicting where from/to and when people intend to travel by taxi can support fleet managers to distribute resources; better predicting traffic speeds/congestion allows for pro-active control measures or for users to better choose their paths. Making spatio-temporal predictions is known to be a hard task, but recently Graph Neural Networks (GNNs) have been widely applied on non-euclidean spatial data. However, most GNN models require a predefined graph, and so far, researchers rely on heuristics to generate this graph for the model to use. In this paper, we use Neural Relational Inference to learn the optimal graph for the model. Our approach has several advantages: 1) a Variational Auto Encoder structure allows for the graph to be dynamically determined by the data, potentially changing through time; 2) the encoder structure allows the use of external data in the generation of the graph; 3) it is possible to place Bayesian priors on the generated graphs to encode domain knowledge. We conduct experiments on two datasets, namely the NYC Yellow Taxi and the PEMS road traffic datasets. In both datasets, we outperform benchmarks and show performance comparable to state-of-the-art. Furthermore, we do an in-depth analysis of the learned graphs, providing insights on what kinds of connections GNNs use for spatio-temporal predictions in the transport domain.

Submitted 25 January, 2022; originally announced January 2022.

arXiv:2109.12042 [pdf, other]

Combining Discrete Choice Models and Neural Networks through Embeddings: Formulation, Interpretability and Performance

Authors: Ioanna Arkoudi, Carlos Lima Azevedo, Francisco C. Pereira

Abstract: This study proposes a novel approach that combines theory and data-driven choice models using Artificial Neural Networks (ANNs). In particular, we use continuous vector representations, called embeddings, for encoding categorical or discrete explanatory variables with a special focus on interpretability and model transparency. Although embedding representations within the logit framework have been conceptualized by Pereira (2019), their dimensions do not have an absolute definitive meaning, hence offering limited behavioral insights in this earlier work. The novelty of our work lies in enforcing interpretability to the embedding vectors by formally associating each of their dimensions to a choice alternative. Thus, our approach brings benefits much beyond a simple parsimonious representation improvement over dummy encoding, as it provides behaviorally meaningful outputs that can be used in travel demand analysis and policy decisions. Additionally, in contrast to previously suggested ANN-based Discrete Choice Models (DCMs) that either sacrifice interpretability for performance or are only partially interpretable, our models preserve interpretability of the utility coefficients for all the input variables despite being based on ANN principles. The proposed models were tested on two real world datasets and evaluated against benchmark and baseline models that use dummy-encoding. The results of the experiments indicate that our models deliver state-of-the-art predictive performance, outperforming existing ANN-based models while drastically reducing the number of required network parameters.

Submitted 30 September, 2021; v1 submitted 24 September, 2021; originally announced September 2021.

arXiv:2108.00858 [pdf, other]

Predictive and Prescriptive Performance of Bike-Sharing Demand Forecasts for Inventory Management

Authors: Daniele Gammelli, Yihua Wang, Dennis Prak, Filipe Rodrigues, Stefan Minner, Francisco Camara Pereira

Abstract: Bike-sharing systems are a rapidly developing mode of transportation and provide an efficient alternative to passive, motorized personal mobility. The asymmetric nature of bike demand causes the need for rebalancing bike stations, which is typically done during night time. To determine the optimal starting inventory level of a station for a given day, a User Dissatisfaction Function (UDF) models user pickups and returns as non-homogeneous Poisson processes with piece-wise linear rates. In this paper, we devise a deep generative model directly applicable in the UDF by introducing a variational Poisson recurrent neural network model (VP-RNN) to forecast future pickup and return rates. We empirically evaluate our approach against both traditional and learning-based forecasting methods on real trip travel data from the city of New York, USA, and show how our model outperforms benchmarks in terms of system efficiency and demand satisfaction. By explicitly focusing on the combination of decision-making algorithms with learning-based forecasting methods, we highlight a number of shortcomings in the literature. Crucially, we show how more accurate predictions do not necessarily translate into better inventory decisions. By providing insights into the interplay between forecasts, model assumptions, and decisions, we point out that forecasts and decision models should be carefully evaluated and harmonized to optimally control shared mobility systems.

Submitted 28 July, 2021; originally announced August 2021.

Comments: 28 pages, 6 figures

arXiv:2106.10940 [pdf, other]

Deep Spatio-Temporal Forecasting of Electrical Vehicle Charging Demand

Authors: Frederik Boe Hüttel, Inon Peled, Filipe Rodrigues, Francisco C. Pereira

Abstract: Electric vehicles can offer a low carbon emission solution to reverse rising emission trends. However, this requires that the energy used to meet the demand is green. To meet this requirement, accurate forecasting of the charging demand is vital. Short and long-term charging demand forecasting will allow for better optimisation of the power grid and future infrastructure expansions. In this paper, we propose to use publicly available data to forecast the electric vehicle charging demand. To model the complex spatial-temporal correlations between charging stations, we argue that Temporal Graph Convolution Models are the most suitable to capture the correlations. The proposed Temporal Graph Convolutional Networks provide the most accurate forecasts for short and long-term forecasting compared with other forecasting methods.

Submitted 21 June, 2021; originally announced June 2021.

arXiv:2105.14716 [pdf, other]

doi 10.1016/j.trc.2021.103195

Improving the Accuracy and Efficiency of Online Calibration for Simulation-based Dynamic Traffic Assignment

Authors: Haizheng Zhang, Ravi Seshadri, A. Arun Prakash, Constantinos Antoniou, Francisco C. Pereira, Moshe Ben-Akiva

Abstract: Simulation-based Dynamic Traffic Assignment models have important applications in real-time traffic management and control. The efficacy of these systems rests on the ability to generate accurate estimates and predictions of traffic states, which necessitates online calibration. A widely used solution approach for online calibration is the Extended Kalman Filter (EKF), which -- although appealing in its flexibility to incorporate any class of parameters and measurements -- poses several challenges with regard to calibration accuracy and scalability, especially in congested situations for large-scale networks. This paper addresses these issues in turn so as to improve the accuracy and efficiency of EKF-based online calibration approaches for large and congested networks. First, the concept of state augmentation is revisited to handle violations of the Markovian assumption typically implicit in online applications of the EKF. Second, a method based on graph-coloring is proposed to operationalize the partitioned finite-difference approach that enhances scalability of the gradient computations. Several synthetic experiments and a real world case study demonstrate that application of the proposed approaches yields improvements in terms of both prediction accuracy and computational performance. The work has applications in real-world deployments of simulation-based dynamic traffic assignment systems.

Submitted 31 May, 2021; originally announced May 2021.

Comments: 26 pages, 15 figures

Journal ref: Transportation Research Part C: Emerging Technologies Volume 128, July 2021, 103195

arXiv:2104.11434 [pdf, other]

Graph Neural Network Reinforcement Learning for Autonomous Mobility-on-Demand Systems

Authors: Daniele Gammelli, Kaidi Yang, James Harrison, Filipe Rodrigues, Francisco C. Pereira, Marco Pavone

Abstract: Autonomous mobility-on-demand (AMoD) systems represent a rapidly developing mode of transportation wherein travel requests are dynamically handled by a coordinated fleet of robotic, self-driving vehicles. Given a graph representation of the transportation network - one where, for example, nodes represent areas of the city, and edges the connectivity between them - we argue that the AMoD control problem is naturally cast as a node-wise decision-making problem. In this paper, we propose a deep reinforcement learning framework to control the rebalancing of AMoD systems through graph neural networks. Crucially, we demonstrate that graph neural networks enable reinforcement learning agents to recover behavior policies that are significantly more transferable, generalizable, and scalable than policies learned through other approaches. Empirically, we show how the learned policies exhibit promising zero-shot transfer capabilities when faced with critical portability tasks such as inter-city generalization, service area expansion, and adaptation to potentially complex urban topologies.

Submitted 16 August, 2021; v1 submitted 23 April, 2021; originally announced April 2021.

arXiv:2104.01214 [pdf, other]

Modeling Censored Mobility Demand through Quantile Regression Neural Networks

Authors: Frederik Boe Hüttel, Inon Peled, Filipe Rodrigues, Francisco C. Pereira

Abstract: Shared mobility services require accurate demand models for effective service planning. On the one hand, modeling the full probability distribution of demand is advantageous because the entire uncertainty structure preserves valuable information for decision-making. On the other hand, demand is often observed through the usage of the service itself, so that the observations are censored, as they are inherently limited by available supply. Since the 1980s, various works on Censored Quantile Regression models have performed well under such conditions. Further, in the last two decades, several papers have proposed to implement these models flexibly through Neural Networks. However, the models in current works estimate the quantiles individually, thus incurring a computational overhead and ignoring valuable relationships between the quantiles. We address this gap by extending current Censored Quantile Regression models to learn multiple quantiles at once and apply these to synthetic baseline datasets and datasets from two shared mobility providers in the Copenhagen metropolitan area in Denmark. The results show that our extended models yield fewer quantile crossings and less computational overhead without compromising model performance.

Submitted 9 July, 2022; v1 submitted 2 April, 2021; originally announced April 2021.

Comments: 13 pages, 9 figures, 5 tables

arXiv:2011.06851 [pdf]

Population synthesis for urban resident modeling using deep generative models

Authors: Martin Johnsen, Oliver Brandt, Sergio Garrido, Francisco C. Pereira

Abstract: The impacts of new real estate developments are strongly associated with their population distribution (types and compositions of households, incomes, social demographics) conditioned on aspects such as dwelling typology, price, location, and floor level. This paper presents a Machine Learning based method to model the population distribution of upcoming developments of new buildings within larger neighborhood/condo settings. We use a real data set from Ecopark Township, a real estate development project in Hanoi, Vietnam, where we study two machine learning algorithms from the deep generative models literature to create a population of synthetic agents: Conditional Variational Auto-Encoder (CVAE) and Conditional Generative Adversarial Networks (CGAN). A large experimental study was performed, showing that the CVAE outperforms both the empirical distribution, a non-trivial baseline model, and the CGAN in estimating the population distribution of new real estate development projects.

Submitted 13 November, 2020; originally announced November 2020.

arXiv:2008.13443 [pdf, other]

doi 10.1016/j.commtr.2021.100008

On the Quality Requirements of Demand Prediction for Dynamic Public Transport

Authors: Inon Peled, Kelvin Lee, Yu Jiang, Justin Dauwels, Francisco C. Pereira

Abstract: As Public Transport (PT) becomes more dynamic and demand-responsive, it increasingly depends on predictions of transport demand. But how accurate need such predictions be for effective PT operation? We address this question through an experimental case study of PT trips in Metropolitan Copenhagen, Denmark, which we conduct independently of any specific prediction models. First, we simulate errors in demand prediction through unbiased noise distributions that vary considerably in shape. Using the noisy predictions, we then simulate and optimize demand-responsive PT fleets via a linear programming formulation and measure their performance. Our results suggest that the optimized performance is mainly affected by the skew of the noise distribution and the presence of infrequently large prediction errors. In particular, the optimized performance can improve under non-Gaussian vs. Gaussian noise. We also find that dynamic routing could reduce trip time by at least 23% vs. static routing. This reduction is estimated at 809,000 EUR/year in terms of Value of Travel Time Savings for the case study.

Submitted 6 November, 2021; v1 submitted 31 August, 2020; originally announced August 2020.

Comments: 26 pages, 9 tables, 6 figures

arXiv:2008.07283 [pdf, other]

Estimating Causal Effects with the Neural Autoregressive Density Estimator

Authors: Sergio Garrido, Stanislav S. Borysov, Jeppe Rich, Francisco C. Pereira

Abstract: Estimation of causal effects is fundamental in situations where the underlying system will be subject to active interventions. Part of building a causal inference engine is defining how variables relate to each other, that is, defining the functional relationship between variables given conditional dependencies. In this paper, we deviate from the common assumption of linear relationships in causal models by making use of neural autoregressive density estimators and use them to estimate causal effects within Pearl's do-calculus framework. Using synthetic data, we show that the approach can retrieve causal effects from non-linear systems without explicitly modeling the interactions between the variables.

Submitted 1 March, 2021; v1 submitted 17 August, 2020; originally announced August 2020.

arXiv:2007.02739 [pdf]

doi 10.1016/j.jocm.2021.100320

Semi-nonparametric Latent Class Choice Model with a Flexible Class Membership Component: A Mixture Model Approach

Authors: Georges Sfeir, Maya Abou-Zeid, Filipe Rodrigues, Francisco Camara Pereira, Isam Kaysi

Abstract: This study presents a semi-nonparametric Latent Class Choice Model (LCCM) with a flexible class membership component. The proposed model formulates the latent classes using mixture models as an alternative approach to the traditional random utility specification with the aim of comparing the two approaches on various measures including prediction accuracy and representation of heterogeneity in the choice process. Mixture models are parametric model-based clustering techniques that have been widely used in areas such as machine learning, data mining and pattern recognition for clustering and classification problems. An Expectation-Maximization (EM) algorithm is derived for the estimation of the proposed model. Using two different case studies on travel mode choice behavior, the proposed model is compared to traditional discrete choice models on the basis of parameter estimates' signs, value of time, statistical goodness-of-fit measures, and cross-validation tests. Results show that mixture models improve the overall performance of latent class choice models by providing better out-of-sample prediction accuracy in addition to better representations of heterogeneity without weakening the behavioral and economic interpretability of the choice models.

Submitted 6 July, 2020; originally announced July 2020.

arXiv:2003.04109 [pdf, other]

doi 10.1080/17477778.2020.1756702

QTIP: Quick simulation-based adaptation of Traffic model per Incident Parameters

Authors: Inon Peled, Raghuveer Kamalakar, Carlos Lima Azevedo, Francisco C. Pereira

Abstract: Current data-driven traffic prediction models are usually trained with large datasets, e.g. several months of speeds and flows. Such models provide very good fit for ordinary road conditions, but often fail just when they are most needed: when traffic suffers a sudden and significant disruption, such as a road incident. In this work, we describe QTIP: a simulation-based framework for quasi-instantaneous adaptation of prediction models upon traffic disruption. In a nutshell, QTIP performs real-time simulations of the affected road for multiple scenarios, analyzes the results, and suggests a change to an ordinary prediction model accordingly. QTIP constructs the simulated scenarios per properties of the incident, as conveyed by immediate distress signals from affected vehicles. Such real-time signals are provided by In-Vehicle Monitor Systems, which are becoming increasingly prevalent world-wide. We experiment with QTIP in a case study of a Danish motorway, and the results show that QTIP can improve traffic prediction in the first critical minutes of road incidents.

Submitted 9 March, 2020; originally announced March 2020.

Comments: 18 pages, 13 figures, 4 tables

arXiv:2002.00922 [pdf, other]

A Neural-embedded Choice Model: TasteNet-MNL Modeling Taste Heterogeneity with Flexibility and Interpretability

Authors: Yafei Han, Francisco Camara Pereira, Moshe Ben-Akiva, Christopher Zegras

Abstract: Discrete choice models (DCMs) require a priori knowledge of the utility functions, especially how tastes vary across individuals. Utility misspecification may lead to biased estimates, inaccurate interpretations and limited predictability. In this paper, we utilize a neural network to learn taste representation. Our formulation consists of two modules: a neural network (TasteNet) that learns taste parameters (e.g., time coefficient) as flexible functions of individual characteristics; and a multinomial logit (MNL) model with utility functions defined with expert knowledge. Taste parameters learned by the neural network are fed into the choice model and link the two modules. Our approach extends the L-MNL model (Sifringer et al., 2020) by allowing the neural network to learn the interactions between individual characteristics and alternative attributes. Moreover, we formalize and strengthen the interpretability condition - requiring realistic estimates of behavior indicators (e.g., value-of-time, elasticity) at the disaggregated level, which is crucial for a model to be suitable for scenario analysis and policy decisions. Through a unique network architecture and parameter transformation, we incorporate prior knowledge and guide the neural network to output realistic behavior indicators at the disaggregated level. We show that TasteNet-MNL reaches the ground-truth model's predictability and recovers the nonlinear taste functions on synthetic data. Its estimated value-of-time and choice elasticities at the individual level are close to the ground truth. On a publicly available Swissmetro dataset, TasteNet-MNL outperforms benchmark MNL and Mixed Logit models in predictability. It learns a broader spectrum of taste variations within the population and suggests a higher average value-of-time.

Submitted 1 July, 2022; v1 submitted 3 February, 2020; originally announced February 2020.

arXiv:2001.11399 [pdf, other]

Uncovering life-course patterns with causal discovery and survival analysis

Authors: Bojan Kostic, Romain Crastes dit Sourd, Stephane Hess, Joachim Scheiner, Christian Holz-Rau, Francisco C. Pereira

Abstract: We provide a novel approach and an exploratory study for modelling life event choices and occurrence from a probabilistic perspective through causal discovery and survival analysis. Our approach is formulated as a bi-level problem. In the upper level, we build the life events graph, using causal discovery tools. In the lower level, for the pairs of life events, time-to-event modelling through survival analysis is applied to model time-dependent transition probabilities. Several life events were analysed, such as getting married, buying a new car, child birth, home relocation and divorce, together with the socio-demographic attributes for survival modelling, some of which are age, nationality, number of children, number of cars and home ownership. The data originates from a survey conducted in Dortmund, Germany, with the questionnaire containing a series of retrospective questions about residential and employment biography, travel behaviour and holiday trips, as well as socio-economic characteristics. Although survival analysis has been used in the past to analyse life-course data, this is the first time that a bi-level model has been formulated. The inclusion of a causal discovery algorithm in the upper level allows us to first identify causal relationships between life-course events and then understand the factors that might influence transition rates between events. This is very different from more classic choice models where causal relationships are subject to expert interpretations based on model results.

Submitted 30 January, 2020; originally announced January 2020.

Comments: 26 pages, 10 figures

arXiv:2001.07402 [pdf, other]

Estimating Latent Demand of Shared Mobility through Censored Gaussian Processes

Authors: Daniele Gammelli, Inon Peled, Filipe Rodrigues, Dario Pacino, Haci A. Kurtaran, Francisco C. Pereira

Abstract: Transport demand is highly dependent on supply, especially for shared transport services where availability is often limited. As observed demand cannot be higher than available supply, historical transport data typically represents a biased, or censored, version of the true underlying demand pattern. Without explicitly accounting for this inherent distinction, predictive models of demand would necessarily represent a biased version of true demand, thus less effectively predicting the needs of service users. To counter this problem, we propose a general method for censorship-aware demand modeling, for which we devise a censored likelihood function. We apply this method to the task of shared mobility demand prediction by incorporating the censored likelihood within a Gaussian Process model, which can flexibly approximate arbitrary functional forms. Experiments on artificial and real-world datasets show how taking into account the limiting effect of supply on demand is essential in the process of obtaining an unbiased predictive model of user demand behavior.

Submitted 17 February, 2020; v1 submitted 21 January, 2020; originally announced January 2020.

Comments: 21 pages, 10 figures

arXiv:1912.11259 [pdf, other]

doi 10.1186/s12544-021-00516-z

Mining User Behaviour from Smartphone data: a literature review

Authors: Valentino Servizi, Francisco C. Pereira, Marie K. Anderson, Otto A. Nielsen

Abstract: To study users' travel behaviour and travel time between origin and destination, researchers employ travel surveys. Although there is consensus in the field about their potential, after over ten years of research and field experimentation, Smartphone-based travel surveys have still not taken off on a large scale. Here, computer intelligence algorithms take the role that operators have in Traditional Travel Surveys; since we train each algorithm on data, performance rests on the data quality, and thus on the ground truth. Inaccurate validations negatively affect labels, algorithms' training, travel diaries' precision, and therefore data validation, within a very critical loop. Interestingly, boundaries have proven burdensome to push even for Machine Learning methods. To support optimal investment decisions for practitioners, we expose the drivers they should consider when assessing what they need against what they get. This paper highlights and examines the critical aspects of the underlying research and provides some recommendations: (i) from the device perspective, on the main physical limitations; (ii) from the application perspective, the methodological framework deployed for the automatic generation of travel diaries; (iii) from the ground truth perspective, the relationship between user interaction, methods, and data.

Submitted 3 February, 2020; v1 submitted 24 December, 2019; originally announced December 2019.

arXiv:1909.07689 [pdf, other]

Prediction of rare feature combinations in population synthesis: Application of deep generative modelling

Authors: Sergio Garrido, Stanislav S. Borysov, Francisco C. Pereira, Jeppe Rich

Abstract: In population synthesis applications, when considering populations with many attributes, a fundamental problem is the estimation of rare combinations of feature attributes. Unsurprisingly, it is notably more difficult to reliably represent the sparser regions of such multivariate distributions and in particular combinations of attributes which are absent from the original sample. In the literature this is commonly known as sampling zeros for which no systematic solution has been proposed so far. In this paper, two machine learning algorithms, from the family of deep generative models, are proposed for the problem of population synthesis and with particular attention to the problem of sampling zeros. Specifically, we introduce the Wasserstein Generative Adversarial Network (WGAN) and the Variational Autoencoder (VAE), and adapt these algorithms for a large-scale population synthesis application. The models are implemented on a Danish travel survey with a feature-space of more than 60 variables. The models are validated in a cross-validation scheme and a set of new metrics for the evaluation of the sampling-zero problem is proposed. Results show how these models are able to recover sampling zeros while keeping the estimation of truly impossible combinations, the structural zeros, at a comparatively low level. Particularly, for a low dimensional experiment, the VAE, the marginal sampler and the fully random sampler generate 5%, 21% and 26%, respectively, more structural zeros per sampling zero generated by the WGAN, while for a high dimensional case, these figures escalate to 44%, 2217% and 170440%, respectively. This research directly supports the development of agent-based systems and in particular cases where detailed socio-economic or geographical representations are required.

Submitted 17 September, 2019; originally announced September 2019.

arXiv:1909.00154 [pdf, other]

Rethinking travel behavior modeling representations through embeddings

Authors: Francisco C. Pereira

Abstract: This paper introduces the concept of travel behavior embeddings, a method for re-representing discrete variables that are typically used in travel demand modeling, such as mode, trip purpose, education level, family type or occupation. This re-representation process essentially maps those variables into a latent space called the embedding space. The benefit of this is that such spaces allow for richer nuances than the typical transformations used in categorical variables (e.g. dummy encoding, contrasted encoding, principal components analysis). While the usage of latent variable representations is not new per se in travel demand modeling, the idea presented here brings several innovations: it is an entirely data driven algorithm; it is informative and consistent, since the latent space can be visualized and interpreted based on distances between different categories; it preserves interpretability of coefficients, despite being based on Neural Network principles; and it is transferable, in that embeddings learned from one dataset can be reused for other ones, as long as travel behavior keeps consistent between the datasets. The idea is strongly inspired by natural language processing techniques, namely the word2vec algorithm. This algorithm is behind recent developments such as automatic translation or next word prediction. Our method is demonstrated using a mode choice model, and shows improvements of up to 60% with respect to initial likelihood, and up to 20% with respect to likelihood of the corresponding traditional model (i.e. using dummy variables) in out-of-sample evaluation. We provide a new Python package, called PyTre (PYthon TRavel Embeddings), that others can straightforwardly use to replicate our results or improve their own models. Our experiments are themselves based on an open dataset (swissmetro).

Submitted 31 August, 2019; originally announced September 2019.

arXiv:1903.02791 [pdf, other]

doi 10.1016/j.eswa.2018.11.028

Multi-output Bus Travel Time Prediction with Convolutional LSTM Neural Network

Authors: Niklas Christoffer Petersen, Filipe Rodrigues, Francisco Camara Pereira

Abstract: Accurate and reliable travel time predictions in public transport networks are essential for delivering an attractive service that is able to compete with other modes of transport in urban areas. The traditional application of this information, where arrival and departure predictions are displayed on digital boards, is highly visible in the city landscape of most modern metropolises. More recently, the same information has become critical as input for smart-phone trip planners in order to alert passengers about unreachable connections, alternative route choices and prolonged travel times. More sophisticated Intelligent Transport Systems (ITS) include the predictions of connection assurance, i.e. to hold back services in case a connecting service is delayed. In order to operate such systems, and to ensure the confidence of passengers in the systems, the information provided must be accurate and reliable. Traditional methods have trouble with this as congestion, and thus travel time variability, increases in cities, consequently making travel time predictions in urban areas a non-trivial task. This paper presents a system for bus travel time prediction that leverages the non-static spatio-temporal correlations present in urban bus networks, allowing the discovery of complex patterns not captured by traditional methods. The underlying model is a multi-output, multi-time-step, deep neural network that uses a combination of convolutional and long short-term memory (LSTM) layers. The method is empirically evaluated and compared to other popular approaches for link travel time prediction and currently available services, including the currently deployed model in Copenhagen, Denmark. We find that the proposed model significantly outperforms all the other methods we compare with, and is able to detect small irregular peaks in bus travel times very quickly.

Submitted 7 March, 2019; originally announced March 2019.

Journal ref: Expert Systems with Applications, Volume 120, 15 April 2019, Pages 426-435

arXiv:1902.09745 [pdf, other]

doi 10.1109/ITSC.2019.8916878

Online Predictive Optimization Framework for Stochastic Demand-Responsive Transit Services

Authors: Inon Peled, Kelvin Lee, Yu Jiang, Justin Dauwels, Francisco C. Pereira

Abstract: This study develops an online predictive optimization framework for dynamically operating a transit service in an area of crowd movements. The proposed framework integrates demand prediction and supply optimization to periodically redesign the service routes based on recently observed demand. To predict demand for the service, we use Quantile Regression to estimate the marginal distribution of movement counts between each pair of serviced locations. The framework then combines these marginals into a joint demand distribution by constructing a Gaussian copula, which captures the structure of correlation between the marginals. For supply optimization, we devise a linear programming model, which simultaneously determines the route structure and the service frequency according to the predicted demand. Importantly, our framework both preserves the uncertainty structure of future demand and leverages this for robust route optimization, while keeping both components decoupled. We evaluate our framework using a real-world case study of autonomous mobility in a university campus in Denmark. The results show that our framework often obtains the ground truth optimal solution, and can outperform conventional methods for route optimization, which do not leverage full predictive distributions.

Submitted 21 May, 2019; v1 submitted 26 February, 2019; originally announced February 2019.

Comments: 34 pages, 12 figures, 5 tables

Journal ref: 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 2019, pp. 3043-3048

arXiv:1812.08755 [pdf, other]

doi 10.1109/TPAMI.2016.2635136

A Bayesian Additive Model for Understanding Public Transport Usage in Special Events

Authors: Filipe Rodrigues, Stanislav S. Borysov, Bernardete Ribeiro, Francisco C. Pereira

Abstract: Public special events, like sports games, concerts and festivals are well known to create disruptions in transportation systems, often catching the operators by surprise. Although these are usually planned well in advance, their impact is difficult to predict, even when organisers and transportation operators coordinate. The problem highly increases when several events happen concurrently. To solve these problems, costly processes, heavily reliant on manual search and personal experience, are usual practice in large cities like Singapore, London or Tokyo. This paper presents a Bayesian additive model with Gaussian process components that combines smart card records from public transport with context information about events that is continuously mined from the Web. We develop an efficient approximate inference algorithm using expectation propagation, which allows us to predict the total number of public transportation trips to the special event areas, thereby contributing to a more adaptive transportation system. Furthermore, for multiple concurrent event scenarios, the proposed algorithm is able to disaggregate gross trip counts into their most likely components related to specific events and routine behavior. Using real data from Singapore, we show that the presented model outperforms the best baseline model by up to 26% in $R^2$ and also has explanatory power for its individual components.

Submitted 20 December, 2018; originally announced December 2018.

Comments: 14 pages, IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 39 , Issue: 11 , Nov. 1 2017)

Journal ref: Rodrigues, F., Borysov, S. S., Ribeiro, B., & Pereira, F. C. (2017). A Bayesian additive model for understanding public transport usage in special events. IEEE transactions on pattern analysis and machine intelligence, 39(11), 2113-2126

arXiv:1812.08739 [pdf, other]

doi 10.1109/TITS.2018.2817879

Multi-Output Gaussian Processes for Crowdsourced Traffic Data Imputation

Authors: Filipe Rodrigues, Kristian Henrickson, Francisco C. Pereira

Abstract: Traffic speed data imputation is a fundamental challenge for data-driven transport analysis. In recent years, with the ubiquity of GPS-enabled devices and the widespread use of crowdsourcing alternatives for the collection of traffic data, transportation professionals increasingly look to such user-generated data for many analysis, planning, and decision support applications. However, due to the mechanics of the data collection process, crowdsourced traffic data such as probe-vehicle data is highly prone to missing observations, making accurate imputation crucial for the success of any application that makes use of that type of data. In this article, we propose the use of multi-output Gaussian processes (GPs) to model the complex spatial and temporal patterns in crowdsourced traffic data. While the Bayesian nonparametric formalism of GPs allows us to model observation uncertainty, the multi-output extension based on convolution processes effectively enables us to capture complex spatial dependencies between nearby road segments. Using 6 months of crowdsourced traffic speed data or "probe vehicle data" for several locations in Copenhagen, the proposed approach is empirically shown to significantly outperform popular state-of-the-art imputation methods.

Submitted 8 June, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

Comments: 10 pages, IEEE Transactions on Intelligent Transportation Systems, 2018

arXiv:1812.08733 [pdf, other]

doi 10.1016/j.trc.2018.08.007

Heteroscedastic Gaussian processes for uncertainty modeling in large-scale crowdsourced traffic data

Authors: Filipe Rodrigues, Francisco C. Pereira

Abstract: Accurately modeling traffic speeds is a fundamental part of efficient intelligent transportation systems. Nowadays, with the widespread deployment of GPS-enabled devices, it has become possible to crowdsource the collection of speed information to road users (e.g. through mobile applications or dedicated in-vehicle devices). Despite its rather wide spatial coverage, crowdsourced speed data also brings very important challenges, such as the highly variable measurement noise in the data due to a variety of driving behaviors and sample sizes. When not properly accounted for, this noise can severely compromise any application that relies on accurate traffic data. In this article, we propose the use of heteroscedastic Gaussian processes (HGP) to model the time-varying uncertainty in large-scale crowdsourced traffic data. Furthermore, we develop a HGP conditioned on sample size and traffic regime (SRC-HGP), which makes use of sample size information (probe vehicles per minute) as well as previous observed speeds, in order to more accurately model the uncertainty in observed speeds. Using 6 months of crowdsourced traffic data from Copenhagen, we empirically show that the proposed heteroscedastic models produce significantly better predictive distributions when compared to current state-of-the-art methods for both speed imputation and short-term forecasting tasks.

Submitted 20 December, 2018; originally announced December 2018.

Comments: 22 pages, Transportation Research Part C: Emerging Technologies (Elsevier)

Journal ref: Rodrigues, F., & Pereira, F. C. (2018). Heteroscedastic Gaussian processes for uncertainty modeling in large-scale crowdsourced traffic data. Transportation Research Part C: Emerging Technologies, 95, 636-651

arXiv:1808.08798 [pdf, other]

Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems

Authors: Filipe Rodrigues, Francisco C. Pereira

Abstract: Spatio-temporal problems are ubiquitous and of vital importance in many research fields. Despite the potential already demonstrated by deep learning methods in modeling spatio-temporal data, typical approaches tend to focus solely on conditional expectations of the output variables being modeled. In this paper, we propose a multi-output multi-quantile deep learning approach for jointly modeling several conditional quantiles together with the conditional expectation as a way to provide a more complete "picture" of the predictive density in spatio-temporal problems. Using two large-scale datasets from the transportation domain, we empirically demonstrate that, by approaching the quantile regression problem from a multi-task learning perspective, it is possible to solve the embarrassing quantile crossings problem, while simultaneously significantly outperforming state-of-the-art quantile regression methods. Moreover, we show that jointly modeling the mean and several conditional quantiles not only provides a rich description about the predictive density that can capture heteroscedastic properties at a neglectable computational overhead, but also leads to improved predictions of the conditional expectation due to the extra information and a regularization effect induced by the added quantiles.

Submitted 27 August, 2018; originally announced August 2018.

Comments: 12 pages, 9 figures

arXiv:1808.06910 [pdf, other]

doi 10.1016/j.trc.2019.07.006

Scalable Population Synthesis with Deep Generative Modeling

Authors: Stanislav S. Borysov, Jeppe Rich, Francisco C. Pereira

Abstract: Population synthesis is concerned with the generation of synthetic yet realistic representations of populations. It is a fundamental problem in the modeling of transport where the synthetic populations of micro-agents represent a key input to most agent-based models. In this paper, a new methodological framework for how to 'grow' pools of micro-agents is presented. The model framework adopts a deep generative modeling approach from machine learning based on a Variational Autoencoder (VAE). Compared to the previous population synthesis approaches, including Iterative Proportional Fitting (IPF), Gibbs sampling and traditional generative models such as Bayesian Networks or Hidden Markov Models, the proposed method allows fitting the full joint distribution for high dimensions. The proposed methodology is compared with a conventional Gibbs sampler and a Bayesian Network by using a large-scale Danish trip diary. It is shown that, while these two methods outperform the VAE in the low-dimensional case, they both suffer from scalability issues when the number of modeled attributes increases. It is also shown that the Gibbs sampler essentially replicates the agents from the original sample when the required conditional distributions are estimated as frequency tables. In contrast, the VAE allows addressing the problem of sampling zeros by generating agents that are virtually different from those in the original data but have similar statistical properties. The presented approach can support agent-based modeling at all levels by enabling richer synthetic populations with smaller zones and more detailed individual characteristics.

Submitted 1 May, 2019; v1 submitted 21 August, 2018; originally announced August 2018.

Comments: 27 pages, 15 figures, 4 tables

Journal ref: Transport. Res. Part C: Emerg. Technol., 106 (2019), pp. 73-97

arXiv:1710.07032 [pdf, other]

SLING: A framework for frame semantic parsing

Authors: Michael Ringgaard, Rahul Gupta, Fernando C. N. Pereira

Abstract: We describe SLING, a framework for parsing natural language into semantic frames. SLING supports general transition-based, neural-network parsing with bidirectional LSTM input encoding and a Transition Based Recurrent Unit (TBRU) for output decoding. The parsing model is trained end-to-end using only the text tokens as input. The transition system has been designed to output frame graphs directly without any intervening symbolic representation. The SLING framework includes an efficient and scalable frame store implementation as well as a neural network JIT compiler for fast inference during parsing. SLING is implemented in C++ and it is available for download on GitHub.

Submitted 19 October, 2017; originally announced October 2017.

arXiv:1702.08745 [pdf]

Optimal Categorical Attribute Transformation for Granularity Change in Relational Databases for Binary Decision Problems in Educational Data Mining

Authors: Paulo J. L. Adeodato, Fábio C. Pereira, Rosalvo F. Oliveira Neto

Abstract: This paper presents an approach for transforming data granularity in hierarchical databases for binary decision problems by applying regression to categorical attributes at the lower grain levels. Attributes from a lower hierarchy entity in the relational database have their information content optimized through regression on the categories histogram trained on a small exclusive labelled sample, instead of the usual mode category of the distribution. The paper validates the approach on a binary decision task for assessing the quality of secondary schools focusing on how logistic regression transforms the students and teachers attributes into school attributes. Experiments were carried out on Brazilian schools public datasets via 10-fold cross-validation comparison of the ranking score produced also by logistic regression. The proposed approach achieved higher performance than the usual distribution mode transformation and equal to the expert weighing approach measured by the maximum Kolmogorov-Smirnov distance and the area under the ROC curve at 0.01 significance level.

Submitted 28 February, 2017; originally announced February 2017.

Comments: 5 pages, 2 figures, 2 tables

ACM Class: I.2; H.2.8; J.1

arXiv:1502.03634 [pdf, other]

Activity recognition for a smartphone and web based travel survey

Authors: Youngsung Kim, Francisco C. Pereira, Fang Zhao, Ajinkya Ghorpade, P. Christopher Zegras, Moshe Ben-Akiva

Abstract: In transport modeling and prediction, trip purposes play an important role since mobility choices (e.g. modes, routes, departure times) are made in order to carry out specific activities. Activity based models, which have been gaining popularity in recent years, are built from a large number of observed trips and their purposes. However, data acquired through traditional interview-based travel surveys lack the accuracy and quantity required by such models. Smartphones and interactive web interfaces have emerged as an attractive alternative to conventional travel surveys. A smartphone-based travel survey, Future Mobility Survey (FMS), was developed and field-tested in Singapore and collected travel data from more than 1000 participants for multiple days. To provide a more intelligent interface, inferring the activities of a user at a certain location is a crucial challenge. This paper presents a learning model that infers the most likely activity associated to a certain visited place. The data collected in FMS contain errors or noise due to various reasons, so a robust approach via ensemble learning is used to improve generalization performance. Our model takes advantage of cross-user historical data as well as user-specific information, including socio-demographics. Our empirical results using FMS data demonstrate that the proposed method contributes significantly to our travel survey application.

Submitted 12 February, 2015; originally announced February 2015.

ACM Class: D.2.8; I.5.2; I.5.5

arXiv:physics/0004057 [pdf, ps, other]

The information bottleneck method

Authors: Naftali Tishby, Fernando C. Pereira, William Bialek

Abstract: We define the relevant information in a signal $x\in X$ as being the information that this signal provides about another signal $y\in Y$. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal $x$ requires more than just predicting $y$, it also requires specifying which features of $X$ play a role in the prediction. We formalize this problem as that of finding a short code for $X$ that preserves the maximum information about $Y$. That is, we squeeze the information that $X$ provides about $Y$ through a 'bottleneck' formed by a limited set of codewords $\tilde{X}$. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure $d(x,\tilde{x})$ emerges from the joint statistics of $X$ and $Y$. This approach yields an exact set of self consistent equations for the coding rules $X \to \tilde{X}$ and $\tilde{X} \to Y$. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.

Submitted 24 April, 2000; originally announced April 2000.

arXiv:cs/9809110 [pdf, ps, other]

Similarity-Based Models of Word Cooccurrence Probabilities

Authors: Ido Dagan, Lillian Lee, Fernando C. N. Pereira

Abstract: In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations "eat a peach" and "eat a beach" is more likely. Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus. However, the nature of language is such that many word combinations are infrequent and do not occur in any given corpus. In this work we propose a method for estimating the probability of such previously unseen word combinations using available information on "most similar" words. We describe probabilistic word association models based on distributional word similarity, and apply them to two tasks, language modeling and pseudo-word disambiguation. In the language modeling task, a similarity-based model is used to improve probability estimates for unseen bigrams in a back-off language model. The similarity-based method yields a 20% perplexity improvement in the prediction of unseen bigrams and statistically significant reductions in speech-recognition error. We also compare four similarity-based estimation methods against back-off and maximum-likelihood estimation methods on a pseudo-word sense disambiguation task in which we controlled for both unigram and bigram frequency to avoid giving too much weight to easy-to-disambiguate high-frequency configurations. The similarity-based methods perform up to 40% better on this particular task.

Submitted 27 September, 1998; originally announced September 1998.

Comments: 26 pages, 5 figures

ACM Class: I.2.7; I.2.6

Journal ref: Machine Learning, 34, 43-69 (1999)

arXiv:cmp-lg/9607016 [pdf, ps, other]

Beyond Word N-Grams

Authors: Fernando C. N. Pereira, Yoram Singer, Naftali Tishby

Abstract: We describe, analyze, and evaluate experimentally a new probabilistic model for word-sequence prediction in natural language based on prediction suffix trees (PSTs). By using efficient data structures, we extend the notion of PST to unbounded vocabularies. We also show how to use a Bayesian approach based on recursive priors over all possible PSTs to efficiently maintain tree mixtures. These mixtures have provably and practically better performance than almost any single model. We evaluate the model on several corpora. The low perplexity achieved by relatively small PST mixture models suggests that they may be an advantageous alternative, both theoretically and practically, to the widely used n-gram models.

Submitted 13 July, 1996; originally announced July 1996.

Comments: 15 pages, one PostScript figure, uses psfig.sty and fullname.sty. Revised version of a paper in the Proceedings of the Third Workshop on Very Large Corpora, MIT, 1995

arXiv:cmp-lg/9603002 [pdf, ps, other]

Finite-State Approximation of Phrase-Structure Grammars

Authors: Fernando C. N. Pereira, Rebecca N. Wright

Abstract: Phrase-structure grammars are effective models for important syntactic and semantic aspects of natural languages, but can be computationally too demanding for use as language models in real-time speech recognition. Therefore, finite-state models are used instead, even though they lack expressive power. To reconcile those two alternatives, we designed an algorithm to compute finite-state approximations of context-free grammars and context-free-equivalent augmented phrase-structure grammars. The approximation is exact for certain context-free grammars generating regular languages, including all left-linear and right-linear context-free grammars. The algorithm has been used to build finite-state language models for limited-domain speech recognition tasks.

Submitted 8 March, 1996; originally announced March 1996.

Comments: 24 pages, uses psfig.sty; revised and extended version of the 1991 ACL meeting paper with the same title

arXiv:cmp-lg/9603001 [pdf, ps, other]

Speech Recognition by Composition of Weighted Finite Automata

Authors: Fernando C. N. Pereira, Michael D. Riley

Abstract: We present a general framework based on weighted finite automata and weighted finite-state transducers for describing and implementing speech recognizers. The framework allows us to represent uniformly the information sources and data structures used in recognition, including context-dependent units, pronunciation dictionaries, language models and lattices. Furthermore, general but efficient algorithms can be used for combining information sources in actual recognizers and for optimizing their application. In particular, a single composition algorithm is used both to combine in advance information sources such as language models and dictionaries, and to combine acoustic observations and information sources dynamically during recognition.

Submitted 7 March, 1996; originally announced March 1996.

Comments: 24 pages, uses psfig.sty

arXiv:cmp-lg/9503008 [pdf, ps, other]

Ellipsis and Higher-Order Unification

Authors: Mary Dalrymple, Stuart M. Shieber, Fernando C. N. Pereira

Abstract: We present a new method for characterizing the interpretive possibilities generated by elliptical constructions in natural language. Unlike previous analyses, which postulate ambiguity of interpretation or derivation in the full clause source of the ellipsis, our analysis requires no such hidden ambiguity. Further, the analysis follows relatively directly from an abstract statement of the ellipsis interpretation problem. It predicts correctly a wide range of interactions between ellipsis and other semantic phenomena such as quantifier scope and bound anaphora. Finally, although the analysis itself is stated nonprocedurally, it admits of a direct computational method for generating interpretations.

Submitted 8 March, 1995; originally announced March 1995.

Comments: 54 pages

Report number: CSLI-19-91 and Xerox SSL-91-105

Journal ref: Linguistics and Philosophy 14(4):399-452

arXiv:cmp-lg/9404008 [pdf, ps, other]

Principles and Implementation of Deductive Parsing

Authors: Stuart M. Shieber, Yves Schabes, Fernando C. N. Pereira

Abstract: We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars.

Submitted 26 April, 1994; originally announced April 1994.

Comments: 69 pages, includes full Prolog code

Report number: CRCT TR-11-94 (Computer Science Department, Harvard University)

Showing 1–48 of 48 results for author: Pereira, F C